Training SwAV with decentralized averaging

This code trains a SwAV model on ImageNet using collaborative SGD. It builds on vissl and ClassyVision with some modifications.

Requirements (for all participants):

  • Install vissl from the root folder by following its installation-from-source guide (example commands are sketched after this list).
  • Install ClassyVision from the root folder.
  • Install hivemind (see the main README).
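
A minimal install sketch; it assumes the vissl and ClassyVision checkouts live as subfolders of the repository root and that editable pip installs fit your setup, so adjust paths and options to match the upstream guides:

# from the repository root; ./vissl and ./ClassyVision are assumed checkout paths
pip install -e ./vissl          # vissl from source (editable)
pip install -e ./ClassyVision   # ClassyVision from source (editable)
pip install hivemind            # or install from source per the main README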

How to run

  1. Get ImageNet by following the vissl guide.
  2. Run the first DHT peer (aka "coordinator") on a node that is accessible to all trainers: python run_initial_dht_node.py --listen_on [::]:1337. Its stdout will print the values to use as INITIAL_DHT_ADDRESS and INITIAL_DHT_PORT (exporting them is shown after this list).
  3. On every GPU trainer, run the following command (exporting the variables it references is shown right after):
python vissl/tools/run_distributed_engines.py \
    hydra.verbose=true config=pretrain/swav/swav_1node_resnet_submit \
    config.CHECKPOINT.CHECKPOINT_ITER_FREQUENCY=30000 \
    +config.OPTIMIZER.batch_size_for_tracking=64 \
    config.DATA.TRAIN.BATCHSIZE_PER_REPLICA=64 \
    +config.OPTIMIZER.lr=2.4 +config.OPTIMIZER.warmup_start_lr=0.3 \
    +config.OPTIMIZER.warmup_epochs=500 +config.OPTIMIZER.max_epochs=5000 \
    +config.OPTIMIZER.eta_min=0.0048 \
    +config.OPTIMIZER.exp_prefix="test_resnet50_swav_collaborative_experiment" \
    +config.OPTIMIZER.target_group_size=4 \
    +config.OPTIMIZER.max_allowed_epoch_difference=1 \
    +config.OPTIMIZER.total_steps_in_epoch=640 \
    config.LOSS.swav_loss.queue.start_iter=98000 \
    +config.OPTIMIZER.report_progress_expiration=600 \
    +config.DATA.TRAIN.DATA_PATHS=["${IMAGENET_PATH}/train"] \
    config.OPTIMIZER.dht_listen_on_port=1124 config.OPTIMIZER.averager_listen_on_port=1125 \
    +config.OPTIMIZER.dht_initial_peers=["${INITIAL_DHT_ADDRESS}:${INITIAL_DHT_PORT}"]
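
Before launching the trainers, export the variables that the command above expands. A minimal sketch with placeholder values; the dataset path, address, and port below are assumptions, so substitute the path from step 1 and the values the coordinator actually prints in step 2:

export IMAGENET_PATH=/path/to/imagenet    # dataset root from step 1
export INITIAL_DHT_ADDRESS=203.0.113.5    # address printed by the coordinator in step 2
export INITIAL_DHT_PORT=1337              # port printed by the coordinator in step 2

Note that each trainer listens on ports 1124 (dht_listen_on_port) and 1125 (averager_listen_on_port); for decentralized averaging to work, these ports should presumably be reachable by the other peers.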