Training SwAV with decentralized averaging

This code trains a SwAV model on ImageNet using collaborative SGD. It builds on vissl and ClassyVision with some modifications.

Requirements (for all participants):

  • Install vissl from the root folder by following its installation-from-source guide (example commands are sketched after this list).
  • Install ClassyVision from the root folder.
  • Install hivemind (see the main README).
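
A minimal install sketch; it assumes the vissl and ClassyVision checkouts live as subfolders of the repository root and that editable pip installs fit your setup, so adjust paths and options to match the upstream guides:

# from the repository root; ./vissl and ./ClassyVision are assumed checkout paths
pip install -e ./vissl          # vissl from source (editable)
pip install -e ./ClassyVision   # ClassyVision from source (editable)
pip install hivemind            # or install from source per the main README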

How to run

  1. Get ImageNet by following the vissl guide.
  2. Run the first DHT peer (aka "coordinator") on a node that is accessible to all trainers: python run_initial_dht_node.py --listen_on [::]:1337. Its stdout will print the values to use as INITIAL_DHT_ADDRESS and INITIAL_DHT_PORT (exporting them is shown after this list).
  3. On every GPU trainer, run the following command (exporting the variables it references is shown right after):
python vissl/tools/run_distributed_engines.py \
    hydra.verbose=true config=pretrain/swav/swav_1node_resnet_submit \
    config.CHECKPOINT.CHECKPOINT_ITER_FREQUENCY=30000 \
    +config.OPTIMIZER.batch_size_for_tracking=64 \
    config.DATA.TRAIN.BATCHSIZE_PER_REPLICA=64 \
    +config.OPTIMIZER.lr=2.4 +config.OPTIMIZER.warmup_start_lr=0.3 \
    +config.OPTIMIZER.warmup_epochs=500 +config.OPTIMIZER.max_epochs=5000 \
    +config.OPTIMIZER.eta_min=0.0048 \
    +config.OPTIMIZER.exp_prefix="test_resnet50_swav_collaborative_experiment" \
    +config.OPTIMIZER.target_group_size=4 \
    +config.OPTIMIZER.max_allowed_epoch_difference=1 \
    +config.OPTIMIZER.total_steps_in_epoch=640 \
    config.LOSS.swav_loss.queue.start_iter=98000 \
    +config.OPTIMIZER.report_progress_expiration=600 \
    +config.DATA.TRAIN.DATA_PATHS=["${IMAGENET_PATH}/train"] \
    config.OPTIMIZER.dht_listen_on_port=1124 config.OPTIMIZER.averager_listen_on_port=1125 \
    +config.OPTIMIZER.dht_initial_peers=["${INITIAL_DHT_ADDRESS}:${INITIAL_DHT_PORT}"]
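
Before launching the trainers, export the variables that the command above expands. A minimal sketch with placeholder values; the dataset path, address, and port below are assumptions, so substitute the path from step 1 and the values the coordinator actually prints in step 2:

export IMAGENET_PATH=/path/to/imagenet    # dataset root from step 1
export INITIAL_DHT_ADDRESS=203.0.113.5    # address printed by the coordinator in step 2
export INITIAL_DHT_PORT=1337              # port printed by the coordinator in step 2

Note that each trainer listens on ports 1124 (dht_listen_on_port) and 1125 (averager_listen_on_port); for decentralized averaging to work, these ports should presumably be reachable by the other peers.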