Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

specs,schedulers: add Role.mounts+BindMount and implement in docker,kubernetes,aws_batch schedulers #420

Closed
wants to merge 1 commit into from

Conversation

d4l3k
Copy link
Member

@d4l3k d4l3k commented Mar 11, 2022

This adds a new Role option called mounts that allows for specifying different types of mounts on the worker. Currently the only supported mount is BindMount which does a simple bind from host to container. For more complex binds in the future we can easily extend mounts: List[BindMount] to mounts: List[Union[BindMount, PersistentVolumeClaimMount]] if necessary

This also adds in a new argument to dist.ddp

$ torchx run --scheduler local_docker --wait --log dist.ddp --script dist_app.py -j 1x1 --mounts type=bind,src=/tmp,dst=/tmp,readonly

To help with these parsing it there's a new parse_mounts method in specs. The format for these is:

type=bind,src=<src>,dst=<dst>[,readonly][,type=bind,...]

This follows the docker CLI format https://docs.docker.com/storage/bind-mounts/

Fixes #415

References

Test plan:

CI

Still need to add some more unit tests per scheduler (maybe component test?)

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Mar 11, 2022
@facebook-github-bot
Copy link
Contributor

@d4l3k has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@d4l3k d4l3k requested review from kiukchung and aivanou March 11, 2022 23:09
@codecov
Copy link

codecov bot commented Mar 11, 2022

Codecov Report

Merging #420 (5c982ab) into main (d1669d7) will increase coverage by 0.06%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #420      +/-   ##
==========================================
+ Coverage   94.23%   94.29%   +0.06%     
==========================================
  Files          67       67              
  Lines        3763     3805      +42     
==========================================
+ Hits         3546     3588      +42     
  Misses        217      217              
Impacted Files Coverage Δ
torchx/schedulers/local_scheduler.py 92.95% <ø> (ø)
torchx/schedulers/ray_scheduler.py 93.93% <ø> (ø)
torchx/schedulers/slurm_scheduler.py 97.46% <ø> (ø)
torchx/specs/__init__.py 96.15% <ø> (ø)
torchx/components/dist.py 79.54% <100.00%> (ø)
torchx/schedulers/aws_batch_scheduler.py 83.33% <100.00%> (+0.53%) ⬆️
torchx/schedulers/docker_scheduler.py 96.19% <100.00%> (+0.06%) ⬆️
torchx/schedulers/kubernetes_scheduler.py 93.00% <100.00%> (+0.21%) ⬆️
torchx/specs/api.py 99.00% <100.00%> (+0.09%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d1669d7...5c982ab. Read the comment docs.

@facebook-github-bot
Copy link
Contributor

@d4l3k has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@d4l3k has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@d4l3k has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

(torchx/scheduler) support mounts for docker_scheduler
2 participants