[Feature Request] s5cmd command support for COPY Persistent Data Storage #2176
Comments
Thanks for the pointer @turian! We'll look into using s5cmd and integrate it if it improves perf without breaking user workflows.
I will go ahead and benchmark it for comparison and check whether it has all the functionality SkyPilot needs. Thanks @turian!
@landscapepainter @romilbhardwaj Awesome! It's a nifty little tool, because usually the most painful step when you spin up a machine is getting the data onto disk from blob storage. Total dream come true. I'm crossing my fingers it's an easy integration for the SkyPilot team.
Ran a quick benchmark and confirmed that it is impressively quick compared to aws-cli:
Six 10 GB files, synced from local to S3 on a medium-tier disk:
We may need to do more checks before we actually implement it, but this seems promising! @romilbhardwaj @turian
This issue is stale because it has been open 120 days with no activity. Remove stale label or comment or this will be closed in 10 days.
This issue was closed because it has been stalled for 10 days with no activity.
#2291 is fixing this
Amazing! Looking forward to seeing it land
Persistent Data Storage that is COPY'ed can be quite slow to copy.
s5cmd is MUCH faster at uploading to and downloading from S3 / R2 than s3cmd and aws-cli. Like MUCH faster: "For uploads, s5cmd is 32x faster than s3cmd and 12x faster than aws-cli. For downloads, s5cmd can saturate a 40Gbps link (~4.3 GB/s), whereas s3cmd and aws-cli can only reach 85 MB/s and 375 MB/s respectively."
NOTE: s5cmd is only fast when it can parallelize the transfer into many downloads. Its speed is comparable to the other tools if you are just downloading one big file.
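To make the note above concrete, here is a rough sketch of why a worker pool helps with many objects but not with a single one. It does not call s5cmd or touch S3 at all: `fake_download`, `download_all`, the key names, and the 50 ms latency are all made up for illustration, and each "transfer" is just a sleep.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(key: str, latency: float = 0.05) -> str:
    """Hypothetical stand-in for one object transfer: sleeps, no network I/O."""
    time.sleep(latency)
    return key


def download_all(keys, workers):
    """Fetch every key using `workers` threads; return (results, elapsed seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order in its results.
        results = list(pool.map(fake_download, keys))
    return results, time.perf_counter() - start


# Many small objects: the pool overlaps the waits, so wall time drops sharply.
many_keys = [f"shard-{i:03d}" for i in range(16)]
serial_results, serial_s = download_all(many_keys, workers=1)
parallel_results, parallel_s = download_all(many_keys, workers=16)

# One big object: there is only one task, so extra workers have nothing to do.
_, one_serial_s = download_all(["big-file"], workers=1)
_, one_parallel_s = download_all(["big-file"], workers=16)

print(f"16 objects: 1 worker {serial_s:.2f}s, 16 workers {parallel_s:.2f}s")
print(f" 1 object : 1 worker {one_serial_s:.2f}s, 16 workers {one_parallel_s:.2f}s")
```

With 16 objects the pooled run should finish in roughly the time of one transfer, while the single-object case takes about the same time either way. Real numbers depend on the network, disk, and object sizes, so treat this only as an intuition aid, not a benchmark.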