# Reducing cloning time #65
Cloning the repo is taking a lot of time (5+ minutes). The last few of the 40 largest objects in the repo (ordered by increasing size) are related to #53.
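The list of largest objects isn't reproduced here, but a list like it can be generated with standard git plumbing. A sketch, using a throwaway repo with invented file names and sizes for illustration:

```shell
# Sketch: list a repository's largest blobs with standard git plumbing.
# The throwaway repo, file names, and sizes are invented for illustration.
set -e
tmp=$(mktemp -d)
git init -q "$tmp/demo"
cd "$tmp/demo"
git config user.email "[email protected]"   # hypothetical identity for the demo commit
git config user.name "demo"
head -c 100000 /dev/zero > big.bin      # a "large" binary file
echo small > small.txt
git add . && git commit -qm "add files"

# Largest blobs, ascending by size (third column is bytes):
git rev-list --objects --all |
  git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
  awk '$1 == "blob"' |
  sort -k3 -n |
  tail -n 40
```

Run against the real repo, the last few lines of this output are the blobs worth evicting.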
---

A couple of thoughts on this problem:
* [git lfs](https://git-lfs.github.com/) is intended to circumvent this problem (although the damage is done once a large file has been added to the repo). If we have future large data files to add to the repo, we should use git lfs to store pointers to their locations in the cloud.
* If we are okay with rewriting some of the repo's history, we can remove the large data files from it using a tool like [git-forget-blob](https://gist.github.com/nachoparker/c93a8675ba9a93bc5f422b060561a169). This would erase the data files and any blobs that referred to them, saving us some space.
* If modifying history is forbidden, we could hold off doing anything about the inflated repo size until the release of dahak version 1.0, then create a new repo for dahak 2.0 (or some variation on this idea).
I personally like the rewriting-history approach here.
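For the git lfs route above, tracking boils down to a one-line rule in `.gitattributes` (written for you by `git lfs track`). A minimal sketch, with a hypothetical `*.fa` data-file pattern:

```
# .gitattributes — route future *.fa data files through Git LFS
# (the *.fa pattern is hypothetical; existing history is unaffected)
*.fa filter=lfs diff=lfs merge=lfs -text
```

Committing this file before the data files land is what keeps them out of regular history.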
As far as locations where these large files might be migrated:
* data repo on GitHub
* AWS/GC bucket
* OSF file storage
It seems like the first option would be best, since these files were successfully added to a GitHub repo at some point.
My one experience with git-lfs was that it was a real mess; dunno if things
have improved since then.
OSF is my preferred approach because (a) it's free and (b) it's versioned.
Contra opinions are just fine :). But I don't want to use AWS or GC because
we don't have extended funding for this. I think GitHub charges $$ over
a certain amount, too; could we look into that?
best,
--titus
--
C. Titus Brown, [email protected]
---
Okay, we can just keep using OSF for hosting until they cry uncle or
something. :D
I'll give git-lfs a try on some test repos and see if I can't figure out a
smooth setup.
Some info on GitHub pricing via
https://help.github.com/articles/about-storage-and-bandwidth-usage/:
"All personal and organization accounts using Git LFS receive 1 GB of free
storage and 1 GB a month of free bandwidth."
"One data pack costs $5 per month, and provides a monthly quota of 50 GB
for bandwidth and 50 GB for storage. You can purchase as many data packs as
you need. For example, if you need 150 GB of storage, you'd buy three data
packs."
That's $100/TB for traffic and $100/TB for storage.
For comparison, AWS traffic is $90/TB, S3 (blob storage) is $20/TB, and
elastic block storage (file system) is $120/TB.
(So... nothing to write home about.)
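The $/TB figures quoted above check out; a quick sanity check of the arithmetic (variable names are just for illustration):

```shell
# Sanity-check the Git LFS data-pack pricing quoted above.
pack_price_usd=5   # per data pack per month
pack_quota_gb=50   # storage quota per pack (bandwidth has a separate 50 GB quota)

# 150 GB of storage -> three packs, as the docs say:
echo $(( 150 / pack_quota_gb ))                    # 3

# Cost per terabyte (1 TB ~ 1000 GB):
echo $(( 1000 / pack_quota_gb * pack_price_usd ))  # 100
```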
Charles
---
> I'll give git-lfs a try on some test repos and see if I can't figure out a smooth setup.
Same for git-forget-blob. It will be important to work out the correct
steps. We don't want to jettison the escape pod before we've gotten into
the escape pod.
Charles
---
This is ready to go. Steps for performing a git-commit-ectomy: https://github.com/charlesreid1/git-commit-ectomy

There are a few key things to be aware of when using git-forget-blob, namely: (a) it requires GNU sed, so it won't work out-of-the-box on Mac, and (b) you have to
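On the GNU sed point: macOS ships BSD sed, which lacks the GNU extensions git-forget-blob relies on. A quick check of which flavor is on your PATH (the Homebrew workaround in the comment is an assumption, one common option rather than anything from the original notes):

```shell
# Detect whether the sed on PATH is GNU or BSD; git-forget-blob needs GNU sed.
# (BSD sed, as shipped on macOS, has no --version flag.)
if sed --version >/dev/null 2>&1; then
    echo "GNU sed detected"
else
    # On macOS, one common workaround is Homebrew's gnu-sed formula,
    # which installs GNU sed as gsed.
    echo "BSD sed detected"
fi
```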
---

Looks scary. FWIW, you can
---

This change should only impact contributors to the repo. Once the commits have been removed, contributors will need to clone a new copy of the repo. (Otherwise they might accidentally re-add the removed commits.)
---

git-commit-ectomy: complete. ⚡️⚡️⚡️ 💯 ⚡️⚡️⚡️