large files present in git repository #29

Closed
ctb opened this issue Aug 29, 2017 · 3 comments

ctb (Contributor) commented Aug 29, 2017

Expected behavior

A 'git clone' should be fast and lightweight, pulling down only human-edited/created files.

Actual behavior

There are several big directories that take a while to download (sizes in KB):

10484   workflows/assembly
50544   examples/data
719644  workflows/functional_inference

The 700 MB directory should probably be moved over to OSF; more generally, any large/automatically generated data should be over on OSF.
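A listing like the one above can be reproduced with a short script. The following is a minimal sketch, not anything from the repository itself: the function names `dir_sizes_kb` and `largest` are hypothetical, it only looks at top-level directories, and it sums file sizes rather than disk blocks, so the numbers may differ slightly from what a disk-usage tool reports.

```python
import os


def dir_sizes_kb(root, skip=(".git",)):
    """Return {top-level directory name: total file size in KB} under root.

    Hypothetical helper: sums the byte size of every regular file below each
    top-level directory, ignoring symlinks and the directories in `skip`.
    """
    sizes = {}
    for entry in os.scandir(root):
        if not entry.is_dir(follow_symlinks=False) or entry.name in skip:
            continue
        total = 0
        for dirpath, _dirnames, filenames in os.walk(entry.path):
            for name in filenames:
                path = os.path.join(dirpath, name)
                if not os.path.islink(path):
                    total += os.path.getsize(path)
        sizes[entry.name] = total // 1024
    return sizes


def largest(sizes, n=3):
    """Return the n largest (name, size_kb) pairs, biggest first."""
    return sorted(sizes.items(), key=lambda kv: kv[1], reverse=True)[:n]


if __name__ == "__main__":
    # Report the biggest directories in the current working tree.
    for name, kb in largest(dir_sizes_kb(".")):
        print(f"{kb}\t{name}")
```

Run from the root of a checkout, this prints the largest directories first, which makes candidates for moving to OSF easy to spot.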

Steps to reproduce the behavior

olungu (Collaborator) commented Sep 15, 2017

We had some trouble with the large data files as well: they caused our virtual machines to run out of space very quickly. There are also some large files in workflows/read_filtering, and none of them seem to be used in the actual workflow README (the data sets are actually downloaded from OSF).
In examples/data... are the mg_*.fna.gz files needed to run the example notebook? If not, they can be deleted.

charlesreid1 (Member) commented Feb 8, 2018

Note that once large files are added to a repository's history, they are stuck there. Each commit's hash covers its snapshot of the files, its metadata, and the hash of its parent commit, so removing or changing a commit changes the hash of every commit that came after it. You cannot* purge a commit from the repository history, and by default git clone downloads the entire repo history (a shallow clone, e.g. git clone --depth 1, is the exception).

* = it is technically possible, but it is considered the version control equivalent of tampering with evidence.
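To illustrate why changing one commit affects every later hash, here is a simplified sketch of the hash chain. It uses Git's real object-hashing scheme (SHA-1 over a `<type> <size>\0` header plus the content), but the commit body is deliberately reduced: real commits also carry author and committer lines, and the "tree" here is just a blob hash standing in for a real tree object. The names `object_id`, `commit_id`, and `build_history` are hypothetical.

```python
import hashlib


def object_id(kind, body):
    """Hash bytes the way Git names objects: sha1(b"<kind> <size>\\0" + body)."""
    header = f"{kind} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def commit_id(tree_id, parent_id, message):
    """Hash a simplified commit: tree line, optional parent line, message.

    Assumption: author/committer lines are omitted to keep the sketch short.
    """
    lines = [f"tree {tree_id}"]
    if parent_id is not None:
        lines.append(f"parent {parent_id}")
    body = ("\n".join(lines) + "\n\n" + message + "\n").encode()
    return object_id("commit", body)


def build_history(snapshots):
    """Return the commit ids for a linear history of (content, message) pairs."""
    ids = []
    parent = None
    for content, message in snapshots:
        tree = object_id("blob", content)  # stand-in for a real tree id
        parent = commit_id(tree, parent, message)
        ids.append(parent)
    return ids


if __name__ == "__main__":
    history = [(b"large data file", "add data"),
               (b"workflow code", "add workflow"),
               (b"workflow code v2", "fix workflow")]
    original = build_history(history)
    # Drop the first commit: the later commits keep their content and
    # messages, but their ids change because their ancestry changed.
    rewritten = build_history(history[1:])
    for old, new in zip(original[1:], rewritten):
        print(old, "->", new)
```

Because each id depends on the parent's id, dropping the first commit gives every surviving commit a new id, which is why a rewritten history no longer matches anyone's existing clone.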

charlesreid1 (Member) commented

See #53 for continued discussion on topic...
