I am looking for opinions on how to handle large binary files on which my source code (web application) is dependent. We are currently discussing several alternatives:
I would use submodules (as Pat Notz suggests) or two distinct repositories. If you modify your binary files too often, then I would try to minimize the impact of the huge repository by cleaning its history:
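A minimal sketch of the submodule approach, assuming the binaries live in their own repository (the URL and path below are illustrative, not from the original answer):

```sh
# Keep the big binaries in a separate repository and mount it as a submodule
git submodule add https://example.com/myapp-assets.git assets

# Record the submodule reference in the main repository
git commit -m "Track binary assets as a submodule"
```

This keeps the main repository small, since it only stores a pointer to a specific commit of the assets repository.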
I had a very similar problem several months ago: ~21 GB of MP3 files, unclassified (bad names, bad ID3 tags, not knowing whether I even liked a given MP3 file or not...), and replicated on three computers.
I used an external hard disk drive with the main Git repository, and I cloned it onto each computer. Then, I started to classify them in the usual way (pushing, pulling, merging... deleting and renaming many times).
At the end, I had only ~6 GB of MP3 files and ~83 GB in the .git directory. I used git-write-tree and git-commit-tree to create a new commit with no ancestors, and started a new branch pointing at that commit. The "git log" for that branch then showed only one commit.
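Roughly, the plumbing commands look like this. A sketch only, assuming the cleaned-up files are staged in the index; the branch name new-master is illustrative:

```sh
# Write the current index out as a tree object and capture its hash
tree=$(git write-tree)

# Create a commit pointing at that tree, with no parent commits
commit=$(echo "Collapse history into a single commit" | git commit-tree "$tree")

# Start a fresh branch at that parentless (root) commit
git branch new-master "$commit"
```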
Then, I deleted the old branch, kept only the new branch, deleted the reflogs, and ran "git prune": after that, my .git folders weighed only ~6 GB...
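The cleanup step would be roughly the following sequence, continuing with the illustrative branch names from the sketch above (a modern alternative is a single `git gc --prune=now`):

```sh
# Move onto the new branch and drop the old one
git checkout new-master
git branch -D master

# Expire all reflog entries so the old commits lose their last references
git reflog expire --expire=now --all

# Delete the objects that are now unreachable
git prune
```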
You could "purge" the huge repository from time to time in the same way: Your "git clone"'s will be faster.