Managing large binary files with Git

不思量自难忘° 2020-11-22 05:47

I am looking for opinions on how to handle large binary files on which my source code (a web application) depends. We are currently discussing several alternatives:

12 Answers
  •  [愿得一人] 2020-11-22 06:23

    I would use submodules (as Pat Notz suggested) or two distinct repositories. If you modify your binary files too often, then I would try to minimize the impact of the huge repository by cleaning its history:
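
    As a minimal sketch of the submodule approach (the repository URLs and the "assets" path are hypothetical):

        # Keep the binaries in their own repository and mount it
        # inside the main project as a submodule.
        git submodule add https://example.com/binary-assets.git assets
        git commit -m "Track binary assets as a submodule"

        # Collaborators then fetch the binaries explicitly:
        git clone --recurse-submodules https://example.com/webapp.git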

    I had a very similar problem several months ago: ~21 GB of MP3 files, unclassified (bad names, bad ID3 tags, no idea whether I even liked a given file...), and replicated on three computers.

    I used an external hard drive for the main Git repository and cloned it onto each computer. Then I started to classify the files in my usual way (pushing, pulling, merging... deleting and renaming many times).

    In the end, I had only ~6 GB of MP3 files but ~83 GB in the .git directory. I used git write-tree and git commit-tree to create a new commit with no parents, and started a new branch pointing to that commit. "git log" for that branch showed only one commit.
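
    A rough sketch of that flattening step, assuming the branch to squash is checked out with a clean index (the branch and message names are hypothetical):

        # Write the current index as a tree object, then wrap it in a
        # brand-new commit that has no parents.
        tree=$(git write-tree)
        commit=$(git commit-tree -m "Fresh start, no history" "$tree")

        # Point a new branch at the parentless commit; "git log" on
        # that branch will show exactly one commit.
        git branch clean-start "$commit"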

    Then I deleted the old branch (keeping only the new one), expired the reflogs, and ran "git prune": after that, my .git folder weighed only ~6 GB...
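
    The cleanup could look roughly like this (the branch name is hypothetical; expiring the reflogs is one way to do the "delete the ref-logs" step):

        # Drop the old branch and every reflog entry that still
        # references the heavy history...
        git branch -D master-with-history
        git reflog expire --expire=now --all

        # ...then let git discard the now-unreachable objects.
        git prune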

    You could "purge" the huge repository in the same way from time to time: your git clones will be faster.
