Git repo still huge after large files removed from repository history

后悔当初 2021-01-12 19:21

I have a codebase that (until now) used git to store its dependencies. The repository itself is available here (warning: it's HUGE). Needless to say, I need to remove the d

4 answers
  • 2021-01-12 19:31

    Have you tried running git gc? http://www.kernel.org/pub/software/scm/git/docs/git-gc.html

  • 2021-01-12 19:36

    Use --prune=now on git gc

    Although you'd successfully written your unwanted objects out of history, it looks like those unwanted objects were not being pruned because they were too young to be pruned by default (see the configuration docs on git gc for a bit more detail). Using git gc --prune=now should handle that, or you could see this answer for a more nuclear option.
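
As a rough sketch of that cleanup (the function name `prune_rewritten_repo` and its argument are made up for illustration, and it assumes the history was rewritten with git filter-branch, which leaves backup refs under refs/original/): anything that still references the old commits has to go before git gc can prune them.

```shell
# Sketch only: the function name and its argument are illustrative.
prune_rewritten_repo() {
    repo="${1:-.}"   # path to the rewritten repository (defaults to the current directory)

    # git filter-branch leaves backups of the pre-rewrite refs under refs/original/.
    git -C "$repo" for-each-ref --format='%(refname)' refs/original/ |
        while read -r ref; do
            git -C "$repo" update-ref -d "$ref"
        done

    # Reflog entries also keep the old commits reachable, so expire all of them.
    git -C "$repo" reflog expire --expire=now --all

    # Nothing references the old objects any more; prune them without the grace period.
    git -C "$repo" gc --prune=now --quiet
}
```

This throws away the safety net that the backup refs and the reflog provide, so it is probably best tried on a copy of the repository first.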

    Although that should fix your final problem, an underlying problem was the difficulty of finding big blobs in order to remove them using git filter-branch - to which I would say:

    ...don't use git filter-branch

    git filter-branch is painful to use for a task like this, and there's a much better, less well-known tool called The BFG, specifically designed for removing Large Files from Git repos.

    The core command to remove big files looks just like this:

    $ bfg --strip-blobs-bigger-than 10M my-repo.git
    

    Any blob over 10MB in size (that isn't in your latest commit) will be totally removed from your repository's history - you don't have to manually find the files yourself, and files in protected commits are safe.

    You can then use git gc to clean away the dead data:

    $ git gc --prune=now --aggressive
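
To check whether the cleanup actually shrank the repository, one option is to compare the numbers reported by git count-objects before and after. The helper below is only a sketch; `repo_size_kib` is a hypothetical name, and it simply adds up the loose-object and pack sizes (both reported in KiB).

```shell
# Sketch only: repo_size_kib is an illustrative helper, not part of git or the BFG.
repo_size_kib() {
    # "size:" is the disk usage of loose objects, "size-pack:" that of packfiles (KiB).
    git -C "${1:-.}" count-objects -v |
        awk '$1 == "size:" || $1 == "size-pack:" { total += $2 } END { print total + 0 }'
}

# Possible usage, run inside the repository:
#   before=$(repo_size_kib .)
#   git gc --prune=now --aggressive
#   echo "object store: ${before} KiB -> $(repo_size_kib .) KiB"
```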
    

    The BFG is typically hundreds of times faster than running git-filter-branch on a big repo and the options are tailored around these two common use-cases:

    • Removing Crazy Big Files
    • Removing Passwords, Credentials & other Private data

    Full disclosure: I'm the author of the BFG Repo-Cleaner.

  • 2021-01-12 19:42

    I had accidentally stored large .jpa backups of my site in git and removed them with:

    git filter-branch --prune-empty --index-filter 'git rm -rf --cached --ignore-unmatch MY_BIG_DIRECTORY_OR_FILE' --tag-name-filter cat -- --all

    Replace MY_BIG_DIRECTORY_OR_FILE with the file or folder in question to completely rewrite your history, including tags.

    source:

    http://naleid.com/blog/2012/01/17/finding-and-purging-big-files-from-git-history
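
If it is not obvious which path to pass to the filter, a common recipe is to list the largest blobs reachable from any ref. The sketch below wraps it in a hypothetical helper, `list_largest_blobs`; it is not taken from the linked post.

```shell
# Sketch only: list_largest_blobs is an illustrative name.
# Prints "blob <size-in-bytes> <sha> <path>" for the N largest blobs in history.
list_largest_blobs() {
    repo="${1:-.}"   # repository to inspect
    count="${2:-10}" # how many entries to show

    # rev-list emits "<sha> <path>" for every reachable object; cat-file adds
    # type and size, and %(rest) carries the path through unchanged.
    git -C "$repo" rev-list --objects --all |
        git -C "$repo" cat-file --batch-check='%(objecttype) %(objectsize) %(objectname) %(rest)' |
        awk '$1 == "blob"' |
        sort -k2,2nr |
        head -n "$count"
}

# Possible usage:
#   list_largest_blobs . 20
```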

  • 2021-01-12 19:44

    You need to run David Underhill's script on each branch in the repository to ensure the references are removed from all branches.

    Then, as in the further discussion, initialize a new repository with git init and either git pull from the original or git remote add origin <original> and then pull all branches.

    $ du -sh ./BIG
    299M ./BIG
    $ cd BIG
    $ git checkout master
    $ git-remove-history REMOVE_ME
    ....
    $ git checkout branch2
    $ git-remove-history REMOVE_ME
    ...
    $ cd ../SMALL
    $ git init
    $ git remote add origin ../BIG
    $ git fetch --all
    $ git checkout master
    $ cd ..
    $ du -sh ./SMALL ./BIG
    26M ./SMALL
    244M ./BIG
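
A possible shortcut for the "fresh repository" step, offered as a sketch rather than as what the transcript above did: cloning with --no-local makes git use its normal transport even for a path on the same disk, so only objects reachable from the source's refs are copied. The name `fresh_copy` is hypothetical.

```shell
# Sketch only: fresh_copy is an illustrative helper.
fresh_copy() {
    src="$1"   # the cleaned-up repository (e.g. ./BIG)
    dst="$2"   # where the slim copy should be created (e.g. ./SMALL)

    # --no-local avoids the hardlink/copy shortcut for local paths, so
    # unreachable objects in the source are not carried over.
    git clone --quiet --no-local "$src" "$dst"
}

# Possible usage:
#   fresh_copy ./BIG ./SMALL
#   du -sh ./SMALL ./BIG
```

Note that a clone only creates a local branch for the source's current HEAD; the other branches arrive as remote-tracking branches and can be checked out as needed.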
    