Git repo still huge after large files removed from repository history

后悔当初 2021-01-12 19:21

I have a codebase that (until now) used git to store its dependencies. The repository itself is available here (warning: it's HUGE). Needless to say, I need to remove the d

4 answers
  •  夕颜 (original poster)
    2021-01-12 19:36

    Use --prune=now on git gc

    Although you'd successfully written your unwanted objects out of history, it looks like those unwanted objects were not being pruned because they were too young to be pruned by default (see the configuration docs on git gc for a bit more detail). Using git gc --prune=now should handle that, or you could see this answer for a more nuclear option.
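As a minimal sketch of the pruning step described above: the helper name `prune_unreachable` is hypothetical, and the reflog expiry is an extra precaution that the answer does not mention, on the assumption that reflog entries may still point at the old commits. Both git commands are standard.

```shell
# Hypothetical helper: immediately drop every unreachable object
# from the repository at "$1" (defaults to the current directory).
prune_unreachable() {
    repo=${1:-.}
    # Reflog entries keep old commits reachable, so expire them first.
    git -C "$repo" reflog expire --expire=now --all
    # --prune=now overrides the default grace period (gc.pruneExpire, two weeks).
    git -C "$repo" gc --prune=now --quiet
}
```

Usage would be `prune_unreachable /path/to/repo`. If the history was rewritten with git filter-branch, the backup refs it leaves under refs/original/ also keep the old objects alive and would need to be deleted before this has any effect.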

    Although that should fix your final problem, an underlying problem was the difficulty of finding big blobs in order to remove them using git filter-branch - to which I would say:

    ...don't use git filter-branch

    git filter-branch is painful to use for a task like this, and there's a much better, less well-known tool called The BFG, specifically designed for removing Large Files from Git repos.

    The core command to remove big files looks just like this:

    $ bfg --strip-blobs-bigger-than 10M my-repo.git
    

    Any blob over 10MB in size (that isn't in your latest commit) will be totally removed from your repository's history - you don't have to manually find the files yourself, and files in protected commits are safe.
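The BFG finds the oversized blobs for you, but it can be useful to preview what is in the history before rewriting it. The following is a rough sketch using only plain git plumbing; the function name `largest_blobs` and its arguments are made up for illustration and are not part of the BFG.

```shell
# Hypothetical helper: list the N largest blobs anywhere in the history
# of the repository at "$1". Output columns: type, size in bytes, id, path.
largest_blobs() {
    repo=${1:-.}
    count=${2:-10}
    # rev-list prints "<id> <path>"; cat-file echoes the path back via %(rest).
    git -C "$repo" rev-list --objects --all |
        git -C "$repo" cat-file --batch-check='%(objecttype) %(objectsize) %(objectname) %(rest)' |
        awk '$1 == "blob"' |
        sort -k2,2nr |
        head -n "$count"
}
```

For example, `largest_blobs my-repo.git 20` would print the twenty biggest blobs, largest first, which gives a feel for what a given size threshold would strip.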

    You can then use git gc to clean away the dead data:

    $ git gc --prune=now --aggressive
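To check whether the cleanup actually shrank the repository, one option is to compare the object store size before and after running git gc. This is only a sketch: `repo_size_kib` is a hypothetical helper that sums the loose and packed sizes reported by `git count-objects -v`, both of which git reports in KiB.

```shell
# Hypothetical helper: print the size of the object store (loose + packed)
# of the repository at "$1", in KiB.
repo_size_kib() {
    git -C "${1:-.}" count-objects -v |
        awk '$1 == "size:" || $1 == "size-pack:" { total += $2 }
             END { print total + 0 }'
}
```

Running `repo_size_kib my-repo.git` once before and once after the gc step should show the difference. If the number barely moves, something is probably still referencing the old objects.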
    

    The BFG is typically hundreds of times faster than running git filter-branch on a big repo, and its options are tailored around these two common use cases:

    • Removing Crazy Big Files
    • Removing Passwords, Credentials & other Private data

    Full disclosure: I'm the author of the BFG Repo-Cleaner.
