I have a codebase that (until now) used git to store its dependencies. The repository itself is available here (warning: it's HUGE). Needless to say, I need to remove the d
Have you tried running git gc? http://www.kernel.org/pub/software/scm/git/docs/git-gc.html
--prune=now on git gc
Although you'd successfully written your unwanted objects out of history, it looks like those unwanted objects were not being pruned because they were too young to be pruned by default (see the configuration docs on git gc for a bit more detail). Using git gc --prune=now should handle that, or you could see this answer for a more nuclear option.
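As a rough sketch of the pruning step described above: unreachable objects can also be kept alive by reflog entries, so it can help to expire those before collecting garbage. The function name prune_unreachable_now is made up for illustration, and the snippet assumes you run it inside the rewritten repository and really do want the old objects gone for good.

```shell
# Hypothetical helper (the name is an assumption, not a git command).
# Run it from inside the repository whose history was rewritten.
prune_unreachable_now() {
    # Expire every reflog entry so old commits are no longer referenced there.
    git reflog expire --expire=now --all &&
    # Delete all unreachable objects immediately, regardless of their age.
    git gc --prune=now --quiet
}
```

Calling prune_unreachable_now cannot be undone, so try it on a copy of the repository first.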
Although that should fix your final problem, an underlying problem was the difficulty of finding big blobs in order to remove them using git filter-branch, to which I would say:
git filter-branch is painful to use for a task like this, and there's a much better, less well-known tool called The BFG, specifically designed for removing Large Files from Git repos.
The core command to remove big files looks just like this:
$ bfg --strip-blobs-bigger-than 10MB my-repo.git
Any blob over 10MB in size (that isn't in your latest commit) will be totally removed from your repository's history - you don't have to manually find the files yourself, and files in protected commits are safe.
You can then use git gc to clean away the dead data:
$ git gc --prune=now --aggressive
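To check whether the cleanup actually shrank the repository, one option is to compare the size of the object database before and after. The sketch below sums the loose and packed sizes reported by git count-objects -v; the function name object_store_kib is hypothetical.

```shell
# Hypothetical helper: print the size of the object database in KiB
# (loose objects plus packfiles), as reported by `git count-objects -v`.
object_store_kib() {
    git count-objects -v |
    awk '$1 == "size:" || $1 == "size-pack:" { total += $2 }
         END { print total + 0 }'
}
```

Run object_store_kib once before the cleanup and once after git gc, then compare the two numbers.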
The BFG is typically hundreds of times faster than running git filter-branch on a big repo, and its options are tailored around these two common use-cases: removing crazy big files, and removing passwords, credentials and other private data.
Full disclosure: I'm the author of the BFG Repo-Cleaner.
I had accidentally stored large .jpa backups of my site in git:
git filter-branch --prune-empty --index-filter 'git rm -rf --cached --ignore-unmatch MY_BIG_DIRECTORY_OR_FILE' --tag-name-filter cat -- --all
Replace MY_BIG_DIRECTORY_OR_FILE with the folder in question to completely rewrite your history, including tags.
Source: http://naleid.com/blog/2012/01/17/finding-and-purging-big-files-from-git-history
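If you don't yet know which path to put in place of MY_BIG_DIRECTORY_OR_FILE, a sketch like the following can list the largest blobs reachable from any ref. This is not the script from the linked post, just one possible approach using plain git plumbing; the function name list_largest_blobs is made up for illustration.

```shell
# Hypothetical helper: print the N largest blobs reachable from any ref,
# one per line as "<size in bytes> <object id> <path>", largest first.
list_largest_blobs() {
    count="${1:-10}"
    # List every reachable object together with the path it was found at.
    git rev-list --objects --all |
    # Ask git for the type and size of each object; %(rest) carries the path.
    git cat-file --batch-check='%(objecttype) %(objectsize) %(objectname) %(rest)' |
    # Keep blobs only and drop the leading type column.
    awk '$1 == "blob" { sub(/^blob /, ""); print }' |
    sort -n -r -k 1,1 |
    head -n "$count"
}
```

For example, list_largest_blobs 20 prints the twenty biggest blobs; the paths in the last column are candidates for the filter-branch command above.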
You need to run David Underhill's script on each branch in the repository to ensure the references are removed from all branches.
Then, as in the further discussion, initialize a new repository with git init and either git pull from the original or git remote add origin <original> and then pull all branches.
$ du -sh ./BIG
299M ./BIG
$ cd BIG
$ git checkout master
$ git-remove-history REMOVE_ME
....
$ git checkout branch2
$ git-remove-history REMOVE_ME
...
$ mkdir ../SMALL && cd ../SMALL
$ git init
$ git remote add origin ../BIG
$ git fetch --all
$ git checkout master
$ cd ..
$ du -sh ./SMALL ./BIG
26M ./SMALL
244M ./BIG
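The transcript above only checks out master in the new repository. A minimal sketch for creating local branches for everything else that was fetched might look like this; the function name track_all_remote_branches is hypothetical, and it assumes the remote has already been fetched.

```shell
# Hypothetical helper: create a local tracking branch for every branch
# fetched from the given remote (default "origin") that is missing locally.
track_all_remote_branches() {
    remote="${1:-origin}"
    git for-each-ref --format='%(refname:strip=3)' "refs/remotes/$remote" |
    while read -r name; do
        # Skip the symbolic origin/HEAD entry that clones create.
        [ "$name" = HEAD ] && continue
        # Only create the branch if no local branch of that name exists yet.
        git show-ref --verify --quiet "refs/heads/$name" ||
            git branch --quiet --track "$name" "$remote/$name"
    done
}
```

Run inside SMALL after git fetch --all and git checkout master, track_all_remote_branches origin should leave you with a local branch2 as well.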