We're running a central git repository (gforge) that everyone pulls from and pushes to. Unfortunately, some inept co-workers have decided that pushing several 10-100 MB jar file
See https://help.github.com/articles/remove-sensitive-data for this. The article is about removing sensitive data from a Git repository, but the same technique works just as well for removing large files from your commits.
Use filter-branch!
git filter-branch --tree-filter 'find . -name "*.jar" -exec rm {} \;'
Then drop the commits that the rewrite left empty (commits that no longer change any files) with:
git filter-branch -f --prune-empty -- --all
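Before rewriting anything, it can help to see which blobs are actually inflating the history, and to re-run the same check afterwards. A minimal sketch, assuming a 10 MiB default threshold; the function name list_large_blobs and the default are illustrative and do not come from this thread:

```shell
#!/bin/sh
# Hypothetical helper: list every blob reachable from any ref whose size is
# at least MIN_BYTES. Output is "<size-in-bytes> <path>", largest first.
list_large_blobs() {
    min_bytes=${1:-10485760}    # assumed default: 10 MiB
    # rev-list prints "<id> <path>"; cat-file keeps the path in %(rest).
    git rev-list --objects --all |
        git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
        awk -v min="$min_bytes" '
            $1 == "blob" && $2 + 0 >= min { sub(/^blob /, ""); print }' |
        sort -rn
}
```

Run it inside the repository, for example as list_large_blobs 10485760. Each blob is listed once, under one of the paths at which it appears in the history.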
GForge guy here. Even though this is primarily a git question, I'd like to offer two things:
In addition to the other answers, you may want to consider adding some pre-emptive protection against future giant jar files, in the form of a pre-receive hook in the repo that forbids users (or at least "non-admin users") from pushing very large files, or files named *.jar, or whatever seems best.
We've done this sort of thing before, including forbidding specific commit IDs because of certain users who just couldn't get the hang of "save your work on a temp branch, reset and pull, and re-apply your work, minus the giant file".
Note that the pre-receive hook runs in a rather interesting context: the files have actually been uploaded, it's just that the references (usually branch heads) have not actually changed yet. You can prevent the branch heads from changing but you'll still be using (temporary, until gc'ed) disk space and network bandwidth.
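A hook along those lines might look like the following. This is only a sketch under stated assumptions: the 10 MiB limit (MAX_BYTES), the blanket *.jar rule, and the function names are all illustrative, and it makes no attempt at the "non-admin users" distinction mentioned above.

```shell
#!/bin/sh
# Sketch of a pre-receive hook. MAX_BYTES and the *.jar rule are assumptions,
# not settings from this thread; adjust or drop them as needed.
MAX_BYTES=${MAX_BYTES:-10485760}    # assumed limit: 10 MiB

# Print "<size> <path>" for each rule-breaking blob that commit $1 brings in
# and that is not already reachable from an existing ref.
offending_blobs() {
    git rev-list --objects "$1" --not --all |
        git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
        awk -v max="$MAX_BYTES" '
            $1 == "blob" && ($2 + 0 > max || $0 ~ /\.jar$/) {
                sub(/^blob /, ""); print
            }'
}

# Read "<old> <new> <ref>" lines (the pre-receive input format) from stdin
# and return non-zero if any pushed ref introduces an offending blob.
check_push() {
    status=0
    while read -r old new ref; do
        case $new in
            *[!0]*) ;;            # a real object id: inspect it
            *) continue ;;        # all zeros (or empty): ref deletion
        esac
        bad=$(offending_blobs "$new")
        if [ -n "$bad" ]; then
            echo "Rejected $ref: oversized or *.jar files:" >&2
            echo "$bad" >&2
            status=1
        fi
    done
    return "$status"
}

# Only act when the script is installed under the name "pre-receive".
case ${0##*/} in
    pre-receive) check_push || exit 1 ;;
esac
```

Saved as hooks/pre-receive in the server-side repository and made executable, a non-zero exit refuses the whole push and the messages written to stderr are relayed to the pushing user. As noted above, the objects have already been transferred by the time the hook runs, so this protects the history rather than the bandwidth.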
The easiest way to avoid chaos is to give the server more disk.
This is a tough one. Removing the files requires removing them from the history, too, which means rewriting that history with a tool such as git filter-branch. This command, for example, would remove <file> from the history:
git filter-branch --index-filter 'git rm --cached --ignore-unmatch <file>' \
--prune-empty --tag-name-filter cat -- --all
The problem is that this rewrites SHA1 hashes, meaning everyone on the team will need to reset to the new branch version or risk some serious headaches. That's all well and good if no one has work in progress and you all use topic branches. If you're more centralized, your team is large, or many of them keep dirty working directories while they work, there's no way to do this without a little bit of chaos and discord. You could spend quite a while getting everyone's local repository working correctly. That said, git filter-branch is probably the best solution. Just make sure you've got a plan, your team understands it, and they back up their local repositories in case some vital work in progress gets lost or munged.
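One thing to budget for: after a git filter-branch run the repository usually does not shrink right away, because the old commits are still referenced by the backup refs that filter-branch writes under refs/original/ and by the reflogs. A sketch of the cleanup, following the repository-shrinking checklist in the git-filter-branch documentation; the function name purge_rewritten_history is illustrative, and this should only be run once the rewritten history has been checked, since it throws the backups away:

```shell
#!/bin/sh
# Hypothetical helper: make the objects removed by a history rewrite
# actually disappear from disk. Irreversible once it has run.
purge_rewritten_history() {
    # Delete the backup refs that filter-branch saved under refs/original/.
    git for-each-ref --format='%(refname)' refs/original/ |
        while read -r ref; do
            git update-ref -d "$ref"
        done
    # Expire every reflog entry so nothing keeps the old commits alive.
    git reflog expire --expire=now --all
    # Drop unreachable objects now instead of after the usual grace period.
    git gc --prune=now --quiet
}
```

Other clones still hold the old objects until they are re-cloned or cleaned up the same way.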
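One possible plan would be:
1. Have team members save their uncommitted work in progress as a patch with git diff > ~/my_wip.
2. Have team members save committed but unshared work as patches with git format-patch <branch>.
3. Run git filter-branch. Make sure the team knows not to pull while this is happening.
4. Have the team run git fetch && git reset --hard origin/<branch> or have them clone the repository afresh.
5. Re-apply the previously committed work with git am <patch>.
6. Re-apply the work in progress with git apply, e.g. git apply ~/my_wip.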
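The developer's side of such a plan could be scripted roughly as follows. This is a sketch, not a tested procedure for any particular team: BACKUP_DIR, the function names, and the remote name "origin" are assumptions, and it uses git diff HEAD rather than plain git diff so that staged changes are saved too.

```shell
#!/bin/sh
# Sketch of the per-developer side of the plan. BACKUP_DIR, the function
# names and the "origin" remote name are assumptions for illustration.
BACKUP_DIR=${BACKUP_DIR:-$HOME/rewrite-backup}

# Run BEFORE the history rewrite, passing the shared branch name.
save_local_work() {
    branch=$1
    mkdir -p "$BACKUP_DIR/patches" || return 1
    # Uncommitted changes to tracked files (staged and unstaged).
    git diff HEAD > "$BACKUP_DIR/my_wip" || return 1
    # One patch file per local commit that is not yet on the shared branch.
    git format-patch --quiet -o "$BACKUP_DIR/patches" "origin/$branch"
}

# Run AFTER the rewritten history has been pushed to the central repository.
restore_local_work() {
    branch=$1
    git fetch --quiet origin || return 1
    git reset --quiet --hard "origin/$branch" || return 1
    for patch in "$BACKUP_DIR"/patches/*.patch; do
        [ -e "$patch" ] || continue      # no committed work was saved
        git am --quiet "$patch" || return 1
    done
    if [ -s "$BACKUP_DIR/my_wip" ]; then
        git apply "$BACKUP_DIR/my_wip" || return 1
    fi
}
```

A developer would call save_local_work <branch> before the rewrite and restore_local_work <branch> afterwards. Untracked files are not captured by git diff, so a plain copy of the working directory remains a sensible extra backup.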