I have a large 33 MB file and I want to permanently delete its oldest revisions, so that only the latest X revisions are kept around. How can I do it?
My bare r
I think you are on the right track with the git filter-branch command you tried. The problem is that you haven't told it to keep the file in any commits, so it is removed from all of them. I don't think there is a way to tell git filter-branch directly to skip commits. However, since the filter commands are run in a shell context, it shouldn't be too difficult to use the shell to remove the file from all but the last X revisions. Something like this:
KEEP=10 I=0 NUM_COMMITS=$(git rev-list master | wc -l) \
git filter-branch --index-filter \
'if [ "$I" -lt "$((NUM_COMMITS - KEEP))" ]; then
    git rm --cached --ignore-unmatch big_manual.txt;
fi;
I=$((I + 1))'
That would keep big_manual.txt in the last 10 commits.
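To check the result of a rewrite like this, one option is to count how many commits still contain the file. This is only a sketch: count_commits_with_file is a helper name made up here, and the throwaway repository under mktemp exists purely so that the snippet runs on its own. In a real repository you would define the function and call it on master.

```shell
#!/bin/sh
set -eu
# Identity used for the demo commits only (assumed values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical helper: print the number of commits reachable from
# revision $1 whose tree contains the path $2.
count_commits_with_file() {
    rev=$1
    path=$2
    n=0
    for c in $(git rev-list "$rev"); do
        if git cat-file -e "$c:$path" 2>/dev/null; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

# Throwaway repository: four commits, the file exists only in the last two.
demo=$(mktemp -d)
cd "$demo"
git init -q
for i in 1 2 3 4; do
    echo "readme revision $i" > README
    if [ "$i" -ge 3 ]; then
        echo "manual revision $i" > big_manual.txt
    fi
    git add -A
    git commit -q -m "Commit $i"
done

kept=$(count_commits_with_file HEAD big_manual.txt)
echo "commits containing big_manual.txt: $kept"
```

After the filter-branch run above, the same call on master should report the value of KEEP.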
That being said, as Charles has mentioned, I'm not sure this is the best approach, since you are in effect undoing the whole point of a VCS by deleting old versions.
Have you already tried optimizing the git repository with git gc and/or git repack? If not, those might be worth a try.
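A rough sketch of what trying that could look like. The repository is a throwaway one created just for the demonstration, loose_objects is a helper name invented here, and the --window and --depth values are illustrative rather than recommendations.

```shell
#!/bin/sh
set -eu
# Identity used for the demo commits only (assumed values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Throwaway repository with several revisions of one growing file.
repo=$(mktemp -d)
cd "$repo"
git init -q
for i in 1 2 3 4 5; do
    echo "chapter $i of the manual" >> big_manual.txt
    git add big_manual.txt
    git commit -q -m "Revision $i"
done

# Hypothetical helper: number of loose (unpacked) objects.
loose_objects() {
    git count-objects -v | awk '/^count:/ { print $2 }'
}

loose_before=$(loose_objects)
git count-objects -vH            # human-readable sizes before

# Pack everything and drop unreachable objects right away.
git gc --quiet --aggressive --prune=now
# Optionally recompute deltas with a larger search window.
git repack -q -a -d -f --window=250 --depth=50

loose_after=$(loose_objects)
git count-objects -vH            # sizes after
echo "loose objects: $loose_before -> $loose_after"
```

Comparing the size-pack line before and after gives an idea of whether repacking alone is enough.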
Note: this answer is about shortening the history of a whole project, rather than removing a single file from older history, which is what the question was about!
The simplest way to shorten the history of a whole project with git filter-branch is to use the grafts mechanism (see the repository layout documentation):
$ echo "$commit_id" >> .git/info/grafts
where $commit_id is the commit that you want to be the root (first commit) of the new repository. Check with "git log" or a graphical history viewer such as gitk that the history looks the way you want, and then run "git filter-branch -- --all"; the use of grafts is described in the git-filter-branch documentation.
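Newer Git versions deprecate the info/grafts file in favour of git replace --graft, so here is a sketch of the same idea with that command. The throwaway repository and the choice of HEAD~2 as the new root are assumptions made only so the example runs on its own.

```shell
#!/bin/sh
set -eu
# Identity used for the demo commits only (assumed values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Throwaway repository with a linear history of five commits.
repo=$(mktemp -d)
cd "$repo"
git init -q
for i in 1 2 3 4 5; do
    echo "revision $i" > file.txt
    git add file.txt
    git commit -q -m "Commit $i"
done

# Pick the commit that should become the new root (assumed: HEAD~2).
commit_id=$(git rev-parse HEAD~2)

# Graft it with no parents, which makes it look like a root commit.
git replace --graft "$commit_id"

visible=$(git rev-list --count HEAD)                       # with the graft
stored=$(git --no-replace-objects rev-list --count HEAD)   # ignoring it
echo "visible commits: $visible, stored commits: $stored"
```

This prints 3 visible and 5 stored commits. As with the grafts file, nothing is rewritten until git filter-branch -- --all is run, so the shortened history can be inspected first.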
Or you can make a shallow clone with the --depth <depth> option of git clone.
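A small sketch of the shallow clone, using a throwaway local repository as the source (the directory names full and shallow are made up). One detail worth knowing: for a plain local path Git ignores --depth, so the example uses a file:// URL.

```shell
#!/bin/sh
set -eu
# Identity used for the demo commits only (assumed values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Source repository with five commits.
work=$(mktemp -d)
git init -q "$work/full"
cd "$work/full"
for i in 1 2 3 4 5; do
    echo "revision $i" > file.txt
    git add file.txt
    git commit -q -m "Commit $i"
done

# Clone only the two most recent commits.
git clone -q --depth 2 "file://$work/full" "$work/shallow"

depth=$(git -C "$work/shallow" rev-list --count HEAD)
echo "commits in the shallow clone: $depth"
```

The clone ends up with 2 commits, while the source repository still has all 5.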
You can also use grafts to remove part of the history of a single file (what was originally requested) using the steps described below. This solution consists of more steps than the one proposed by Dan Moulding, but each step is simpler, and you can check the intermediate results using "git log" or a graphical history viewer.
First, select the point where you want the file removed, and mark those commits by creating branches at those points. For example, if you want the file to appear for the first time in commit f020285b
and have it removed from all its ancestors, mark its parent (assuming this is an ordinary, non-merge commit) using
$ git branch cleanup f020285b^
Second, remove the file from the history leading up to cleanup (i.e. f020285b^) using git filter-branch, as shown in the "Examples" section of the git-filter-branch manpage:
$ git filter-branch --index-filter 'git rm --cached --ignore-unmatch big_manual.txt' cleanup
If you also want to remove all commits that changed only the removed file, you can additionally pass the --prune-empty option to git filter-branch.
Next, join the rewritten part of the history with the rest of the history using the grafts mechanism:
$ echo "$(git rev-parse f020285b) $(git rev-parse cleanup)" >> .git/info/grafts
Then you can examine the history to check that it is joined correctly.
Last, make the grafts permanent (this would make all grafts permanent, but let's assume here that you don't use grafts otherwise) using git filter-branch,
$ git filter-branch cleanup..HEAD
and remove the grafts (as they are not needed any more) and the cleanup branch:
$ rm .git/info/grafts
$ git branch -d cleanup
Final note: if you remove part of the history of some file, you had better make sure that the project makes sense without this file (and, for example, still compiles correctly).
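One more point on "permanently": after git filter-branch the old objects stay in the repository, because the backup refs under refs/original/ and the reflogs still reach them. The filter-branch manpage suggests removing those and then pruning. Below is a sketch of that cleanup. To keep it runnable without rewriting anything, the demo fakes the leftover state by creating a backup ref by hand; the branch name old-history and the tiny stand-in file are assumptions.

```shell
#!/bin/sh
set -eu
# Identity used for the demo commits only (assumed values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
echo "notes" > notes.txt
git add notes.txt
git commit -q -m "Base"

# A commit carrying the big file, on a branch that will be thrown away.
git checkout -q -b old-history
echo "pretend this is 33 MB" > big_manual.txt
git add big_manual.txt
git commit -q -m "Add big manual"
old=$(git rev-parse HEAD)
blob=$(git rev-parse HEAD:big_manual.txt)
git checkout -q -

# Mimic what filter-branch leaves behind: a backup ref to the old commit.
git update-ref refs/original/refs/heads/old-history "$old"
git branch -q -D old-history

blob_state() {   # hypothetical helper: is the blob still stored?
    if git cat-file -e "$blob" 2>/dev/null; then echo present; else echo gone; fi
}
before=$(blob_state)

# The actual cleanup: drop backup refs, expire reflogs, prune now.
git for-each-ref --format='%(refname)' refs/original/ |
while read -r ref; do
    git update-ref -d "$ref"
done
git reflog expire --expire=now --all
git gc --quiet --prune=now

after=$(blob_state)
echo "big_manual.txt blob: $before -> $after"
```

Other clones of the repository keep their own copies of the old objects, so each of them needs the same treatment or a fresh clone.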
You might want to consider using git submodules. That way you can keep the images and other big files in a separate git repository, and the repository that holds the source code can refer to a particular revision of that other repository.
That helps you keep the two repositories in sync, because the parent repository contains a link to a particular revision of the sub-repository. It also lets you remove or rebase old revisions in the sub-repository without affecting the parent repository where your source code lives: removing old revisions in the sub-repository will not mess up the history of the parent repository, because you just update which sub-repository revision the link in the parent repository points to.
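A minimal sketch of that layout with two throwaway repositories. The names assets and source and the placeholder file are made up, and -c protocol.file.allow=always is there only because the demo clones the submodule from a local path, which recent Git versions refuse by default.

```shell
#!/bin/sh
set -eu
# Identity used for the demo commits only (assumed values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# Repository that holds the big files.
git init -q "$work/assets"
echo "placeholder for a large image" > "$work/assets/logo.png"
git -C "$work/assets" add logo.png
git -C "$work/assets" commit -q -m "Add image"

# Repository that holds the source code and links to the assets.
git init -q "$work/source"
cd "$work/source"
git -c protocol.file.allow=always submodule --quiet add "$work/assets" assets
git commit -q -m "Track assets as a submodule"

# The parent stores only a link (mode 160000) to one assets revision.
mode=$(git ls-files -s assets | cut -d' ' -f1)
pinned=$(git rev-parse HEAD:assets)
echo "mode: $mode, pinned assets revision: $pinned"
```

The parent repository's history records only the pinned commit id, not the contents of the big files.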