I want to do a rebase to remove a certain commit from my history. I know how to do that. However, if I do it, the commit timestamp of every rebased commit is set to the moment I completed the rebase.
So, here is a tedious way to do it (depending on how many commits you need to rebase), but I tried it out and it works. When you do an interactive rebase, mark each commit with "e" so that you can edit it. This will cause git to pause after every commit. At each pause, you can specify which date to use and continue to the next commit with:
GIT_COMMITTER_DATE="Wed Feb 16 14:00 2011 +0100" git commit --amend
git rebase --continue
This is, of course, a major pain in the rear, and you have to know all of the commit dates beforehand, but if you can't do it any other way, it should at least work.
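A minimal sketch of the amend step above, run in a throwaway repository so it can be tried safely. The temporary directory, the demo identity and the file name are assumptions made for illustration; --no-edit is added only to keep the commit message without opening an editor.

```shell
#!/bin/sh
# Throwaway repository: nothing here touches a real project.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Identity supplied through the environment so the sketch does not
# depend on any global git configuration (values are placeholders).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo hello > file.txt
git add file.txt
git commit -q -m "first commit"

# Amend the commit, forcing the committer date and keeping the message.
GIT_COMMITTER_DATE="2011-02-16 14:00:00 +0100" git commit -q --amend --no-edit

# %cI prints the committer date in strict ISO 8601 format.
committer_date=$(git log -1 --format=%cI)
echo "$committer_date"
```

During a real interactive rebase the same amend command would be followed by git rebase --continue at each pause.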
Let's say this is the history around the commit you want to remove:
... o - o - o - o ... ... o
    ^   ^   ^             ^
    |   |   +- next       |
    |   +- bad            +-- master (HEAD)
    start
where:
- bad is the commit you want to remove;
- start is the parent of the commit you want to remove;
- next is the next commit after bad; it is good, you want to keep it and all the timeline after it; it will replace bad after rebase.
In order to be able to safely remove bad, it's important that no other branch existing at the time when bad was created was merged into the main timeline after bad. I.e. by removing bad and its connections with its parent and child commits from the history graph, you get two disconnected timeline pieces.
It is probably possible to remove bad even if another existing branch was merged after bad. I didn't check this situation but I expect some impediments because of the merge commit.
Each git commit is identified by a hash that is computed from the commit's properties: content, parent commits, message, and the name, email and date of both the author and the committer.
A rebase always changes the committer date. It can also change the committer email, the commit message and the content.
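The effect of the committer date on the hash can be observed with a small experiment. This is only a sketch in a throwaway repository; the temporary directory, the demo identity and the dates are made up for illustration.

```shell
#!/bin/sh
# Show that changing only the committer date produces a new commit hash
# while the author date stays the same.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Placeholder identity, so no global configuration is required.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo hello > file.txt
git add file.txt
GIT_AUTHOR_DATE="2020-01-01 10:00:00 +0000" \
GIT_COMMITTER_DATE="2020-01-01 10:00:00 +0000" \
    git commit -q -m "first commit"
hash_before=$(git rev-parse HEAD)
author_before=$(git log -1 --format=%aI)

# Same content, same message, same author; only the committer date differs.
GIT_COMMITTER_DATE="2020-06-01 12:00:00 +0000" git commit -q --amend --no-edit
hash_after=$(git rev-parse HEAD)
author_after=$(git log -1 --format=%aI)

echo "before: $hash_before  author date: $author_before"
echo "after:  $hash_after  author date: $author_after"
```

The two hashes differ while the author date is unchanged, which is why the author date is a candidate for identifying commits across a rebase.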
In order to restore the original committer dates after a rebase we need to save them together with some information that can identify each commit after the rebase.
Because you want to remove a commit, commit contents change during the rebase: adding or removing files or commits changes the contents of all later commits.
This leaves us without a single property that uniquely identifies each commit and does not change during the desired rebase. We can instead try to combine two or more properties that do not change during the rebase.
The emails (author and committer) are of almost no use: if a single person worked on the project, they are the same for all commits and cannot be used. The properties that remain (different on most commits and not affected by the rebase) are the author date and the commit message (its first line).
If the pair (author date, commit message) provides unique values for all the commits affected by the rebase then we can restore the commit dates afterwards without errors.
There is a simple way to verify if the (author date, commit message) pairs are unique for the affected commits.
Run the following two commands:
$ git log --format="%aI %s" start...master | sort | uniq | wc -l
$ git log --oneline start...master | wc -l
If they display the same number then you are lucky: the pair (author date, commit message) can be used to uniquely identify the commits. Read on.
If the numbers are different (the first command will always produce a number smaller than or equal to the one produced by the second command) then you are out of luck.
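If the numbers differ, it may help to see which keys collide. The helper below is a hypothetical sketch (the name duplicate_keys is not part of git): it reads "author date, subject" lines on standard input and prints each line that occurs more than once. The sample lines are made up.

```shell
#!/bin/sh
# duplicate_keys: print every input line that appears more than once.
# uniq only compares adjacent lines, so the input is sorted first.
duplicate_keys() {
    sort | uniq -d
}

# Example with made-up keys; on a real repository the input would be:
#   git log --format="%aI %s" start...master | duplicate_keys
printf '%s\n' \
    "2011-02-16T13:55:00+01:00 Fix typo" \
    "2011-02-15T09:00:00+01:00 Add parser" \
    "2011-02-16T13:55:00+01:00 Fix typo" | duplicate_keys
```

An empty result means every key in the range is unique.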
This command
$ git log --format="%H %cI %aI %s" start...master > /tmp/hashlist
extracts the commit hash, committer date (the payload), author date and commit message (the key) for all the commits after start, up to master, and stores them in a file.
It is a common misconception that git "rewrites history"; in fact it just generates an alternative history line and decides it is the correct history. It does not change or remove the "rewritten" commits; they are still present for some time in its database and can be restored in case the operation fails.
We can proactively back up the current history line to easily restore it if needed. All we have to do is to create a new branch that points to master. This way, when git rebase moves master to the new timeline, the old one is still accessible using the new branch.
$ git branch old_master
The command above creates a branch named old_master that keeps the current timeline in focus until we complete all the changes and are satisfied with the new world order.
Removing the commit bad from the history is as simple as:
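$ git rebase --preserve-merges --onto start bad
(Git 2.22 deprecated --preserve-merges in favor of --rebase-merges and Git 2.34 removed it; on a recent git, use --rebase-merges instead.)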
The following command "rewrites" the history and changes the committer date using the values we saved before:
$ git filter-branch --env-filter 'export GIT_COMMITTER_DATE=$(fgrep -m 1 "$(git log -1 --format="%aI %s" $GIT_COMMIT)" /tmp/hashlist | cut -d" " -f2)' -f start...master
How it works:
git walks the history between the commits labelled start and master and for each commit it runs the command provided as argument to --env-filter before rewriting the commit. It sets the environment variable GIT_COMMIT with the hash of the commit being rewritten.
Since we already did a rebase that modified the hashes of all the commits, we cannot use $GIT_COMMIT directly to look up the original committer date (because $GIT_COMMIT is a commit generated by git rebase, and we are not interested in its committer date).
The command we provide to --env-filter
export GIT_COMMITTER_DATE=$(fgrep -m 1 "$(git log -1 --format="%aI %s" $GIT_COMMIT)" /tmp/hashlist | cut -d" " -f2)
runs git log -1 --format="%aI %s" $GIT_COMMIT to generate the key pair (author date, commit message) discussed above. Its output is passed as argument to the command fgrep -m 1 "..." /tmp/hashlist | cut -d" " -f2 that finds the pair in the list of previously saved hashes (fgrep) and extracts the original commit date from the saved line (cut). Finally, the value of the commit date is stored in the environment variable GIT_COMMITTER_DATE that is used by git to rewrite the commit.
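The lookup can be tried in isolation, without touching a repository. The sketch below wraps the same pipeline in a hypothetical function (lookup_committer_date is a name chosen for illustration) and runs it against a made-up file in the same four-column layout as /tmp/hashlist. grep -F is used here as the equivalent of fgrep.

```shell
#!/bin/sh
# lookup_committer_date KEY FILE
# KEY is "<author date> <subject>"; FILE holds lines of the form
# "<hash> <committer date> <author date> <subject>".
# Prints the committer date (field 2) of the first line containing KEY.
lookup_committer_date() {
    grep -F -m 1 -- "$1" "$2" | cut -d" " -f2
}

# Made-up sample data standing in for /tmp/hashlist.
hashlist=$(mktemp)
cat > "$hashlist" <<'EOF'
1111111111111111111111111111111111111111 2011-02-16T14:00:00+01:00 2011-02-16T13:55:00+01:00 Fix typo in README
2222222222222222222222222222222222222222 2011-02-15T09:30:00+01:00 2011-02-15T09:00:00+01:00 Add parser
EOF

lookup_committer_date "2011-02-15T09:00:00+01:00 Add parser" "$hashlist"
```

Note that the match is a fixed-string substring match: if one subject is a prefix of another and both commits share the same author date, the first matching line wins, which is one more reason to run the uniqueness check first.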
Using the git log command again
$ git log --format="%cI %aI %s" start...master
you can verify that the rewritten history matches the original history. If you use a graphical git client you can check the results more easily by visual inspection. The branch old_master keeps the old history line visible in the client and you can easily compare the dates of each commit of the old_master branch with the corresponding one of the master branch.
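Without a graphical client, the comparison can also be done with diff. The function below is a hypothetical helper (compare_dates is not a git command), sketched under the assumption that both ranges start at the same unchanged commit.

```shell
#!/bin/sh
# compare_dates OLD_RANGE NEW_RANGE
# Prints a diff of the "committer date, author date, subject" lines of the
# two ranges. Returns diff's exit status (0 when the lists are identical).
compare_dates() {
    old_list=$(mktemp)
    new_list=$(mktemp)
    git log --format="%cI %aI %s" "$1" > "$old_list"
    git log --format="%cI %aI %s" "$2" > "$new_list"
    diff "$old_list" "$new_list"
    status=$?
    rm -f "$old_list" "$new_list"
    return "$status"
}

# Intended use after the procedure described above:
#   compare_dates start..old_master start..master
```

After a successful run, the only difference reported should be the line of the removed commit, which exists in old_master but not in master.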
If something didn't go well or you need to modify the procedure you can easily start over by:
$ git reset --hard old_master
When you are satisfied with the result you can remove the backup branch and the file used to store the original commit dates:
$ git branch -D old_master
$ rm /tmp/hashlist
That's all!