I have a fairly large Git repository with 1000s of commits, originally imported from SVN. Before I make my repo public, I'd like to clean up a few hundred commit messages t
You can use git rebase -i and replace pick with reword (or just r). The rebase then stops on every such commit, giving you a chance to edit the message.
The only disadvantages are that you don't see all messages at once and that you can't go back when you spot an error.
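With a few hundred commits, changing pick to reword by hand gets tedious. Git lets you set GIT_SEQUENCE_EDITOR to a program that edits the todo list for you. Here is a minimal sketch of such a program; the file name reword_all.py and the function names are my own and purely illustrative:

```python
import sys


def pick_to_reword(todo_text):
    """Return the rebase todo list with every 'pick' command changed to 'reword'."""
    out = []
    for line in todo_text.splitlines(keepends=True):
        parts = line.split(None, 1)
        # Only command lines are touched; comments and blank lines pass through.
        if parts and parts[0] in ("pick", "p"):
            rest = parts[1] if len(parts) > 1 else "\n"
            line = "reword " + rest
        out.append(line)
    return "".join(out)


def main(path):
    """Rewrite the todo file at the given path in place."""
    with open(path) as f:
        text = f.read()
    with open(path, "w") as f:
        f.write(pick_to_reword(text))


if __name__ == "__main__" and len(sys.argv) == 2:
    # Git passes the path of the todo file as the only argument.
    main(sys.argv[1])
```

It could then be used roughly as GIT_SEQUENCE_EDITOR="python /path/to/reword_all.py" git rebase -i <base>, after which Git opens your normal editor once per commit message. Try it on a copy of the repository first.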
A great and simple way to do this would be to use git filter-branch --msg-filter "" with a Python script.
The Python script would look something like this:
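import os
import re
import sys

# Match "Issue-" followed by 1 to 4 digits, case-insensitively.
pattern = re.compile(r"(?i)Issue-\d{1,4}")
# git filter-branch exports the ID of the commit being rewritten (unused here).
commit_id = os.environ["GIT_COMMIT"]
# The original commit message arrives on stdin.
message = sys.stdin.read()
if message and pattern.search(message):
    message = pattern.sub("Issue", message)
# Write the result to stdout without appending an extra newline.
sys.stdout.write(message)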
The command line call you would make is git filter-branch -f --msg-filter "python /path/to/git-script.py"
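Since filter-branch rewrites history, it may be worth checking the pattern on a few messages first. This is a small sketch that wraps the same substitution in a function; the function name and the sample messages are made up for illustration:

```python
import re

# Same pattern as in the script above: "Issue-" plus 1 to 4 digits, any case.
PATTERN = re.compile(r"(?i)Issue-\d{1,4}")


def rewrite_message(message):
    """Return the commit message with every match replaced by 'Issue'."""
    return PATTERN.sub("Issue", message)


# Hypothetical sample messages, only for checking the pattern.
samples = ["Fix crash, see issue-42\n", "Unrelated change\n"]
for old in samples:
    new = rewrite_message(old)
    print("changed" if new != old else "same", repr(new))
```

Note that \d{1,4} stops after four digits, so a longer number would leave its remaining digits behind in the message.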
This is easy to do as follows:
Export all commits into text:
git format-patch -10000
The number should be larger than the total number of commits. This creates many files named NNNNN-commit-description.patch.
Edit the commit messages in those files.
Import all edited commits back:
git am *.patch
This works only with a single branch, but it works very well.
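If the same change has to be made in hundreds of patch files, the editing step can be scripted. The sketch below assumes the usual format-patch layout, where the headers and commit message come before the first line containing only "---"; the function names are illustrative, and the replacement also applies to header lines such as From: if they happen to contain the old text:

```python
import glob


def edit_patch_message(patch, old, new):
    """Replace old with new only before the first '---' line of a patch.

    All arguments are bytes, so file contents are never re-encoded.
    """
    head, sep, tail = patch.partition(b"\n---\n")
    return head.replace(old, new) + sep + tail


def edit_all(pattern, old, new):
    """Apply edit_patch_message to every file matching the glob pattern."""
    for path in sorted(glob.glob(pattern)):
        with open(path, "rb") as f:
            data = f.read()
        with open(path, "wb") as f:
            f.write(edit_patch_message(data, old, new))
```

A call such as edit_all("*.patch", b"old string", b"new string") would then update every patch in the current directory. If a file has no "---" line, the whole file is searched, so check a few results before running git am.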
That's an old question, but as there is no mention of git filter-branch, I'll just add my two cents.
I recently had to mass-replace text in commit messages, replacing one block of text with another without changing the rest of each message. For instance, I had to replace Refs: #xxxxx with Refs: #22917.
I used git filter-branch like this:
git filter-branch --msg-filter 'sed "s/Refs: #xxxxx/Refs: #22917/g"' master..my_branch
I used --msg-filter to edit only the commit message, but you can use other filters to change files, edit the full commit info, etc.
I limited filter-branch by applying it only to the commits that were not in master (master..my_branch), but you can apply it to your whole branch by omitting the range of commits.
As suggested in the doc, try this on a copy of your branch. Hope that helps.
git-filter-repo https://github.com/newren/git-filter-repo is now recommended. I used it like this:
PS C:\repository> git filter-repo --commit-callback '
>> msg = commit.message.decode(\"utf-8\")
>> newmsg = msg.replace(\"old string\", \"new string\")
>> commit.message = newmsg.encode(\"utf-8\")
>> ' --force
New history written in 328.30 seconds; now repacking/cleaning...
Repacking your repo and cleaning out old unneeded objects
HEAD is now at 087f91945a blah blah
Enumerating objects: 346091, done.
Counting objects: 100% (346091/346091), done.
Delta compression using up to 8 threads
Compressing objects: 100% (82068/82068), done.
Writing objects: 100% (346091/346091), done.
Total 346091 (delta 259364), reused 346030 (delta 259303), pack-reused 0
Completely finished after 443.37 seconds.
PS C:\repository>
You probably don't want to copy the extra PowerShell prompt text, so here is just the command:
git filter-repo --commit-callback '
msg = commit.message.decode(\"utf-8\")
newmsg = msg.replace(\"old string\", \"new string\")
commit.message = newmsg.encode(\"utf-8\")
' --force
If you want to hit all the branches, don't use --refs HEAD. If you don't want to use --force, you can run it on a clean git clone --no-checkout. This got me started: https://blog.kawzeg.com/2019/12/19/git-filter-repo.html
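The decode and encode calls in the command are there because filter-repo hands the callback the commit message as bytes. For a plain string replacement you can also work on the bytes directly. As far as I know, filter-repo additionally documents a --message-callback option whose body receives message and returns the new value; check git filter-repo --help for your version before relying on it. A sketch of such a body written as an ordinary function, with a made-up name, so it can be tried outside Git:

```python
def message_callback(message):
    """Mirror of a message callback body: bytes in, bytes out."""
    # Working on bytes directly avoids the decode/encode round trip.
    return message.replace(b"old string", b"new string")


# Hypothetical sample message, only to show the call.
print(message_callback(b"Replace the old string here\n"))
```

Only the return line would go into the callback body on the command line; the def wrapper is just for testing.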
As an alternative, consider skipping the import of the whole repository. I would simply check out, clean up, and commit the important points in the history.