How to substitute text from files in git history?

Submitted by 。_饼干妹妹 on 2019-11-26 11:54:32

Question


I've always used a GUI-based git client (SmartGit) and thus don't have much experience with the git command line.

However, I now face the need to substitute a string in all .txt files from history (so, not erasing the whole file but just substituting a string). I found the following command:

git filter-branch --tree-filter 'git ls-files -z "*.php" |xargs -0 perl -p -i -e "s#(PASSWORD1|PASSWORD2|PASSWORD3)#xXxXxXxXxXx#g"' -- --all

I tried this, and unfortunately noticed that while the password did get changed, all binary files got corrupted. Images, etc. would all be corrupted.

Is there a better way to do this that won't corrupt my binary files?

Thanks.

EDIT:

I got mixed up with something. The actual code that caused binary files to get corrupted was:

$ git filter-branch --tree-filter "find . -type f -exec sed -i -e 's/originalpassword/newpassword/g' {} \;"

Strangely enough, the code at the top actually removed all files containing my password.


Answer 1:


You can avoid touching undesired files by passing -name "pattern" to find.

This works for me:

git filter-branch --tree-filter "find . -name '*.php' -exec sed -i -e \
    's/originalpassword/newpassword/g' {} \;"
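
To see what the -name filter buys you without involving git at all, here is a minimal sketch that can be run in a scratch directory. The file names are made up for illustration, and it uses sed -i.bak (deleting the backup afterwards) only because that spelling is accepted by both GNU and BSD sed:

```shell
# Scratch directory with one PHP file and one fake binary file (hypothetical names).
demo_dir=$(mktemp -d)
printf 'pw = "originalpassword";\n' > "$demo_dir/config.php"
printf 'originalpassword\000\377\n' > "$demo_dir/logo.png"
before=$(cksum < "$demo_dir/logo.png")

# Only files matching *.php are handed to sed; logo.png is never opened.
find "$demo_dir" -type f -name '*.php' \
    -exec sed -i.bak -e 's/originalpassword/newpassword/g' {} \;
# Remove the backup copies that -i.bak left behind.
find "$demo_dir" -type f -name '*.bak' -exec rm -f {} \;

after=$(cksum < "$demo_dir/logo.png")
cat "$demo_dir/config.php"
```

Comparing $before and $after shows whether the binary file was left byte-for-byte intact.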



Answer 2:


I'd recommend using the BFG Repo-Cleaner, a simpler, faster alternative to git-filter-branch specifically designed for rewriting files from Git history.

You should carefully follow these steps here: https://rtyley.github.io/bfg-repo-cleaner/#usage - but the core bit is just this: download the BFG's jar (requires Java 7 or above) and run this command:

$ java -jar bfg.jar --replace-text replacements.txt -fi '*.php' my-repo.git

The replacements.txt file should contain all the substitutions you want to do, in a format like this (one entry per line - note the comments shouldn't be included):

PASSWORD1 # Replace literal string 'PASSWORD1' with '***REMOVED***' (default)
PASSWORD2==>examplePass         # replace with 'examplePass' instead
PASSWORD3==>                    # replace with the empty string
regex:password=\w+==>password=  # Replace, using a regex
regex:\r(\n)==>$1               # Replace Windows newlines with Unix newlines

Your entire repository history will be scanned, and .php files (under 1MB in size) will have the substitutions performed: any matching string (that isn't in your latest commit) will be replaced.
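
Since the annotated listing above cannot be used verbatim, one way to create a comment-free replacements.txt is a quoted here-document. This is only a sketch: the entries are the ones from the listing, and the quoted 'EOF' delimiter is there so the shell leaves \w and $1 alone:

```shell
# Write replacements.txt without the explanatory comments.
# The quoted delimiter ('EOF') stops the shell from expanding $1 or backslashes.
cat > replacements.txt <<'EOF'
PASSWORD1
PASSWORD2==>examplePass
PASSWORD3==>
regex:password=\w+==>password=
regex:\r(\n)==>$1
EOF
```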

Full disclosure: I'm the author of the BFG Repo-Cleaner.




Answer 3:


I created a file at /usr/local/git/findsed.sh, with the following contents:

find . -name 'githubDirToSubmodule.sh' -exec sed -i '' -e 's/What I want to remove//g' {} \;

I ran the command:

git filter-branch --tree-filter "sh /usr/local/git/findsed.sh"

Explanation of commands

When you run git filter-branch, this goes through each revision that you ever committed, one by one. --tree-filter runs the findsed.sh script on each committed revision, saves it, then progresses to the next revision.

The find command finds a specific file or set of files and executes (-exec) the sed editor on each of them. sed is a command that takes the regex after s/ and replaces it with the string between / and /g (blank in my example). {} is a reference to the file's path that was given by the find command. The file path is fed to sed, so that sed knows what to work on. \; just ends the -exec command.

Separating the shell script and the command into separate pieces makes things less complicated when it comes to quotes '' or "".

Peculiarities

I did this successfully on a Mac, where sed is the BSD variant rather than GNU sed, and the two sometimes behave differently. Make sure to use sed -i '' there; otherwise sed takes the "-e" that follows as the backup suffix and leaves backup copies with "-e" appended to the file names. -i '' says: edit the files in place and do not keep a backup.
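
If the same script has to run on both macOS and Linux, one option is to pick the -i spelling at run time. The helper below is a hypothetical sketch (sed_inplace is not a standard command), and it assumes that only GNU-style sed accepts --version:

```shell
# Hypothetical helper: in-place sed that works with GNU and BSD/macOS sed.
# Usage: sed_inplace 'SED_EXPRESSION' FILE
sed_inplace() {
    if sed --version >/dev/null 2>&1; then
        # GNU sed: the backup suffix is optional and attached to -i itself.
        sed -i -e "$1" "$2"
    else
        # BSD/macOS sed: -i requires a suffix argument; '' means no backup.
        sed -i '' -e "$1" "$2"
    fi
}
```

In findsed.sh this could be called as sed_inplace 's/What I want to remove//g' githubDirToSubmodule.sh.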

Specifying -name 'filename.sh' also helped me avoid another issue that I could not solve. Another .sh file ended without a newline character, and sed would, for some reason, add a newline to the end of it even though 's/blah/blah/g' matched nothing in that file. Rather than figure that out, I just told find to ignore all other files.
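
One way to sidestep both the trailing-newline surprise and accidental edits to binary files is to let grep choose the files first, so that sed only ever rewrites files that really contain the string. A rough sketch, assuming a grep that supports -I (skip binary files), file names without newlines, and search and replacement strings free of characters that are special to sed (such as / or &); replace_in_text_files is a made-up name:

```shell
# Hypothetical helper: replace $2 with $3 in text files under directory $1,
# touching only files that actually contain $2.
replace_in_text_files() {
    # -r recurse, -l print matching file names only, -I ignore binary files,
    # -F treat the search string literally.
    grep -rlIF -e "$2" "$1" | while IFS= read -r file; do
        # -i.bak is accepted by both GNU and BSD sed; drop the backup afterwards.
        sed -i.bak -e "s/$2/$3/g" "$file" && rm -f "$file.bak"
    done
}
```

Inside the filter script it would be called as replace_in_text_files . originalpassword newpassword.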

Additional commands that work

Additionally, I found these commands to work in the findsed.sh file (only one command at a time, not multiple, so comment the others out with #):

find . -name '.publishNewZenPackFromGithub.sh.swp' -exec rm -f {} \;
find . -name '*' -exec grep -H PassToRemove {} \;

Enjoy!




Answer 4:


Could be a shell expansion issue. If filter-branch is losing the quotes around "*.php" by the time it evaluates the command, it may be expanding to nothing, thus git ls-files -z listing all files.

You could check the filter-branch source or try different quoting tricks, but what I'd do is just make a one-line shell script that does your tree-filter and pass that script instead.
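
As a sketch of that last suggestion, the filter can be written to a script file with a quoted here-document, so its inner quotes never pass through another layer of shell quoting. The path /tmp/replace-passwords.sh is an arbitrary choice, and the body reuses the perl substitution from the question, limited to *.php files via find rather than git ls-files:

```shell
# Write the tree filter to its own file; the quoted 'EOF' keeps every
# quote and parenthesis in the body exactly as typed.
cat > /tmp/replace-passwords.sh <<'EOF'
#!/bin/sh
find . -type f -name '*.php' -exec \
    perl -p -i -e 's#(PASSWORD1|PASSWORD2|PASSWORD3)#xXxXxXxXxXx#g' {} +
EOF
chmod +x /tmp/replace-passwords.sh
```

It could then be passed as git filter-branch --tree-filter "sh /tmp/replace-passwords.sh" -- --all, ideally on a throwaway clone first.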




Answer 5:


With Git 2.24 (Q4 2019), git filter-branch is deprecated: it now prints a warning and points to git filter-repo, which can also stand in for BFG.

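The equivalent, using newren/git-filter-repo (see its examples section), would be the commands below. Note that --path-glob is a path filter, so as far as I can tell this also drops files that do not match '*.txt' from the rewritten history; leave it out if the other files must be kept: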

cd repo
git filter-repo --path-glob '*.txt' --replace-text expressions.txt

with expressions.txt:

literal:originalpassword==>newpassword
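
For completeness, a sketch of an expressions.txt with more than one rule. Going by the git-filter-repo documentation, lines are treated as literal text by default, regex: and glob: prefixes are accepted, and anything after ==> is the replacement (the default being ***REMOVED***). The entries themselves are invented for illustration:

```shell
# Write expressions.txt; the quoted 'EOF' keeps the backslash in \w intact.
cat > expressions.txt <<'EOF'
literal:originalpassword==>newpassword
regex:password=\w+==>password=
PASSWORD3
EOF
```

The last line has no ==>, so matches of PASSWORD3 would get the default replacement.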


Source: https://stackoverflow.com/questions/4110652/how-to-substitute-text-from-files-in-git-history
