Question
I've always used an interface-based Git client (SmartGit) and thus don't have much experience with the Git console.
However, I now face the need to substitute a string in all .txt files from history (so, not erasing the whole file but just substituting a string). I found the following command:
git filter-branch --tree-filter 'git ls-files -z "*.php" |xargs -0 perl -p -i -e "s#(PASSWORD1|PASSWORD2|PASSWORD3)#xXxXxXxXxXx#g"' -- --all
I tried this, and unfortunately noticed that while the password did get changed, all binary files got corrupted. Images, etc. would all be corrupted.
Is there a better way to do this that won't corrupt my binary files?
Thanks.
EDIT:
I got mixed up with something. The actual code that caused binary files to get corrupted was:
$ git filter-branch --tree-filter "find . -type f -exec sed -i -e 's/originalpassword/newpassword/g' {} \;"
Strangely enough, the command at the top actually removed all files containing my password.
Answer 1:
You can avoid touching undesired files by passing -name "pattern" to find.
This works for me:
git filter-branch --tree-filter "find . -name '*.php' -exec sed -i -e \
's/originalpassword/newpassword/g' {} \;"
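Before rewriting history, the same find/sed pair can be tried in a scratch directory. The sketch below assumes GNU sed (BSD/macOS sed needs an explicit '' after -i) and uses made-up file names and contents; the checksum comparison is there to show that a file not matching -name is left byte-for-byte intact.

```shell
# Scratch directory with one PHP file and one stand-in "binary" file
# (file names and contents are made up for illustration).
workdir=$(mktemp -d)
cd "$workdir"
printf '<?php $pw = "originalpassword"; ?>\n' > config.php
printf 'originalpassword\000\001\002' > image.bin

before=$(cksum < image.bin)

# Same find/sed pair as in the tree filter: only *.php files are edited.
find . -name '*.php' -exec sed -i -e 's/originalpassword/newpassword/g' {} \;

after=$(cksum < image.bin)

# Number of lines in config.php that now contain the new string.
grep -c 'newpassword' config.php
# The non-matching file should have the same checksum as before.
[ "$before" = "$after" ] && echo "image.bin untouched"
```

Dropping -name '*.php' from the find call makes sed rewrite image.bin as well, which is the situation described in the question.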
Answer 2:
I'd recommend using the BFG Repo-Cleaner, a simpler, faster alternative to git-filter-branch specifically designed for rewriting files from Git history.
You should carefully follow these steps here: https://rtyley.github.io/bfg-repo-cleaner/#usage - but the core bit is just this: download the BFG's jar (requires Java 7 or above) and run this command:
$ java -jar bfg.jar --replace-text replacements.txt -fi '*.php' my-repo.git
The replacements.txt file should contain all the substitutions you want to do, in a format like this (one entry per line - note the comments shouldn't be included):
PASSWORD1 # Replace literal string 'PASSWORD1' with '***REMOVED***' (default)
PASSWORD2==>examplePass # replace with 'examplePass' instead
PASSWORD3==> # replace with the empty string
regex:password=\w+==>password= # Replace, using a regex
regex:\r(\n)==>$1 # Replace Windows newlines with Unix newlines
Your entire repository history will be scanned, and .php files (under 1MB in size) will have the substitutions performed: any matching string (that isn't in your latest commit) will be replaced.
Full disclosure: I'm the author of the BFG Repo-Cleaner.
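Since the comments in the listing above must not end up in the file, here is a sketch of writing replacements.txt from the shell with only the entries themselves. The quoted 'EOF' delimiter is an assumption about how the file gets created (an editor works just as well); it keeps the shell from touching $1 and the backslashes.

```shell
# Write replacements.txt without the explanatory comments.
# Quoting the heredoc delimiter ('EOF') disables variable and backslash
# expansion, so "$1" and "\w" reach the file unchanged.
cat > replacements.txt <<'EOF'
PASSWORD1
PASSWORD2==>examplePass
PASSWORD3==>
regex:password=\w+==>password=
regex:\r(\n)==>$1
EOF

# Show the file exactly as the BFG would read it.
cat replacements.txt

# With bfg.jar downloaded and the repository at my-repo.git (both assumed),
# the run from the answer would then be:
#   java -jar bfg.jar --replace-text replacements.txt -fi '*.php' my-repo.git
```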
Answer 3:
I created a file at /usr/local/git/findsed.sh , with the following contents:
find . -name 'githubDirToSubmodule.sh' -exec sed -i '' -e 's/What I want to remove//g' {} \;
I ran the command:
git filter-branch --tree-filter "sh /usr/local/git/findsed.sh"
Explanation of commands
When you run git filter-branch, this goes through each revision that you ever committed, one by one. --tree-filter runs the findsed.sh script on each committed revision, saves it, then progresses to the next revision.
The find command finds a specific file or set of files and executes (-exec) the sed editor on each one. In sed, s/ starts a substitution: the regex between the first two slashes is replaced with the string between the second slash and /g (blank in my example), and g applies the substitution to every match on a line. {} is a reference to the file path that was given by the find command. The file path is fed to sed, so that sed knows what to work on. \; just ends the -exec command.
Separating the shell script and command out into separate pieces allows for less complication when it comes to quotes '' or "".
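The whole flow can be rehearsed in a throwaway repository first. This is a minimal sketch, not the exact setup above: it assumes git and GNU sed are available (on a Mac, use sed -i '' as described below), and the file name, secret string, and temporary paths are made up. FILTER_BRANCH_SQUELCH_WARNING only skips the deprecation pause that newer Git versions add.

```shell
# Throwaway repository; every name and string here is made up for illustration.
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the deprecation pause in newer Git
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "you@example.com"
git config user.name "Example"

# Two commits that both contain the string to be removed.
printf 'token=SECRET123\n' > deploy.sh
git add deploy.sh
git commit -q -m "add deploy script"
printf 'echo done\n' >> deploy.sh
git commit -q -am "finish deploy script"

# The filter lives in its own file outside the repository, so the command
# handed to --tree-filter needs no nested quoting.
filter=$(mktemp)
cat > "$filter" <<'EOF'
find . -name 'deploy.sh' -exec sed -i -e 's/SECRET123/REDACTED/g' {} \;
EOF

git filter-branch -f --tree-filter "sh '$filter'" -- --all > /dev/null 2>&1

# Count lines in the rewritten history of the current branch that still
# contain the old string. (filter-branch keeps the old commits under
# refs/original/, so "git log --all" would still show them.)
remaining=$(git log -p | grep -c 'SECRET123')
echo "lines still containing the secret: $remaining"
```

Checking the count before pushing anything is the point of the rehearsal; the backup under refs/original/ has to be removed separately once the result looks right.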
Peculiarities
I successfully implemented this on a Mac, where sed is apparently a particular (older?) version. This matters, as it sometimes behaves differently. Make sure to use sed -i ''; otherwise sed added "-e" to the end of the file names, thinking that was the suffix I wanted for my backup files. -i '' says not to make backup files and to just edit the files in place.
Specifying -name 'filename.sh' helped me avoid another issue that I could not solve. There was another .sh file that ended without a newline character. For some reason, sed would add a newline character to the end, even though 's/blah/blah/g' did not match anything in that file. So instead of figuring out that issue, I just told find to ignore all other files.
Additional commands that work
Additionally, I found these commands to work in the findsed.sh file (only one command at a time, not multiple, so comment out the others with #):
find . -name '.publishNewZenPackFromGithub.sh.swp' -exec rm -f {} \;
find . -name '*' -exec grep -H PassToRemove {} \;
Enjoy!
Answer 4:
This could be a shell expansion issue. If filter-branch is losing the quotes around "*.php" by the time it evaluates the command, the pattern may be expanding to nothing, leaving git ls-files -z to list all files.
You could check the filter-branch source or try different quoting tricks, but what I'd do is just make a one-line shell script that does your tree-filter and pass that script instead.
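To see what each layer of quoting hands to a command, a small experiment outside Git can help. The sketch below is only an illustration of shell globbing with made-up file names; it relies on the documented fact that filter-branch evaluates its filters with eval, but it does not reproduce filter-branch itself.

```shell
cd "$(mktemp -d)"
touch a.php b.php

# Unquoted: the shell expands the glob before the command ever runs.
unquoted=$(echo *.php)

# Quoted: the pattern is passed through literally for the command to interpret.
quoted=$(echo '*.php')

# A filter string that is later run through eval: the inner double quotes
# are still present when eval parses it, so the pattern stays literal here.
filter='echo "*.php"'
via_eval=$(eval "$filter")

echo "unquoted: $unquoted"
echo "quoted:   $quoted"
echo "via eval: $via_eval"
```

If the pattern reaches git ls-files already expanded, or not at all, the file list changes accordingly, which is why moving the filter into a script file removes one layer of guesswork.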
Answer 5:
With Git 2.24 (Q4 2019), git filter-branch (and BFG) is deprecated.
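The equivalent, using newren/git-filter-repo (see its examples section), would be the following. Note that --path-glob also limits the rewritten history to the matching paths, so omit it if the repository's other files must be kept: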
cd repo
git filter-repo --path-glob '*.txt' --replace-text expressions.txt
with expressions.txt:
literal:originalpassword==>newpassword
Source: https://stackoverflow.com/questions/4110652/how-to-substitute-text-from-files-in-git-history