I'm trying to rewrite history, using:
git filter-branch --tree-filter 'git ls-files -z "*.php" |xargs -0 perl -p -i -e "s#(PASSWORD1|PASSWORD2|PASSWORD3)#xXxXxXxXxXx#g"' -- --all
as described in this tutorial.
However, my password strings contain all kinds of characters outside A-Z, e.g. $, ' and \, unlike the nice simple 'PASSWORD1'-style strings in the example above.
Can someone explain what escaping I need? I've not been able to find this anywhere, and I've been battling with this for hours.
try the BFG instead of git filter-branch...
You can use a much more friendly substitution format if you use The BFG rather than git-filter-branch. Create a passwords.txt file, with one password per line, like this:
PASSWORD1==>xXxXx # Replace literal string 'PASSWORD1' with 'xXxXx'
ezxcdf\fr$sdd%==>xXxXx # ...all text is matched as a *literal* string by default
Then run the BFG with this command:
$ java -jar bfg.jar -fi '*.php' --replace-text passwords.txt my-repo.git
Your entire repository history will be scanned, and all .php files (under 1MB in size) will have the substitutions performed: any matching string (that isn't in your latest commit) will be replaced.
...no escaping needed
Note that the only bit of parsing the BFG does here with the substitution file is to split on the '==>' string - which probably isn't in your passwords - and all text is interpreted literally by default.
If you want to be even more concise, you can drop the '==>' and everything that comes after it on each line (i.e., just have a file of passwords) and The BFG will replace each password with the string '***REMOVED***' by default.
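If you create the file from the shell, the shell's own quoting can still get in the way. A minimal sketch, assuming you are typing the passwords into a script or terminal: a heredoc with a quoted delimiter ('EOF') leaves $, \ and ' untouched. The third password is only an illustrative sample, and it has no '==>' so it would get the default replacement.

```shell
#!/bin/bash
# Write the substitution file with a quoted heredoc delimiter ('EOF'),
# so the shell does not expand $ or interpret \ and ' inside the passwords.
cat > passwords.txt <<'EOF'
PASSWORD1==>xXxXx
ezxcdf\fr$sdd%==>xXxXx
a$b'c\d
EOF

# Show the file exactly as the BFG will read it
cat passwords.txt
```

Editing passwords.txt directly in a text editor avoids the issue altogether.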
The BFG is typically hundreds of times faster than running git-filter-branch on a big repo and the options are tailored around these two common use-cases:
- Removing Crazy Big Files
- Removing Passwords, Credentials & other Private data
Full disclosure: I'm the author of the BFG Repo-Cleaner.
Building on the brilliant help from konsolebox, which really helped me solve this, here is the shell-based solution I ended up using.
Define the strings in a file, strings.txt:
string1
another$string
yet! @nother string
some more stuff to re\move
Create a Perl script, perl-escape-strings.pl, which will be used to escape the strings, where xXxXxXxXxXx is the string they will all be replaced with:
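#!/usr/bin/perl
use strict;
use warnings;
# Read one string per line and emit a Perl substitution statement for each
while (<>)
{
    chomp;
    # quotemeta backslash-escapes every non-word character, so the string matches literally
    my $passwd = quotemeta($_);
    print qq|s/$passwd/xXxXxXxXxXx/g;\n|;
}
exit 0;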
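To see what the escaping step produces, here is a rough bash approximation of quotemeta for plain ASCII input. The function name escape_regex is hypothetical and is not part of the solution above. It backslash-escapes every character that is not a letter, digit or underscore.

```shell
#!/bin/bash
# Hypothetical helper: approximate Perl's quotemeta for ASCII strings by
# putting a backslash in front of every character outside [A-Za-z0-9_].
escape_regex() {
    local LC_ALL=C            # keep the character ranges below ASCII-only
    local s=$1 out= c i
    for (( i = 0; i < ${#s}; i++ )); do
        c=${s:i:1}
        case $c in
            [A-Za-z0-9_]) out+=$c ;;
            *)            out+="\\$c" ;;
        esac
    done
    printf '%s' "$out"
}

# Example: one of the sample strings from strings.txt
escape_regex 'another$string'; echo
```

The real script should keep using quotemeta; this sketch only illustrates why the escaped strings can be dropped into s/.../.../g without further thought.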
Bash script:
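# Pre-process the strings
./perl-escape-strings.pl strings.txt > strings-perl-escaped.txt
# Remember the absolute path: filter-branch runs the filter from a temporary checkout directory,
# so a relative path like ../strings-perl-escaped.txt may not resolve
SUBST_SCRIPT="$(pwd)/strings-perl-escaped.txt"
# Change directory to the repo
cd repo/
# Define the filter command
FILTER="git ls-files -z '*.html' '*.php' | xargs -0 perl -p -i '$SUBST_SCRIPT'"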
# Run the filter
git filter-branch --tree-filter "$FILTER" -- --all
However, because I have a large number of strings and a large repository with many thousands of commits, the filter-branch method is taking a long time. So I'm also going to try The BFG, mentioned in another answer, in parallel to see if it completes quicker.
Using a wrapper script:
#!/bin/bash
readarray -t PASSWORDS < list_file
REPLACEMENT='xXxXxXxXxXx'
SEP=$'\xFF'
EXPR=${PASSWORDS[0]}
for (( I = 1; I < ${#PASSWORDS[@]}; ++I )); do
    EXPR+="|${PASSWORDS[I]}"
done
# Escape shell and regex metacharacters in the passwords first...
EXPR=${EXPR//'\'/'\\\\'}; EXPR=${EXPR//'$'/'\\\$'}
EXPR=${EXPR//'"'/'\"'}; EXPR=${EXPR//'`'/'\`'}
EXPR=${EXPR//'^'/'\\^'}; EXPR=${EXPR//'['/'\\['}
EXPR=${EXPR//']'/'\\]'}; EXPR=${EXPR//'+'/'\\+'}
EXPR=${EXPR//'?'/'\\?'}; EXPR=${EXPR//'.'/'\\.'}
EXPR=${EXPR//'*'/'\\*'}; EXPR=${EXPR//'{'/'\\{'}
EXPR=${EXPR//'}'/'\\}'}; EXPR=${EXPR//'('/'\\('}
EXPR=${EXPR//')'/'\\)'}
# ...then wrap them in the substitution, so the grouping parentheses stay unescaped
EXPR="s${SEP}(${EXPR})${SEP}$REPLACEMENT${SEP}g"
FILTER="git ls-files -z '*.php' | xargs -0 perl -p -i -e \"$EXPR\""
echo "Number of passwords: ${#PASSWORDS[@]}"
echo "Passwords:" "${PASSWORDS[@]}"
echo "EXPR: $EXPR"
echo "FILTER: $FILTER"
git filter-branch --tree-filter "$FILTER" -- --all
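One thing to watch with the joining step: a blank line in list_file would add an empty alternative to the pattern, and an empty alternative matches everywhere. A minimal sketch of a join that skips blank entries, using a hypothetical helper name (join_alternation) that is not part of the script above:

```shell
#!/bin/bash
# Hypothetical helper: join the non-empty arguments with '|' so that a
# blank line in the password list cannot become an empty alternative.
join_alternation() {
    local item out=
    for item in "$@"; do
        [ -n "$item" ] || continue      # skip blank entries
        out+="${out:+|}$item"           # add '|' only between items
    done
    printf '%s' "$out"
}

# Example: the blank entry in the middle is dropped
join_alternation 'PASSWORD1' '' 'PASSWORD2'; echo
```

In the wrapper script this would stand in for the loop that builds EXPR, e.g. EXPR=$(join_alternation "${PASSWORDS[@]}"), with the escaping applied afterwards as before.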
Build it from the inside out. Say the password is
a$b'c\d
The regex pattern would be
a\$b'c\\d
One possibility for the perl command would be
perl -i -pe's/a\$b'\''c\\d/.../g'
(Note how each ' was replaced with '\''.)
Now you need to include that in single quotes, so you repeat the process.
... '... perl -i -pe'\''s/a\$b'\''\'\'''\''c\\d/.../g'\''' ...
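Doing this replacement by hand gets error-prone after the first level. A minimal sketch of automating it in bash, assuming a hypothetical helper named sq: it wraps its argument in single quotes and replaces each embedded ' with '\'', so calling it twice gives the second level of quoting.

```shell
#!/bin/bash
# Hypothetical helper: quote a string for the shell by wrapping it in
# single quotes and replacing each embedded ' with the four characters '\''.
sq() {
    local s=$1
    local apos="'"
    local esc="'\\''"          # the four characters '\''
    printf "'%s'" "${s//$apos/$esc}"
}

password="a\$b'c\\d"           # the sample password a$b'c\d
quoted=$(sq "$password")       # one level of quoting
printf '%s\n' "$quoted"
printf '%s\n' "$(sq "$quoted")"   # a second level, for nesting in another quoted command
```

This only handles the shell layer; the regex escaping (\$ and \\ in the pattern) still has to be applied to the password before quoting.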
Source: https://stackoverflow.com/questions/18647400/git-filter-branch-to-remove-strings-but-where-strings-contain-and-other-c