git-filter-branch to remove strings, but where strings contain $ ' \\ and other characters

Posted by 假如想象 on 2019-12-01 01:21:58

try the BFG instead of git filter-branch...

You can use a much more friendly substitution format if you use The BFG rather than git-filter-branch. Create a passwords.txt file, with one password per line like this:

PASSWORD1==>xXxXx      # Replace literal string 'PASSWORD1' with 'xXxXx'
ezxcdf\fr$sdd%==>xXxXx # ...all text is matched as a *literal* string by default

Then run the BFG with this command:

$ java -jar bfg.jar -fi '*.php' --replace-text passwords.txt  my-repo.git

Your entire repository history will be scanned, and all .php files (under 1MB in size) will have the substitutions performed: any matching string (that isn't in your latest commit) will be replaced.

...no escaping needed

Note that the only parsing the BFG does on the substitution file is to split each line on the '==>' string, which probably isn't in your passwords. All other text is interpreted literally by default.

If you want to be even more concise, you can drop the '==>' and everything that comes after it on each line (i.e., just have a file of passwords) and The BFG will replace each password with the string '***REMOVED***' by default.
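As a minimal sketch of the file format described above: one way to create the substitution file from a shell without the shell itself mangling `$`, `\` or `'` is a heredoc with a quoted delimiter. The scratch directory and the variable names `workdir`, `subs` and `default_count` are assumptions made for illustration; only the file format comes from the answer.

```shell
#!/bin/bash
workdir=$(mktemp -d)            # scratch directory, used only for this sketch
subs="$workdir/passwords.txt"

# The quoted delimiter ('EOF') stops the shell from expanding $, \ or ` in the body,
# so every line reaches the file exactly as typed.
cat > "$subs" <<'EOF'
PASSWORD1==>xXxXx
ezxcdf\fr$sdd%==>xXxXx
a$b'c\d
EOF

# Lines without '==>' rely on the default replacement text.
default_count=$(grep -c -v -F -e '==>' "$subs")
echo "entries using the default replacement: $default_count"
```

The resulting file can then be passed to `--replace-text` in place of a hand-edited passwords.txt.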

The BFG is typically hundreds of times faster than running git-filter-branch on a big repo and the options are tailored around these two common use-cases:

  • Removing Crazy Big Files
  • Removing Passwords, Credentials & other Private data

Full disclosure: I'm the author of the BFG Repo-Cleaner.

fooquency

Building on the brilliant help from konsolebox, which really helped me solve this, here is the shell-based solution I ended up using:

Define the strings in a file, strings.txt:

string1
another$string
yet! @nother string
some more stuff to re\move

Create a Perl script, perl-escape-strings.pl, to escape the strings. Here xXxXxXxXxXx is the string that will replace each of them:

#!/usr/bin/perl

use strict;
use warnings;

while (<>)
{
        chomp;
        my $passwd = quotemeta($_);
        print qq|s/$passwd/xXxXxXxXxXx/g;\n|;
}

exit 0;

Bash script:

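# Pre-process the strings
./perl-escape-strings.pl strings.txt > strings-perl-escaped.txt

# Remember the absolute path: --tree-filter runs in a temporary checkout,
# so a relative path such as ../strings-perl-escaped.txt would not resolve there
SUBS="$PWD/strings-perl-escaped.txt"

# Change directory to the repo
cd repo/

# Define the filter command (assumes the path contains no single quote)
FILTER="git ls-files -z '*.html' '*.php' | xargs -0 perl -p -i '$SUBS'"

# Run the filter
git filter-branch --tree-filter "$FILTER" -- --all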

However, because the number of strings is large and my repository is large, with many thousands of commits, the filter-branch method is taking a long time. So I'm also going to try The BFG, mentioned in another answer, in parallel to see if it completes more quickly.
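Whichever tool does the rewrite, it can be worth checking afterwards that none of the strings survive anywhere in the history. The following is a sketch only: the helper name `leaky_commits` is hypothetical, and it assumes it is run from inside the rewritten repository with a strings file that contains no blank lines.

```shell
#!/bin/bash
# Hypothetical helper: print every commit whose tree still contains one of the
# literal strings listed (one per line) in the given file.
leaky_commits() {
    local strings_file=$1 rev
    git rev-list --all | while read -r rev; do
        # -F: treat patterns as fixed strings, -f: read them from a file,
        # -q: report via exit status only
        if git grep -q -F -f "$strings_file" "$rev" -- '*.html' '*.php'; then
            printf '%s\n' "$rev"
        fi
    done
}
```

Calling `leaky_commits ../strings.txt` from the repository root should print nothing once the rewrite has removed every string. Like the tree filter itself, this walks every commit, so it is slow on a large history.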

Using a wrapper script:

#!/bin/bash

readarray -t PASSWORDS < list_file

REPLACEMENT='xXxXxXxXxXx'
SEP=$'\xFF'

# Escape one password twice over: once for Perl's regex syntax and once for
# the double-quoted string that filter-branch later passes through eval.
# This must happen per password, before the grouping ( | ) is added,
# otherwise the grouping characters would be escaped as well.
escape() {
    local S=$1
    S=${S//'\'/'\\\\'}; S=${S//'$'/'\\\$'}
    S=${S//'"'/'\"'};   S=${S//'`'/'\`'}
    S=${S//'^'/'\\^'};  S=${S//'['/'\\['}
    S=${S//']'/'\\]'};  S=${S//'+'/'\\+'}
    S=${S//'?'/'\\?'};  S=${S//'.'/'\\.'}
    S=${S//'*'/'\\*'};  S=${S//'{'/'\\{'}
    S=${S//'}'/'\\}'};  S=${S//'('/'\\('}
    S=${S//')'/'\\)'};  S=${S//'|'/'\\|'}
    S=${S//'@'/'\\@'}   # Perl would otherwise interpolate @name as an array
    printf '%s' "$S"
}

EXPR=$(escape "${PASSWORDS[0]}")
for (( I = 1; I < ${#PASSWORDS[@]}; ++I )); do
    EXPR+="|$(escape "${PASSWORDS[I]}")"
done
EXPR="s${SEP}(${EXPR})${SEP}$REPLACEMENT${SEP}g"

FILTER="git ls-files -z '*.php' | xargs -0 perl -p -i -e \"$EXPR\""

echo "Number of passwords: ${#PASSWORDS[@]}"
echo "Passwords:" "${PASSWORDS[@]}"
echo "EXPR: $EXPR"
echo "FILTER: $FILTER"

git filter-branch --tree-filter "$FILTER" -- --all

Build it from the inside out. Say the password is

a$b'c\d

The regex pattern would be

a\$b'c\\d

One possibility for the perl command would be

perl -i -pe's/a\$b'\''c\\d/.../g'

(Note how each ' was replaced with '\''.)

Now you need to include that in single quotes, so you repeat the process.

... '... perl -i -pe'\''s/a\$b'\''\'\'''\''c\\d/.../g'\''' ...
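The inside-out process above can be mechanized. This is a minimal sketch, assuming a hypothetical helper `sq_quote` that wraps its argument in single quotes and replaces each `'` with `'\''`. Applying it once gives the first perl command shown above, and applying it again gives the nested form.

```shell
#!/bin/bash
# Hypothetical helper: single-quote a string for the shell,
# replacing every ' inside it with '\''
sq_quote() {
    local s=$1 sq="'" esc="'\\''"
    printf "'%s'" "${s//$sq/$esc}"
}

regex='a\$b'\''c\\d'        # the regex pattern a\$b'c\\d from the example

# First level: quote the substitution for the perl command line
level1="perl -i -pe$(sq_quote "s/$regex/.../g")"

# Second level: quote the whole command so it can sit inside another '...'
level2=$(sq_quote "$level1")

printf '%s\n' "$level1" "$level2"
```

Because each level is produced by the same function, a round trip through `eval` recovers the previous level unchanged, which is a convenient way to check the quoting before handing the string to filter-branch.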