git-filter-branch to remove strings, but where strings contain $ ' \ and other characters

渐次进展 2021-01-07 09:12

I'm trying to rewrite history, using:

git filter-branch --tree-filter 'git ls-files -z "*.php" |xargs -0 perl -p -i -e "s#(PASSWORD1|PASSWORD2|PASSWORD3)#

4 Answers
  • 2021-01-07 09:25

    Building on konsolebox's answer, which really helped me solve this, here is the shell-based solution I ended up using:

    Define the strings in a file, strings.txt, one per line:

    string1
    another$string
    yet! @nother string
    some more stuff to re\move
    

    Create a Perl script, perl-escape-strings.pl, that escapes each string and prints one substitution statement per line, where xXxXxXxXxXx is the text that every string will be replaced with:

    #!/usr/bin/perl
    
    use strict;
    use warnings;
    
    while (<>)
    {
            chomp;
            my $passwd = quotemeta($_);
            print qq|s/$passwd/xXxXxXxXxXx/g;\n|;
    }
    
    exit 0;
    

    Bash script:

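    # Pre-process the strings
    ./perl-escape-strings.pl strings.txt > strings-perl-escaped.txt
    
    # filter-branch runs the tree filter inside a temporary checkout,
    # so refer to the substitution script by absolute path
    SUBST="$PWD/strings-perl-escaped.txt"
    
    # Change directory to the repo
    cd repo/
    
    # Define the filter command
    FILTER="git ls-files -z '*.html' '*.php' | xargs -0 perl -p -i '$SUBST'"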
    
    # Run the filter
    git filter-branch --tree-filter "$FILTER" -- --all
    

    However, because there are many strings and my repository is large, with many thousands of commits, the filter-branch method is taking a long time. So I'm also going to try the BFG, mentioned in another answer, in parallel to see whether it completes sooner.

  • 2021-01-07 09:30

    Using a wrapper script:

    #!/bin/bash
    
    readarray -t PASSWORDS < list_file
    
    REPLACEMENT='xXxXxXxXxXx'
    SEP=$'\xFF'
    
    # Escape one password for two layers: the Perl regex, and the
    # double-quoted string that filter-branch hands to the shell.
    escape() {
        local S=$1 C
        S=${S//'\'/'\\\\'}; S=${S//'$'/'\\\$'}
        S=${S//'"'/'\"'};   S=${S//'`'/'\`'}
        for C in '^' '[' ']' '+' '?' '.' '*' '{' '}' '(' ')' '|' '@'; do
            S=${S//"$C"/"\\\\$C"}
        done
        printf '%s' "$S"
    }
    
    # Join the escaped passwords into one alternation; the grouping
    # parentheses are added afterwards so that they stay unescaped.
    EXPR=$(escape "${PASSWORDS[0]}")
    for (( I = 1; I < ${#PASSWORDS[@]}; ++I )); do
        EXPR+="|$(escape "${PASSWORDS[I]}")"
    done
    EXPR="s${SEP}(${EXPR})${SEP}${REPLACEMENT}${SEP}g"
    
    FILTER="git ls-files -z '*.php' | xargs -0 perl -p -i -e \"$EXPR\""
    
    echo "Number of passwords: ${#PASSWORDS[@]}"    
    echo "Passwords:" "${PASSWORDS[@]}"
    echo "EXPR: $EXPR"
    echo "FILTER: $FILTER"
    
    git filter-branch --tree-filter "$FILTER" -- --all
    
  • 2021-01-07 09:38

    Try the BFG instead of git filter-branch...

    You can use a much more friendly substitution format if you use The BFG rather than git-filter-branch. Create a passwords.txt file, with one password per line like this:

    PASSWORD1==>xXxXx      # Replace literal string 'PASSWORD1' with 'xXxXx'
    ezxcdf\fr$sdd%==>xXxXx # ...all text is matched as a *literal* string by default
    

    Then run the BFG with this command:

    $ java -jar bfg.jar -fi '*.php' --replace-text passwords.txt  my-repo.git
    

    Your entire repository history will be scanned, and all .php files (under 1MB in size) will have the substitutions performed: any matching string (that isn't in your latest commit) will be replaced.

    ...no escaping needed

    Note that the only parsing the BFG does here with the substitution file is to split each line on the '==>' string, which probably isn't in your passwords; all text is interpreted literally by default.

    If you want to be even more concise, you can drop the '==>' and everything that comes after it on each line (i.e., just have a file of passwords) and the BFG will replace each password with the string '***REMOVED***' by default.
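
    If the secrets already live in a plain list, the replacement file can be generated rather than typed by hand. This is a rough sketch only: strings.txt, the temporary directory and the sample secrets are placeholders, and it assumes no secret contains the '==>' separator itself.

```shell
#!/bin/sh
# Build a BFG replacement file from a plain list of secrets.
# strings.txt, passwords.txt and the sample secrets are placeholders.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

printf '%s\n' 'PASSWORD1' 'ezxcdf\fr$sdd%' > "$workdir/strings.txt"

# read -r keeps backslashes; printf '%s' copies each secret unchanged.
while IFS= read -r secret; do
    printf '%s==>%s\n' "$secret" 'xXxXx'
done < "$workdir/strings.txt" > "$workdir/passwords.txt"

generated=$(cat "$workdir/passwords.txt")
printf '%s\n' "$generated"
```

    The resulting passwords.txt has one `secret==>replacement` pair per line, which is the shape passed to --replace-text above.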

    The BFG is typically hundreds of times faster than running git-filter-branch on a big repo and the options are tailored around these two common use-cases:

    • Removing Crazy Big Files
    • Removing Passwords, Credentials & other Private data

    Full disclosure: I'm the author of the BFG Repo-Cleaner.

  • 2021-01-07 09:39

    Build it from the inside out. Say the password is

    a$b'c\d
    

    The regex pattern would be

    a\$b'c\\d
    

    One possibility for the perl command would be

    perl -i -pe's/a\$b'\''c\\d/.../g'
    

    (Note how each ' was replaced with '\''.)

    Now you need to include that whole command inside another pair of single quotes (the --tree-filter argument), so you repeat the process.

    ... '... perl -i -pe'\''s/a\$b'\''\'\'''\''c\\d/.../g'\''' ...
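
    The same inside-out procedure can be done mechanically instead of by hand. The sketch below is one possible way to do it, not part of the original answer: quote_sq is a hypothetical helper name, and the two sh -c calls only exist to show that each quoting level unwraps to the same Perl program.

```shell
#!/bin/sh
# quote_sq is a hypothetical helper: it wraps its argument in single quotes,
# replacing each embedded ' with '\'' as described above.
quote_sq() {
    printf "'%s'" "$(printf '%s' "$1" | sed "s/'/'\\\\''/g")"
}

password="a\$b'c\\d"            # the literal text a$b'c\d
pattern='a\$b'\''c\\d'          # the regex a\$b'c\\d, quoted by hand

# First level: the Perl program as one shell word.
inner="perl -pe $(quote_sq "s/$pattern/.../g")"

# Second level: the whole inner command as one shell word.
outer=$(quote_sq "$inner")

printf 'inner: %s\n' "$inner"
printf 'outer: %s\n' "$outer"

# Run the command through one and then two levels of shell parsing.
once=$(printf '%s\n' "$password" | sh -c "$inner")
twice=$(printf '%s\n' "$password" | eval "sh -c $outer")
printf '%s %s\n' "$once" "$twice"
```

    Printing inner and outer shows the same growth in quoting as the hand-built commands above, apart from spacing and the omitted -i flag.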
    