Remove sensitive files and their commits from Git history

借酒劲吻你 2020-11-21 04:36

I would like to put a Git project on GitHub but it contains certain files with sensitive data (usernames and passwords, like /config/deploy.rb for Capistrano).

I kno

11 answers
  • 2020-11-21 05:12

    Use filter-branch (replace path/to/file with the file's path relative to the repository root, and BRANCH-NAME with your branch):

    git filter-branch --force --index-filter 'git rm --cached --ignore-unmatch path/to/file' --prune-empty --tag-name-filter cat -- --all
    
    git push origin BRANCH-NAME -f
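    To check what that command does before touching a real repository, here is a minimal sketch that builds a throwaway repository in a temporary directory, commits a made-up config/deploy.rb, runs the same filter-branch invocation, and confirms that no commit on any branch or tag still touches the file. It assumes a POSIX shell with Git on the PATH; FILTER_BRANCH_SQUELCH_WARNING=1 only suppresses the deprecation banner (and its pause) that recent Git versions print.

```shell
#!/bin/sh
# Demo only: build a throwaway repo, commit a "secret" file, then strip it
# from every commit with git filter-branch (same flags as the answer above).
set -eu
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the deprecation banner and pause

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
git config commit.gpgsign false

mkdir -p config
echo "password: hunter2" > config/deploy.rb   # hypothetical sensitive file
echo "hello" > README
git add config/deploy.rb README
git commit -q -m "initial commit with secret"
echo "more" >> README
git commit -q -am "unrelated change"

# Remove the file from the index of every commit on every ref.
git filter-branch --force --index-filter \
  'git rm --cached --ignore-unmatch config/deploy.rb' \
  --prune-empty --tag-name-filter cat -- --all

# No rewritten commit on a branch or tag should mention the file any more...
remaining=$(git log --format=%H --branches --tags -- config/deploy.rb)
# ...but filter-branch keeps a backup of the old refs under refs/original/.
backups=$(git for-each-ref --format='%(refname)' refs/original/)
echo "commits still touching the file: ${remaining:-none}"
echo "backup refs: $backups"
```

    The log check is limited to --branches --tags on purpose: filter-branch keeps the pre-rewrite refs under refs/original/, so git log --all would still find the old commits until those backups are deleted.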
    
  • 2020-11-21 05:15

    So, it looks something like this:

    git rm --cached config/deploy.rb
    echo /config/deploy.rb >> .gitignore
    

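    This removes the tracked file from Git's index (your working copy stays on disk) and adds it to the .gitignore list. Note that it only stops tracking the file from the next commit onward: earlier commits, and therefore the history, still contain it.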
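    A small sketch (throwaway repository in a temporary directory, hypothetical file contents) showing both halves of that: afterwards the file is untracked and ignored, yet git show can still print it from the earlier commit. Note the relative path for git rm; a leading slash there would point outside the repository, while in .gitignore it anchors the pattern to the repository root.

```shell
#!/bin/sh
# Demo only: stop tracking a file and ignore it, then show that the commit
# which originally added it is still in history.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
git config commit.gpgsign false

mkdir -p config
echo "password: hunter2" > config/deploy.rb   # hypothetical secret
git add config/deploy.rb
git commit -q -m "add deploy config"

# Relative path for git rm; the leading slash in .gitignore anchors to the root.
git rm -q --cached config/deploy.rb
echo "/config/deploy.rb" >> .gitignore
git add .gitignore
git commit -q -m "stop tracking deploy config"

tracked_now=$(git ls-files config/deploy.rb)          # empty: no longer tracked
in_history=$(git log --format=%H -- config/deploy.rb) # still lists commits
old_content=$(git show HEAD~1:config/deploy.rb)       # secret still retrievable
echo "tracked now: ${tracked_now:-no}"
echo "old content: $old_content"
```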

  • 2020-11-21 05:16

    In my Android project I had admob_keys.xml as a separate XML file in the app/src/main/res/values/ folder. To remove this sensitive file I used the script below, and it worked perfectly.

    git filter-branch --force --index-filter \
    'git rm --cached --ignore-unmatch  app/src/main/res/values/admob_keys.xml' \
    --prune-empty --tag-name-filter cat -- --all
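    filter-branch does not delete the old objects by itself: they stay reachable from the backup refs under refs/original/ and from the reflog. The sketch below (throwaway repository, fake key file at the path from this answer) removes both, runs git gc, and then checks that the secret blob is gone from the local object database. It only cleans the local repository; it does nothing about copies on a remote or in other people's clones.

```shell
#!/bin/sh
# Demo only: after filter-branch, the old objects are still kept alive by
# refs/original/* and the reflog. This removes both and garbage-collects.
set -eu
export FILTER_BRANCH_SQUELCH_WARNING=1

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
git config commit.gpgsign false

keys=app/src/main/res/values/admob_keys.xml      # path from the answer above
mkdir -p "$(dirname "$keys")"
echo '<resources><string name="key">SECRET</string></resources>' > "$keys"
echo "app" > README
git add .
git commit -q -m "add app with keys"
blob=$(git rev-parse "HEAD:$keys")                # object id of the secret file

git filter-branch --force --index-filter \
  "git rm --cached --ignore-unmatch $keys" \
  --prune-empty --tag-name-filter cat -- --all

# 1. Drop the backup refs that filter-branch created.
git for-each-ref --format='%(refname)' refs/original/ |
  while read -r ref; do git update-ref -d "$ref"; done
# 2. Expire every reflog entry immediately.
git reflog expire --expire=now --all
# 3. Delete objects that are no longer reachable.
git gc -q --prune=now

if git cat-file -e "$blob" 2>/dev/null; then blob_gone=no; else blob_gone=yes; fi
echo "secret blob removed locally: $blob_gone"
```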
    
  • 2020-11-21 05:17

    For all practical purposes, the first thing you should be worried about is CHANGING YOUR PASSWORDS! It's not clear from your question whether your git repository is entirely local or whether you have a remote repository elsewhere yet; if it is remote and not secured from others you have a problem. If anyone has cloned that repository before you fix this, they'll have a copy of your passwords on their local machine, and there's no way you can force them to update to your "fixed" version with it gone from history. The only safe thing you can do is change your password to something else everywhere you've used it.


    With that out of the way, here's how to fix it. GitHub answered exactly that question as an FAQ:

    Note for Windows users: use double quotes (") instead of singles in this command

    git filter-branch --index-filter \
    'git update-index --remove PATH-TO-YOUR-FILE-WITH-SENSITIVE-DATA' <introduction-revision-sha1>..HEAD
    git push --force --verbose --dry-run
    git push --force
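    The <introduction-revision-sha1> placeholder is the commit that first added the file. One way to find it is git log --diff-filter=A, which only lists commits where the path was added. Keep in mind that a range X..HEAD excludes X itself, so to rewrite the introducing commit too, the range has to start at its parent (X^..HEAD). This sketch uses a hypothetical secrets.yml and assumes the file was not added in the root commit, which has no parent.

```shell
#!/bin/sh
# Demo only: locate the commit that introduced a file and build a revision
# range that includes that commit.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
git config commit.gpgsign false

echo "app" > README
git add README
git commit -q -m "root commit"
echo "password: hunter2" > secrets.yml           # hypothetical file
git add secrets.yml
git commit -q -m "add secrets"
echo "more" >> README
git commit -q -am "later work"

# Oldest commit on the current branch that Added the path.
intro=$(git log --diff-filter=A --format=%H -- secrets.yml | tail -n 1)
range="$intro^..HEAD"                              # X^..HEAD includes X
echo "introduced in: $intro"
echo "rewrite range: $range"
# How the range would be passed (not run here):
# git filter-branch --index-filter \
#   'git update-index --remove secrets.yml' "$range"
```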
    

    Update 2019:

    This is the current code from the FAQ:

      git filter-branch --force --index-filter \
      "git rm --cached --ignore-unmatch PATH-TO-YOUR-FILE-WITH-SENSITIVE-DATA" \
      --prune-empty --tag-name-filter cat -- --all
      git push --force --verbose --dry-run
      git push --force
    

    Keep in mind that once you've pushed this code to a remote repository like GitHub and others have cloned that remote repository, you're now in a situation where you're rewriting history. When others try to pull down your latest changes after this, they'll get a message indicating that the changes can't be applied because it's not a fast-forward.

    To fix this, they'll have to either delete their existing repository and re-clone it, or follow the instructions under "RECOVERING FROM UPSTREAM REBASE" in the git-rebase manpage.
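    For a collaborator with no local work worth keeping, a lighter alternative to re-cloning is to fetch the rewritten branch and hard-reset onto it. The sketch below simulates the whole situation with a local bare repository standing in for GitHub; the branch name main and the file secret.txt are assumptions for the demo. git reset --hard discards uncommitted changes and local commits on that branch, so anyone with unpushed work should follow the git-rebase manpage procedure instead.

```shell
#!/bin/sh
# Demo only: a bare "remote", a maintainer who rewrites history, and a
# collaborator who recovers by fetching and hard-resetting.
set -eu
export FILTER_BRANCH_SQUELCH_WARNING=1

top=$(mktemp -d)
cd "$top"
git init -q --bare remote.git
git --git-dir=remote.git symbolic-ref HEAD refs/heads/main

git init -q maint
cd maint
git symbolic-ref HEAD refs/heads/main
git config user.name "Maintainer"
git config user.email "maint@example.com"
git config commit.gpgsign false
echo "app" > README
echo "password: hunter2" > secret.txt           # hypothetical secret
git add README secret.txt
git commit -q -m "initial"
git remote add origin ../remote.git
git push -q origin main

cd "$top"
git clone -q remote.git collab                   # collaborator clones early

cd maint                                         # maintainer rewrites + force-pushes
git filter-branch --force --index-filter \
  'git rm --cached --ignore-unmatch secret.txt' -- --all
git push -q --force origin main

cd "$top/collab"                                 # collaborator: drop old history
git fetch -q origin                              # origin/main is force-updated
git reset -q --hard origin/main                  # discards local commits/changes
```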

    Tip: Execute git rebase --interactive


    In the future, if you accidentally commit some changes with sensitive information but you notice before pushing to a remote repository, there are some easier fixes. If your last commit is the one that added the sensitive information, you can simply remove the sensitive information, then run:

    git commit -a --amend
    

    That will amend the previous commit with any new changes you've made, including entire file removals done with a git rm. If the changes are further back in history but still not pushed to a remote repository, you can do an interactive rebase:

    git rebase -i origin/master
    

    That opens an editor with the commits you've made since your last common ancestor with the remote repository. Change "pick" to "edit" on any lines representing a commit with sensitive information, and save and quit. Git will walk through the changes, and leave you at a spot where you can:

    $EDITOR file-to-fix
    git commit -a --amend
    git rebase --continue
    

    Repeat that for each commit with sensitive information. Eventually, you'll end up back on your branch, and you can safely push the new changes.
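    To see the edit, amend, continue loop end to end without an interactive editor, this sketch sets GIT_SEQUENCE_EDITOR to a sed command that turns the first pick into edit, standing in for the manual step. A local base commit plays the role of origin/master, and putting the secret in the first commit after it is specific to this demo. It removes the whole file from that commit (the git rm case mentioned above) instead of editing it, and assumes Git's default, unabbreviated todo-list commands.

```shell
#!/bin/sh
# Demo only: remove a secret from an unpushed commit with an "edit" rebase.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
git config commit.gpgsign false

echo "app" > README
git add README
git commit -q -m "base (stands in for origin/master)"
base=$(git rev-parse HEAD)

echo "password: hunter2" > secret.txt            # hypothetical secret
echo "feature" > feature.txt
git add secret.txt feature.txt
git commit -q -m "feature (accidentally includes secret)"
echo "more" >> feature.txt
git commit -q -am "follow-up"

# Replaces the manual "pick" -> "edit" step: mark the first listed commit.
GIT_SEQUENCE_EDITOR="sed -i.bak -e '1s/^pick/edit/'" git rebase -q -i "$base"

# The rebase stops at that commit: drop the file from it and amend.
git rm -q --cached secret.txt                    # keeps it on disk, untracked
GIT_EDITOR=true git commit -q --amend --no-edit
GIT_EDITOR=true git rebase --continue            # replays "follow-up"
```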

  • 2020-11-21 05:18

    If you pushed to GitHub, force pushing is not enough: delete the repository or contact support

    Even if you force push one second afterwards, it is not enough as explained below.

    The only valid courses of action are:

    • Is what leaked a changeable credential like a password?

      • yes: modify your passwords immediately, and consider using more OAuth and API keys!

      • no (naked pics):

        • do you care if all issues in the repository get nuked?

          • no: delete the repository

          • yes:

            • contact support
            • if the leak is very critical to you, to the point that you are willing to get some repository downtime to make it less likely to leak, make it private while you wait for GitHub support to reply to you

    Force pushing a second later is not enough because:

    • GitHub keeps dangling commits for a long time.

      GitHub staff does have the power to delete such dangling commits if you contact them however.

      I experienced this first-hand when I uploaded all GitHub commit emails to a repo: they asked me to take it down, so I did, and they did a gc. Pull requests that contain the data have to be deleted, however: that repo data remained accessible up to one year after the initial takedown because of this.

      Dangling commits can be seen either through:

      • the commit web UI: https://github.com/cirosantilli/test-dangling/commit/53df36c09f092bbb59f2faa34eba15cd89ef8e83 (Wayback machine)
      • the API: https://api.github.com/repos/cirosantilli/test-dangling/commits/53df36c09f092bbb59f2faa34eba15cd89ef8e83 (Wayback machine)

      One convenient way to then get the source at that commit is the download-ZIP method, which accepts any reference, e.g.: https://github.com/cirosantilli/myrepo/archive/SHA.zip

    • It is possible to fetch the missing SHAs either by:

      • listing API events with "type": "PushEvent". E.g. mine: https://api.github.com/users/cirosantilli/events/public (Wayback machine)
      • more conveniently sometimes, by looking at the SHAs of pull requests that attempted to remove the content
    • There are scrapers like http://ghtorrent.org/ and https://www.githubarchive.org/ that regularly poll GitHub data and store it elsewhere.

      I could not find out whether they scrape the actual commit diff, and that is unlikely because there would be too much data, but it is technically possible, and the NSA and friends likely have filters to archive only stuff linked to people or commits of interest.

    If you delete the repository instead of just force pushing, however, commits do disappear even from the API immediately and give a 404 (e.g. https://api.github.com/repos/cirosantilli/test-dangling-delete/commits/8c08448b5fbf0f891696819f3b2b2d653f7a3824). This works even if you recreate another repository with the same name.

    To test this out, I have created a repo: https://github.com/cirosantilli/test-dangling and did:

    git init
    git remote add origin git@github.com:cirosantilli/test-dangling.git
    
    touch a
    git add .
    git commit -m 0
    git push
    
    touch b
    git add .
    git commit -m 1
    git push
    
    touch c
    git rm b
    git add .
    git commit --amend --no-edit
    git push -f
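    The local half of this behaviour can be reproduced without GitHub: after git commit --amend, the replaced commit is on no branch but is still readable by its SHA until the reflog entries pointing at it expire and git gc prunes it. This sketch repeats the steps above in a throwaway local repository. It only illustrates Git's own object retention; GitHub's server-side retention, which is the point of this answer, is not something a local experiment can show.

```shell
#!/bin/sh
# Demo only: an amended-away commit stays readable by SHA until it is pruned.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
git config commit.gpgsign false

touch a
git add .
git commit -q -m 0
touch b
git add .
git commit -q -m 1
old=$(git rev-parse HEAD)                # commit "1", which contains b

touch c
git rm -q b
git add .
git commit -q --amend --no-edit          # same steps as the script above

on_branch=$(git branch --contains "$old" 2>/dev/null || true)
before=$(git cat-file -t "$old")         # object still exists

git reflog expire --expire=now --all     # forget the reflog entries
git gc -q --prune=now                    # prune unreachable objects
if git cat-file -e "$old" 2>/dev/null; then after=present; else after=pruned; fi
echo "before gc: $before, after gc: $after"
```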
    

    See also: How to remove a dangling commit from GitHub?

    git filter-repo is now officially recommended over git filter-branch

    This is mentioned in the manpage of git filter-branch itself, starting with Git 2.24.

    With git filter-repo, you can remove certain files (see also: Remove folder and its contents from git/GitHub's history):

    pip install git-filter-repo
    git filter-repo --path path/to/remove1 --path path/to/remove2 --invert-paths
    

    This automatically removes empty commits.

    Or you can replace certain strings (see also: How to replace a string in a whole Git history?):

    git filter-repo --replace-text <(echo 'my_password==>xxxxxxxx')
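    Whichever tool is used, it is worth confirming that a string is gone from every commit and not just from the current checkout. One way to do that with plain Git is to run git grep over every commit that git rev-list --all lists. In this sketch the password value, settings.ini, and the helper name history_grep are hypothetical; the demo also shows why replacing the value in a new commit is not enough.

```shell
#!/bin/sh
# Demo only: search every commit (not just HEAD) for a literal string.
set -eu

# Hypothetical helper: prints "<commit>:<path>" for each match in any commit.
history_grep() {
  git rev-list --all | xargs git grep -F -l -e "$1" || true
}

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
git config commit.gpgsign false

echo "db_password = my_password" > settings.ini   # placeholder secret
git add settings.ini
git commit -q -m "add settings"
echo "db_password = CHANGEME" > settings.ini      # "fix" in a new commit
git commit -q -am "remove password"

in_head=$(git grep -F -l -e my_password HEAD || true)
in_history=$(history_grep my_password)
echo "in HEAD: ${in_head:-nothing}"
echo "in history: $in_history"
```

    After a successful --replace-text run, history_grep my_password should print nothing. It searches every commit's tree, so it can be slow on a large repository.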
    
  • 2020-11-21 05:20

    You can use git forget-blob.

    The usage is pretty simple: git forget-blob file-to-forget. You can get more info here:

    https://ownyourbits.com/2017/01/18/completely-remove-a-file-from-a-git-repository-with-git-forget-blob/

    The file will disappear from all the commits in your history, reflog, tags, and so on.

    I run into the same problem every now and then, and every time I have to come back to this post and others; that's why I automated the process.

    Credits to the contributors from Stack Overflow who allowed me to put this together.
