Remove file from git repository (history)

Posted by 丶灬走出姿态 on 2019-11-26 14:07:54

I can't say for sure without access to your repository data, but I believe there are probably one or more packed refs still referencing old commits from before you ran git filter-branch. This would explain why git fsck --full --unreachable doesn't call the large blob an unreachable object, even though you've expired your reflog and removed the original (unpacked) refs.

Here's what I'd do (after git filter-branch and git gc have been done):

1) Make sure original refs are gone:

rm -rf .git/refs/original

2) Expire all reflog entries:

git reflog expire --all --expire='0 days'

3) Check for old packed refs

This could potentially be tricky, depending on how many packed refs you have. I don't know of any Git commands that automate this, so I think you'll have to do this manually. Make a backup of .git/packed-refs. Now edit .git/packed-refs. Check for old refs (in particular, see if it packed any of the refs from .git/refs/original). If you find any old ones that don't need to be there, delete them (remove the line for that ref).
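
As a possible alternative to hand-editing, Git's ref plumbing (git for-each-ref and git update-ref -d) works on loose and packed refs alike. The following is a minimal sketch, assuming the stale refs are the backups under refs/original/; delete_original_refs is a hypothetical helper name, not a Git command. Stale refs elsewhere would still need to be found by inspection.

```shell
# Sketch: delete every ref under refs/original/, whether it is stored as a
# loose file or as a line in .git/packed-refs.
# delete_original_refs is a hypothetical helper name, not a Git command.
delete_original_refs() {
    git for-each-ref --format='%(refname)' refs/original/ |
    while IFS= read -r ref; do
        echo "deleting $ref"
        # update-ref -d rewrites packed-refs when the ref lives there
        git update-ref -d "$ref"
    done
}
```

Run it from inside the repository; running git for-each-ref with no arguments afterwards lists whatever refs remain.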

After you finish cleaning up the packed-refs file, see if git fsck notices the unreachable objects:

git fsck --full --unreachable

If that worked, and git fsck now reports your large blob as unreachable, you can move on to the next step.

4) Repack your packed archive(s)

git repack -A -d

This will ensure that the unreachable objects get unpacked and stay unpacked.

5) Prune loose (unreachable) objects

git prune

And that should do it. Git really should have a better way to manage packed refs. Maybe there is a better way that I don't know about. In the absence of a better way, manual editing of the packed-refs file might be the only way to go.
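
To double-check the result, one option is to ask Git whether any commit reachable from any ref still touches the path. This is only a sketch: path_in_history is a hypothetical helper name, and it looks at refs only (branches, tags, refs/original, and so on), not at commits kept alive solely by the reflog.

```shell
# path_in_history is a hypothetical helper, not a Git command.
# It succeeds (exit 0) if any commit reachable from any ref touches the path.
path_in_history() {
    [ -n "$(git log --all -1 --format=%H -- "$1")" ]
}

# Example use, with a hypothetical file name:
#   path_in_history huge_file_name && echo "still referenced by some ref"
```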

I'd recommend using the BFG Repo-Cleaner, a simpler, faster alternative to git-filter-branch specifically designed for removing files from Git history. One way in which it makes your life easier here is that it handles all references by default (all tags, branches, things like refs/remotes/origin/master, etc.), and it's also 10-50x faster.

You should carefully follow these steps here: http://rtyley.github.com/bfg-repo-cleaner/#usage - but the core bit is just this: download the BFG's jar (requires Java 6 or above) and run this command:

$ java -jar bfg.jar --delete-files file_name my-repo.git

Any file named file_name (that isn't in your latest commit) will be totally removed from your repository's history. You can then use git gc to clean away the dead data:

$ git gc --prune=now --aggressive

The BFG is generally much simpler to use than git-filter-branch - the options are tailored around these two common use-cases:

  • Removing Crazy Big Files
  • Removing Passwords, Credentials & other Private data

Full disclosure: I'm the author of the BFG Repo-Cleaner.

I found this quite helpful for removing a whole folder, since the answers above didn't really help me: https://help.github.com/articles/remove-sensitive-data.

I used:

git filter-branch --force \
--index-filter 'git rm -rf --cached --ignore-unmatch folder/sub-folder' \
--prune-empty --tag-name-filter cat -- --all

rm -rf .git/refs/original/
git reflog expire --expire=now --all
git gc --prune=now
git gc --aggressive --prune=now

I was trying to get rid of a big file in the history, and the above answers worked, up to a point. The point is: they don't work if you have tags. If the commit containing the big file is reachable from a tag, then you need to adjust the filter-branch command as follows:

git filter-branch --tag-name-filter cat \
--index-filter 'git rm --cached --ignore-unmatch huge_file_name' -- \
--all --tags
Wayne Conrad
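
To see which tags are affected before or after the rewrite, something like the following may help. It is a rough sketch, not part of the original answer: tags_containing_path is a hypothetical helper name, and it assumes every tag points at a commit (directly or through an annotated tag).

```shell
# tags_containing_path is a hypothetical helper, not a Git command.
# It prints each tag whose history contains a commit touching the given path.
tags_containing_path() {
    git for-each-ref --format='%(refname)' refs/tags/ |
    while IFS= read -r ref; do
        if [ -n "$(git log -1 --format=%H "$ref" -- "$1")" ]; then
            echo "${ref#refs/tags/}"
        fi
    done
}
```

If it still prints tag names after filter-branch has run, those tags were not rewritten.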

See: How do I remove sensitive files from git’s history

The above will fail if the file does not exist in a rev. In that case, the '--ignore-unmatch' switch will fix it:

git filter-branch -f --index-filter 'git rm --cached --ignore-unmatch <filename>' HEAD
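
The effect of the switch can be seen in isolation. The sketch below uses a throwaway repository and a made-up file name (no_such_file); demo_ignore_unmatch is a hypothetical helper name. Without the switch, git rm exits non-zero when nothing matches, and that non-zero exit is what aborts filter-branch on revisions where the file is absent.

```shell
# demo_ignore_unmatch is a hypothetical helper that compares exit codes
# of "git rm --cached" on a path that matches nothing.
demo_ignore_unmatch() {
    repo=$(mktemp -d)
    git init -q "$repo"

    rc=0
    git -C "$repo" rm -q --cached no_such_file 2>/dev/null || rc=$?
    echo "without --ignore-unmatch: exit $rc"

    rc=0
    git -C "$repo" rm -q --cached --ignore-unmatch no_such_file || rc=$?
    echo "with --ignore-unmatch: exit $rc"

    rm -rf "$repo"
}
```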

Then, to get all loose objects out of the repository:

git gc --prune='0 days ago'
VonC

A Git repository can still be large after git gc for various reasons, since git gc does not remove all loose objects.

I detail those reasons in "reduce the git repository size"

But one trick to test in your case would be to clone your "cleaned" Git repo and see if the clone has the appropriate size.

(The "cleaned" repo is the one where you applied filter-branch, and then ran gc and prune.)
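
The clone test can be sketched roughly as follows. Both function names are hypothetical helpers. One assumption is worth stating: when cloning from a local path, --no-local makes Git use its normal transport rather than hardlinking or copying the object directory, so unreachable objects are not carried over into the clone.

```shell
# Both function names are hypothetical helpers, not Git commands.

# Print the disk space used by loose plus packed objects, in KiB.
object_kib() {
    git -C "$1" count-objects -v |
    awk '$1 == "size:" || $1 == "size-pack:" { total += $2 } END { print total + 0 }'
}

# Clone over the normal transport and report both sizes.
compare_with_fresh_clone() {
    scratch=$(mktemp -d)
    # --no-local avoids the local shortcut that would copy unreachable objects too
    git clone --quiet --no-local --bare "$1" "$scratch/clone.git"
    echo "original: $(object_kib "$1") KiB"
    echo "clone:    $(object_kib "$scratch/clone.git") KiB"
    rm -rf "$scratch"
}
```

For example, compare_with_fresh_clone . from the top of the cleaned repository. If the clone is much smaller, the original is still holding on to unreachable objects.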

This should be covered by the git obliterate command in Git Extras (https://github.com/visionmedia/git-extras).

git obliterate <filename>

I had the same problem and I found a great tutorial on GitHub that explains step by step how to get rid of files you accidentally committed.

Here is a little summary of the procedure as Cupcake suggested.

To remove a file named file_to_remove from the history:

cd path_to_parent_dir

git filter-branch --force --index-filter \
  'git rm --cached --ignore-unmatch file_to_remove' \
  --prune-empty --tag-name-filter cat -- --all
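
The rewrite by itself does not free any disk space, because the old objects are still referenced by backup refs and the reflog. A rough end-to-end sketch that adds the cleanup steps from the other answers is shown here. purge_path is a hypothetical helper name; it assumes a clean working tree, a path without single quotes, and a clone you can afford to lose. FILTER_BRANCH_SQUELCH_WARNING only silences the warning that newer Git versions print before running filter-branch.

```shell
# purge_path is a hypothetical helper; run it from the top level of the repo.
purge_path() {
    target=$1

    # Rewrite every branch and tag, dropping the path from each commit.
    FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch --force \
        --index-filter "git rm -r -q --cached --ignore-unmatch -- '$target'" \
        --prune-empty --tag-name-filter cat -- --all || return 1

    # Drop the backup refs that filter-branch leaves under refs/original/.
    git for-each-ref --format='%(refname)' refs/original/ |
    while IFS= read -r ref; do
        git update-ref -d "$ref"
    done

    # Expire reflog entries, then prune the now-unreachable objects.
    git reflog expire --expire=now --all
    git gc --quiet --prune=now
}
```

Usage would be purge_path file_to_remove. Anyone who has already cloned the repository still has the old history, so the rewritten branches and tags need to be force-pushed and other clones re-created.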