Update a development team with rewritten Git repo history, removing big files

盖世英雄少女心 2020-12-01 05:29

I have a Git repo with some very large binaries in it. I no longer need them, and I don't care about being able to check out the files from earlier commits. So, to reduce th

4 Answers
  • 2020-12-01 05:48

    If you don't make your developers re-clone, they will likely drag the large files back in. For example, suppose they carefully splice their work onto the new history you create, and then happen to git merge from a local project branch that was not rebased. The parents of that merge commit will include the project branch, which ultimately points at the entire history you erased with git filter-branch.
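
    One way to spot such a branch before merging it is to test whether it still reaches a commit from the erased history. This is only a minimal sketch: it assumes you wrote down a commit hash from the old history before the rewrite, and the function name reaches_old_history is hypothetical.

    ```shell
    # Hypothetical helper: exits 0 if BRANCH still reaches OLD_COMMIT, meaning a
    # merge from BRANCH would pull the pre-rewrite history (and its big files) back in.
    reaches_old_history() {
        old_commit=$1   # any commit hash from the erased history (assumed to be known)
        branch=$2       # local branch you are about to merge
        git merge-base --is-ancestor "$old_commit" "$branch"
    }

    # Usage, inside a developer's existing clone:
    #   reaches_old_history <old-commit> my-project-branch && echo "rebase this branch first"
    ```

    In a fresh clone the old commit does not exist at all, so git reports an error there instead of answering yes or no.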

  • 2020-12-01 05:52

    Yes, your solution will work. You also have another option: instead of doing this on the central repo, run the filter on your clone and then push it back with git push --force --all. This will force the server to accept the new branches from your repository. This replaces step 2 only; the other steps will be the same.
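
    Note that git push --all covers branches only (refs under refs/heads), so rewritten tags need a separate push. A rough sketch of publishing from the filtered clone, assuming the remote is named origin and the server allows forced updates; the function name publish_rewritten_history is hypothetical:

    ```shell
    # Hypothetical helper: overwrite the remote's branches, then its tags,
    # with the rewritten history from the current repository.
    publish_rewritten_history() {
        remote=${1:-origin}                        # remote name is an assumption
        git push --force --all "$remote" &&        # every local branch
            git push --force --tags "$remote"      # tags are pushed separately
    }

    # Usage, from the clone where the filter was run:
    #   publish_rewritten_history origin
    ```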

    If your developers are pretty Git-savvy, then they might not have to delete their old copies; for example, they could fetch the new remotes and rebase their topic branches as appropriate.
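
    For a developer who keeps an old clone, the rebase could look roughly like the sketch below. It assumes the remote is origin, the mainline is master, and the local master still points at the old history; the function name is hypothetical.

    ```shell
    # Hypothetical helper: replay only the commits unique to TOPIC (those made
    # after OLD_BASE) onto NEW_BASE, leaving the old history behind.
    rebase_topic_onto_new_history() {
        topic=$1      # e.g. my-topic
        old_base=$2   # e.g. local master, still pointing at the old history
        new_base=$3   # e.g. origin/master after fetching the rewritten history
        git rebase --onto "$new_base" "$old_base" "$topic"
    }

    # Usage, assuming remote "origin" and mainline "master":
    #   git fetch origin
    #   rebase_topic_onto_new_history my-topic master origin/master
    #   git checkout master && git reset --hard origin/master
    ```

    Every local branch needs this treatment; any branch left on the old history keeps the large files alive in that clone.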

  • 2020-12-01 06:01

    Your solution is not complete. You should pass --tag-name-filter cat to git filter-branch so that tags pointing at commits that contain the large files are rewritten as well. You should also rewrite all refs instead of just HEAD, since the commits could be on multiple branches.

    Here is some better code:

    git filter-branch --index-filter 'git rm --cached --ignore-unmatch big_1.zip big_2.zip etc.zip' --tag-name-filter cat -- --all
    

    GitHub has a good guide: https://help.github.com/articles/remove-sensitive-data
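
    To decide which file names to pass to git rm in the filter command, it can help to list the biggest blobs anywhere in the history first. A minimal sketch using standard git plumbing; the function name largest_blobs is hypothetical:

    ```shell
    # Hypothetical helper: print the N largest blobs reachable from any ref,
    # one per line, as "<size-in-bytes> <path>", biggest first.
    largest_blobs() {
        n=${1:-10}
        git rev-list --objects --all |
            git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
            sed -n 's/^blob //p' |     # keep blobs only, drop the type column
            sort -k1,1nr |             # sort numerically by size, descending
            head -n "$n"
    }

    # Usage, inside the repository:
    #   largest_blobs 20
    ```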

  • 2020-12-01 06:10

    Your plan is good (though it would be better to perform the filtering on a bare clone of your repository, rather than on the central server), but in preference to git-filter-branch you should use my BFG Repo-Cleaner, a faster, simpler alternative to git-filter-branch designed specifically for removing large files from Git repos.

    Download the Java jar (requires Java 6 or above) and run this command:

    $ java -jar bfg.jar  --strip-blobs-bigger-than 1MB  my-repo.git
    

    Any blob over 1MB in size (that isn't in your latest commit) will be totally removed from your repository's history. You can then use git gc to clean away the dead data:

    $ git gc --prune=now --aggressive
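
    git gc only deletes the old blobs once nothing references them, and reflog entries in the repository where the rewrite was run can still point at the old commits. A sketch that expires the reflogs first and then runs the same gc command; the reflog step and the function name shrink_repo are additions for illustration, not part of the commands above:

    ```shell
    # Hypothetical helper: drop all reflog entries, then prune and repack so
    # that objects no longer reachable from any ref are actually deleted.
    shrink_repo() {
        git reflog expire --expire=now --all &&
            git gc --prune=now --aggressive
    }

    # Usage, inside the rewritten repository:
    #   shrink_repo
    #   git count-objects -vH    # compare "size-pack" before and after
    ```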
    

    The BFG is typically 10-50x faster than running git-filter-branch, and its options are tailored around these two common use cases:

    • Removing Crazy Big Files
    • Removing Passwords, Credentials & other Private data