How can I resume a git history rewrite?

耶瑟儿~ 2020-12-18 06:07

I'm rewriting the history of a fairly big repo using git filter-branch --tree-filter and it's taking a few hours. I see that git is using a temporary director

2 Answers
  • 2020-12-18 06:20

    git filter-branch doesn't itself support a suspend/resume pattern of use - although it writes temporary data out to a .git-rewrite folder, there's no actual support for resuming based on the contents of this directory. If you run git filter-branch on a repository that's had a previously aborted filter-branch operation, it'll either ask you to delete that temp folder, or, with the --force option, do it itself.
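
A minimal sketch of that behaviour, run in a throwaway repository. The file names, the demo identity, and the hand-made .git-rewrite directory (standing in for one left behind by an aborted run) are assumptions for illustration only:

```shell
#!/bin/sh
# Sketch: a leftover .git-rewrite directory blocks a new filter-branch run
# until it is deleted or --force is given. Everything happens in a temp repo.
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the advisory pause in newer git versions

demo=$(mktemp -d)
cd "$demo" || exit 1
git init -q .
git config user.name "Demo"              # hypothetical identity for the throwaway repo
git config user.email "demo@example.com"
echo keep > keep.txt
echo junk > junk.txt
git add . && git commit -q -m "initial"

mkdir .git-rewrite    # stand-in for the temp folder left by an aborted run

# Without --force, filter-branch refuses to start while .git-rewrite exists.
if git filter-branch --tree-filter 'rm -f junk.txt' HEAD >/dev/null 2>&1; then
    first_run=succeeded
else
    first_run=refused
fi

# With --force, it deletes the stale folder itself and starts over from scratch.
if git filter-branch --force --tree-filter 'rm -f junk.txt' HEAD >/dev/null 2>&1; then
    forced_run=succeeded
else
    forced_run=failed
fi

echo "without --force: $first_run; with --force: $forced_run"
```

Note that the forced run starts from the first commit again; nothing in the old temp folder is reused.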

    The underlying problem is that git-filter-branch is slow on big repos - if the process were much faster, there'd be no motivation to attempt a resume. So you've got a few options:

    Make git-filter-branch go a bit faster...

    • use a RAM-disk - git-filter-branch is very IO-intensive, and will run faster with your repository sitting in RAM.
    • use --index-filter rather than --tree-filter - it's similar to tree filter but doesn't check out the file-tree, which makes it faster, but does require you to rewrite your file alterations in terms of git index commands.
    • use cloud computing and hire a machine with fast RAM and a high clock-speed (don't bother with multiple cores unless your own commands are multi-threaded, as git-filter-branch itself is single-threaded).
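
To illustrate the --index-filter option from the list above, here is a hedged sketch that removes one file from every commit without checking out the tree. The repository, the demo identity, and the file name secret.txt are hypothetical; the equivalent --tree-filter form is shown in a comment for comparison:

```shell
#!/bin/sh
# Sketch: remove a file from all history using --index-filter.
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the advisory pause in newer git versions

repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
git config user.name "Demo"              # hypothetical identity for the throwaway repo
git config user.email "demo@example.com"

echo data > keep.txt
echo secret > secret.txt                 # hypothetical file we want gone from history
git add . && git commit -q -m "add files"
echo more >> keep.txt
git commit -q -am "update keep.txt"

# Slower form, checks out every commit's files:
#   git filter-branch --tree-filter 'rm -f secret.txt' HEAD
# Faster form, edits only the index of each commit:
git filter-branch --index-filter \
    'git rm --cached --ignore-unmatch -q secret.txt' HEAD >/dev/null 2>&1

# Count the commits on HEAD that still touch the file, and the commits overall.
commits_with_secret=$(git rev-list --count HEAD -- secret.txt)
total_commits=$(git rev-list --count HEAD)
echo "commits touching secret.txt: $commits_with_secret of $total_commits"
```

The --ignore-unmatch flag keeps git rm from failing on commits where the file does not exist, which would otherwise abort the rewrite.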

    ...or use The BFG (way faster)

    The BFG Repo-Cleaner is a simpler, faster alternative to git-filter-branch - on large repos it's 50-150x faster. That turns your job that takes several hours into one that takes just a few minutes.

    Full disclosure: I'm the author of the BFG Repo-Cleaner.

  • 2020-12-18 06:27

    Roberto mentioned this in his answer, but I want to give a benchmark for it: if your git filter-branch operation is taking too long to complete, consider an AWS high-memory instance.

    I once had to filter-branch and merge together 35 different repositories, each with two years of dozens-of-commits-per-day history. My script failed to complete in 25 hours on my laptop. It completed in 45 minutes on an m2.4xlarge instance in Amazon.

    Total cost?

    $1.64 -- less than I spend on a 20oz soda.

    BFG sounds like a great tool and I'd encourage anyone who routinely rewrites history to try it out. But if you just need something to work and have easy access to AWS, filter-branch is trivially easy.

    In 2016 this is even cheaper. Just mosey on over to the Spot Advisor and find yourself something of the "cluster compute for $0.30 / hour" variety.
