I'm rewriting the history of a fairly big repo using git filter-branch --tree-filter and it's taking a few hours. I see that git is using a temporary directory to store its intermediate work as it goes along. Does that mean it's possible to resume a rewrite if it gets interrupted? If so, how?
Edit
The operation I'm doing is moving a couple of directories. These are currently in subdirectories, but I now need them to be in the root.
e.g.
dir1
- dir2
- dir3
- dir4
becomes
dir1
- dir2
dir3
dir4
Of course my directory structure is a lot more complex than that, but that's the gist of what I'm trying to do.
git filter-branch doesn't itself support a suspend/resume pattern of use. Although it writes temporary data out to a .git-rewrite folder, there's no actual support for resuming based on the contents of this directory. If you run git filter-branch on a repository that's had a previously aborted filter-branch operation, it'll either ask you to delete that temp folder or, with the --force option, delete it itself.
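To see the refusal and the --force behaviour described above without risking a real repository, here is a minimal sketch that runs in a throwaway repo. The repo contents, the marker.txt file, and the variable names are made up for illustration; the empty .git-rewrite directory only simulates what an interrupted run leaves behind.

```shell
#!/bin/sh
set -eu

# Throwaway repo with a single commit (all names here are illustrative).
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"
echo hello > file.txt
git add file.txt
git commit -q -m "initial commit"

# Newer Git versions print a warning and pause before running; this silences it.
export FILTER_BRANCH_SQUELCH_WARNING=1

# Simulate the leftover temp directory of an interrupted rewrite.
mkdir .git-rewrite

# Without --force, filter-branch should refuse to start.
if git filter-branch --tree-filter 'echo touched > marker.txt' HEAD >/dev/null 2>&1; then
  plain_run="started"
else
  plain_run="refused"
fi

# With --force, the stale directory is discarded and the rewrite starts from scratch.
git filter-branch --force --tree-filter 'echo touched > marker.txt' HEAD >/dev/null 2>&1
forced_run="completed"

echo "without --force: $plain_run"
echo "with --force:    $forced_run"
```

Either way the rewrite begins again from the first commit; nothing in .git-rewrite is reused.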
The underlying problem is that git-filter-branch is slow running on big repos - if the process was much faster, there'd be no motivation to attempt a resume. So you've got a few options:
Make git-filter-branch go a bit faster...
- use a RAM-disk: git-filter-branch is very IO-intensive, and will run faster with your repository sitting in RAM.
- use --index-filter rather than --tree-filter: it's similar to the tree filter but doesn't check out the file tree, which makes it faster, but does require you to rewrite your file alterations in terms of git index commands.
- use cloud computing and hire a machine with fast RAM and high clock speed (don't bother with multiple cores unless your own commands are multi-threaded, as git-filter-branch itself is single-threaded)
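For the directory move in the question, an --index-filter version might look like the sketch below. It is adapted from the "move into a subdirectory" recipe in the git-filter-branch manual rather than taken from this thread, it assumes GNU sed (for \t and -E), and it runs against a throwaway repo whose dir1/dir2/dir3/dir4 layout and file names are invented to mirror the example.

```shell
#!/bin/sh
set -eu

# Build a throwaway repo that mimics the layout from the question.
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"
mkdir -p dir1/dir2 dir1/dir3 dir1/dir4
echo a > dir1/dir2/a.txt
echo b > dir1/dir3/b.txt
echo c > dir1/dir4/c.txt
git add .
git commit -q -m "initial layout"

# Newer Git versions print a warning and pause before running; this silences it.
export FILTER_BRANCH_SQUELCH_WARNING=1

# For every commit, rewrite paths directly in the index without a checkout:
#   dir1/dir3/* -> dir3/*   and   dir1/dir4/* -> dir4/*
# "git ls-files -s" prints "<mode> <object> <stage><TAB><path>", so the sed
# expression anchors on the tab that precedes each path.
git filter-branch --index-filter '
  git ls-files -s |
    sed -E "s-\t(\"?)dir1/(dir3|dir4)/-\t\1\2/-" |
    GIT_INDEX_FILE="$GIT_INDEX_FILE.new" git update-index --index-info &&
  if [ -f "$GIT_INDEX_FILE.new" ]; then
    mv "$GIT_INDEX_FILE.new" "$GIT_INDEX_FILE"
  fi
' -- --all >/dev/null 2>&1

# List the files in the rewritten tip commit.
result=$(git ls-tree -r --name-only HEAD)
echo "$result"
```

The listing should show dir3 and dir4 at the root while dir2 stays under dir1. On a real repository, try this on a fresh clone first. The RAM-disk idea can be combined with it through filter-branch's -d option, which points the temporary directory at a path of your choosing, such as a tmpfs mount.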
...or use The BFG (way faster)
The BFG Repo-Cleaner is a simpler, faster alternative to git-filter-branch - on large repos it's 50-150x faster. That turns a job that takes several hours into one that takes just a few minutes.
Full disclosure: I'm the author of the BFG Repo-Cleaner.
Roberto mentioned this in his answer, but I want to give a benchmark for it: if your git filter-branch operation is taking too long to complete, consider an AWS high-memory instance.
I once had to filter-branch and merge together 35 different repositories, each with two years of dozens-of-commits-per-day history. My script failed to complete in 25 hours on my laptop. It completed in 45 minutes on an m2.4xlarge instance in Amazon.
Total cost?
$1.64 -- less than I spend on a 20oz soda.
BFG sounds like a great tool and I'd encourage anyone who routinely rewrites history to try it out. But if you just need something to work and have easy access to AWS, filter-branch is trivially easy.
In 2016 this is even cheaper. Just mosey on over to the Spot Advisor and find yourself something of the "cluster compute for $0.30 / hour" variety.
Source: https://stackoverflow.com/questions/16152407/how-can-i-resume-a-git-history-rewrite