How to sync local history after massive git history rewrite?

混江龙づ霸主 posted on 2019-12-06 12:24:26

The keys to understanding the issues here are that, in Git:

  • Commits are the history.
  • The "true name" of any commit is its hash ID.
  • No commit can ever be changed.
  • Each commit remembers its previous (immediate ancestor, aka parent) commit(s) by hash ID.
  • Names, including branch and tag names, mainly just store one (1) hash ID.
  • The special property of a branch name is that it changes which hash ID it stores, as the branch grows, normally in a "nice" manner so that whatever commit the branch names today, that commit (by hash ID) eventually leads back to the commit (by hash ID) that the name identified yesterday.
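These properties can be checked in a throwaway repository. The following is only a minimal sketch: the identity values, file name, and commit messages are made up for illustration, and the branch name is forced to master so the result does not depend on local defaults.

```shell
# Illustrative identity so the commits work on a machine with no Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # use "master" whatever the default is

echo one > file.txt; git add file.txt; git commit -q -m A
first=$(git rev-parse HEAD)
echo two > file.txt; git commit -q -am B
second=$(git rev-parse HEAD)

# A branch name stores exactly one hash ID: that of its tip commit.
branch_value=$(git rev-parse refs/heads/master)

# The tip commit remembers its parent by hash ID.
recorded_parent=$(git cat-file -p "$second" | sed -n 's/^parent //p')

echo "master        -> $branch_value"
echo "parent of tip -> $recorded_parent"
```

The two printed hash IDs should be those of the second and first commit, respectively.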

When you "rewrite history", you do not—you can not—change any existing commit. Instead, you copy every existing commit. What git filter-branch does is to copy all the commits you request, in "oldest" (most ancestral) to "newest" (least ancestral / tip-most) order, applying filters as it goes:

  • extract the original commit;
  • apply filter(s);
  • make new commit from result, with parent hash ID changes dictated by any previous copy or copies.

In the end, what this means for a really massive rewrite is that you have, in essence, two different repositories placed side-by-side: the old one, with its old commits, and the new one, with its new commits. At the end of the filtering process, git filter-branch changes the names to point to the new copies.
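A small experiment makes the "two repositories side by side" picture visible. This is a sketch under assumptions: the repository, identity, and filter are invented for the demonstration, the filter changes the author of every commit so that every commit must be copied, and FILTER_BRANCH_SQUELCH_WARNING is set only to skip the deprecation notice that recent Git versions print.

```shell
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
for msg in A B C; do
    echo "$msg" > file.txt; git add file.txt; git commit -q -m "$msg"
done
old_tip=$(git rev-parse master)
old_root=$(git rev-parse master~2)

# Rewrite the author of every commit reachable from master.
git filter-branch --env-filter \
    'GIT_AUTHOR_NAME=renamed; export GIT_AUTHOR_NAME' -- master >/dev/null

new_tip=$(git rev-parse master)
new_root=$(git rev-parse master~2)

# filter-branch keeps the pre-rewrite tip under refs/original/.
backup=$(git rev-parse refs/original/refs/heads/master)

echo "old tip: $old_tip"
echo "new tip: $new_tip"
echo "backup : $backup"
```

The name master now identifies the new chain, while the old chain is still present in the repository, reachable through the refs/original/ name that filter-branch writes.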

If you had a tiny repository with just three commits—let's call them commits A through C—and one master branch, and all three commits needed some change(s), you would have this:

A--B--C   [was the original master]

A'-B'-C'  <-- master

The new commits are, literally, new commits. Anyone still using the old commits is literally still using the old commits. They must stop using those commits and start, instead, using the new commits.

In some cases, the filter(s) you specify with git filter-branch wind up not changing anything at all in an original commit. In this case—if the new commit that filter-branch writes is bit-for-bit identical to the original commit—then, and only then, the new commit is actually the same as the old commit. If we look at this same three-commit original repository, but choose a filter that modifies the content or metadata of only the second B commit, we get instead:

A--B--C
 \
  B'-C'  <-- master

as the final result.

Note that this occurs even though nothing about original C was changed by the filtering. This is because something about original B was changed, resulting in new-and-different commit B'. Hence, when git filter-branch copied C, it had to make one change: the parent of the copy C' is the new B' rather than the original B.

That is, git filter-branch copied A to a new commit, but made no change at all (not even to any parent information), so the new commit turned out to be a re-use of original A. Then it copied B to a new commit, and made a change, so the new commit is now B'. Then it copied C without making changes, changed the parent to B', and wrote new commit C'.
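The same behavior can be reproduced with a filter that touches only the middle commit. The sketch below assumes a made-up three-commit repository and uses a message filter that rewords only the commit whose message is exactly B; all names are illustrative.

```shell
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the deprecation notice

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
for msg in A B C; do
    echo "$msg" > file.txt; git add file.txt; git commit -q -m "$msg"
done
old_a=$(git rev-parse master~2)
old_b=$(git rev-parse master~1)
old_c=$(git rev-parse master)

# Only the commit whose message is exactly "B" is altered by this filter.
git filter-branch --msg-filter 'sed "s/^B\$/B (reworded)/"' -- master >/dev/null

new_a=$(git rev-parse master~2)
new_b=$(git rev-parse master~1)
new_c=$(git rev-parse master)

# C' has the same snapshot (tree) as C, but a different parent.
old_c_tree=$(git rev-parse "$old_c^{tree}")
new_c_tree=$(git rev-parse "$new_c^{tree}")
new_c_parent=$(git rev-parse "$new_c^")

echo "A reused:      $([ "$old_a" = "$new_a" ] && echo yes || echo no)"
echo "C' differs:    $([ "$old_c" != "$new_c" ] && echo yes || echo no)"
echo "C' parent = B': $([ "$new_c_parent" = "$new_b" ] && echo yes || echo no)"
```

Commit A keeps its hash ID because its copy is bit-for-bit identical, while C gets a new hash ID purely because its parent line changed.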

If your filter made a change only to C, the git filter-branch command would copy A to itself, B to itself, and C to C', giving:

A--B--C
    \
     C'  <-- master

Dealing with an upstream rewrite

In general, the easiest way for people to deal with a really massive upstream origin rewrite is for them to discard their existing repositories entirely and clone afresh. After such a rewrite, we'd expect the old and new histories to share no more than a few original commits: at some early point in the massive rewrite, we change commit A or one near it, so that every subsequent commit has to be copied to a new commit. Thus, creating a new clone is probably not much more expensive, if at all, than updating an existing one. It's certainly easier!

This is not, strictly speaking, necessary. As a "downstream" consumer, we can run git fetch and obtain all the new commits with their updated branch names, and perhaps updated tags (be especially careful here as tags won't update by default). But since we have our own branch names, pointing to the original commits and not the newly-copied commits, we must now make each of our branch names refer to the newly-copied commits, perhaps also copying any commits that we have that the upstream did not have (and hence did not already copy).

In other words, we could, for each of our branches, run:

git checkout <branch>
git reset --hard origin/<branch>

to make our branch name identify, as its tip commit, the same commit that origin/<branch> names. (Remember, with the default configuration, git fetch force-updates all of our origin/<branch> names to match the hash ID to which <branch> points on origin.)
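Doing this by hand for many branches gets tedious, so a loop may help. The following is a sketch, not a recommended tool: it builds a made-up upstream with two branches, clones it, rewrites the upstream, and then resets every local branch that has an origin counterpart. It assumes the clone has no local work worth keeping, since git reset --hard discards it.

```shell
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the deprecation notice

work=$(mktemp -d)
mkdir "$work/upstream"
cd "$work/upstream"
git init -q
git symbolic-ref HEAD refs/heads/master
for msg in A B C; do
    echo "$msg" > file.txt; git add file.txt; git commit -q -m "$msg"
done
git checkout -q -b feature
echo F > feature.txt; git add feature.txt; git commit -q -m F
git checkout -q master

# Downstream clone with a local copy of both branches.
git clone -q "$work/upstream" "$work/clone"
cd "$work/clone"
git checkout -q -b feature origin/feature
git checkout -q master
old_master=$(git rev-parse master)

# Upstream rewrites every branch (here: a new author on all commits).
cd "$work/upstream"
git filter-branch --env-filter \
    'GIT_AUTHOR_NAME=renamed; export GIT_AUTHOR_NAME' -- --all >/dev/null

# Downstream: fetch, then point each local branch at its origin counterpart.
cd "$work/clone"
git fetch -q origin
for branch in $(git for-each-ref --format='%(refname:short)' refs/heads); do
    if git rev-parse -q --verify "refs/remotes/origin/$branch" >/dev/null; then
        git checkout -q "$branch"
        git reset -q --hard "origin/$branch"
    fi
done

new_master=$(git rev-parse master)
new_feature=$(git rev-parse feature)
up_master=$(git -C "$work/upstream" rev-parse master)
up_feature=$(git -C "$work/upstream" rev-parse feature)
```

Local branches without a matching origin/<branch> name are left alone by the loop.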

This is equivalent to deleting each of our branches and using git checkout to re-create them. In other words, it won't carry forward any of our commits that whoever rewrote origin did not copy (they could not copy commits they never had). To carry forward our commits, we must do the same thing we would do to deal with an upstream rebase. Whether the built-in fork-point code will do that correctly for you (it often will if your Git is at least 2.0) is really for a separate question (and has been answered elsewhere already). Note that you will have to do this for each branch in which you have commits you wish to carry forward.
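One explicit way to carry commits forward, without relying on the fork-point code, is git rebase --onto with the old upstream tip recorded before fetching. This is a sketch under assumptions: the repositories and the local commit D are invented, and the upstream rewrite changes only author metadata, so the replay cannot conflict. A rewrite that alters file contents may well produce conflicts that have to be resolved by hand.

```shell
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the deprecation notice

work=$(mktemp -d)
mkdir "$work/upstream"
cd "$work/upstream"
git init -q
git symbolic-ref HEAD refs/heads/master
for msg in A B C; do
    echo "$msg" > file.txt; git add file.txt; git commit -q -m "$msg"
done

# Downstream clone with one commit (D) that upstream never saw.
git clone -q "$work/upstream" "$work/clone"
cd "$work/clone"
echo local > local.txt; git add local.txt; git commit -q -m D

# Upstream rewrites its history: A-B-C becomes A'-B'-C'.
cd "$work/upstream"
git filter-branch --env-filter \
    'GIT_AUTHOR_NAME=renamed; export GIT_AUTHOR_NAME' -- master >/dev/null

# Downstream: remember the old upstream tip (C), fetch, then copy only
# the commits after C onto the rewritten tip (C').
cd "$work/clone"
old_upstream=$(git rev-parse origin/master)
git fetch -q origin
git rebase -q --onto origin/master "$old_upstream" master

new_parent=$(git rev-parse master~1)
new_origin=$(git rev-parse origin/master)
tip_subject=$(git log -1 --format=%s master)
```

After the rebase, master consists of the rewritten upstream commits followed by a copy of D.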

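On the second machine, first run git fetch, not git pull. Then, for each branch whose history was rewritten, check out that branch and run git reset --hard origin/<branch>. (Resetting to HEAD would change nothing, because HEAD still names the old commits.) Note that git reset only moves the current branch, so if more than one branch was affected by the history rewrite, you need to check out and reset each one. Be aware that this discards any local commits and uncommitted changes on those branches.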
