What exactly does git's “rebase --preserve-merges” do (and why?)

滥情空心 2020-11-22 04:37

Git's documentation for the rebase command is quite brief:

--preserve-merges
    Instead of ignoring merges, try to recreate them.

This uses the --interact         


        
2 answers
  • 2020-11-22 05:15

    As with a normal git rebase, git rebase with --preserve-merges first identifies a list of commits made in one part of the commit graph, and then replays those commits on top of another part. The differences with --preserve-merges concern which commits are selected for replay and how that replaying works for merge commits.

    To be more explicit about the main differences between normal and merge-preserving rebase:

    • Merge-preserving rebase is willing to replay (some) merge commits, whereas normal rebase completely ignores merge commits.
    • Because it's willing to replay merge commits, merge-preserving rebase has to define what it means to replay a merge commit, and deal with some extra wrinkles:
      • The most interesting part, conceptually, is perhaps in picking what the new commit's merge parents should be.
      • Replaying merge commits also requires explicitly checking out particular commits (git checkout <desired first parent>), whereas normal rebase doesn't have to worry about that.
    • Merge-preserving rebase considers a shallower set of commits for replay:
      • In particular, it will only consider replaying commits made since the most recent merge base(s) (i.e. the most recent time the two branches diverged), whereas normal rebase might replay commits going back to the first time the two branches diverged.
      • To be provisional and unclear, I believe this is ultimately a means to screen out replaying "old commits" that have already been "incorporated into" a merge commit.

    First I will try to describe "sufficiently exactly" what rebase --preserve-merges does, and then there will be some examples. One can of course start with the examples, if that seems more useful.

    The Algorithm in "Brief"

    If you want to really get into the weeds, download the git source and explore the file git-rebase--interactive.sh. (In the Git versions this answer describes, rebase is not part of Git's C core, but rather is written as a shell script. And, behind the scenes, it shares code with "interactive rebase".)

    But here I will sketch what I think is the essence of it. In order to reduce the number of things to think about, I have taken a few liberties. (e.g. I don't try to capture with 100% accuracy the precise order in which computations take place, and ignore some less central-seeming topics, e.g. what to do about commits that have already been cherry-picked between branches).

    First, note that a non-merge-preserving rebase is rather simple. It's more or less:

    Find all commits on B but not on A ("git log A..B")
    Reset B to A ("git reset --hard A") 
    Replay all those commits onto B one at a time in order.
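
    The three steps above can be modelled on a toy commit graph. This is only a rough sketch in Python, not how git implements it: the PARENTS dictionary, the function names, and the convention of naming a replayed commit X' are all assumptions made for illustration.

```python
# Toy commit graph (hypothetical data): commit -> list of parents.
# "m" is a merge of E and G; C is the tip of master, H the tip of topic.
PARENTS = {
    "A": [], "B": ["A"], "C": ["B"],
    "D": ["A"], "E": ["D"], "F": ["D"], "G": ["F"],
    "m": ["E", "G"], "H": ["m"],
}

def ancestors(commit, parents):
    """Return `commit` plus every commit reachable from it via parent links."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen

def commits_to_replay(upstream, branch, parents):
    """Model `git log upstream..branch` minus merges, parents before children."""
    selected = ancestors(branch, parents) - ancestors(upstream, parents)
    non_merges = [c for c in selected if len(parents[c]) < 2]
    # An ancestor always has strictly fewer ancestors than its descendants,
    # so sorting by that count yields a valid parents-first order.
    return sorted(non_merges, key=lambda c: (len(ancestors(c, parents)), c))

def plain_rebase(upstream, branch, parents):
    """Replay the selected commits as one linear chain on top of `upstream`."""
    new_parents = dict(parents)
    tip = upstream
    for c in commits_to_replay(upstream, branch, parents):
        new_parents[c + "'"] = [tip]  # the copy's parent is the current tip
        tip = c + "'"
    return tip, new_parents

tip, graph = plain_rebase("C", "H", PARENTS)
print(commits_to_replay("C", "H", PARENTS))  # the merge "m" is not selected
print(tip, graph[tip])
```

    In this model the merge commit is simply dropped from the list, and the commits from both sides of the merge end up on a single line of history.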
    

    Rebase --preserve-merges is comparatively complicated. Here's as simple as I've been able to make it without losing things that seem pretty important:

    Find the commits to replay:
      First find the merge-base(s) of A and B (i.e. the most recent common ancestor(s))
        This (these) merge base(s) will serve as a root/boundary for the rebase.
        In particular, we'll take its (their) descendants and replay them on top of new parents
      Now we can define C, the set of commits to replay. In particular, it's those commits:
        1) reachable from B but not A (as in a normal rebase), and ALSO
        2) descendants of the merge base(s)
      If we ignore cherry-picks and other cleverness preserve-merges does, it's more or less:
        git log A..B --not $(git merge-base --all A B)
    Replay the commits:
      Create a branch B_new, on which to replay our commits.
      Switch to B_new (i.e. "git checkout B_new")
      Proceeding parents-before-children (--topo-order), replay each commit c in C on top of B_new:
        If it's a non-merge commit, cherry-pick as usual (i.e. "git cherry-pick c")
        Otherwise it's a merge commit, and we'll construct an "equivalent" merge commit c':
          To create a merge commit, its parents must exist and we must know what they are.
          So first, figure out which parents to use for c', by reference to the parents of c:
            For each parent p_i in parents_of(c):
              If p_i is one of the merge bases mentioned above:
                # p_i is one of the "boundary commits" that we no longer want to use as parents
                For the new commit's ith parent (p_i'), use the HEAD of B_new.
              Else if p_i is one of the commits being rewritten (i.e. if p_i is in C):
                # Note: Because we're moving parents-before-children, a rewritten version
                # of p_i must already exist. So reuse it:
                For the new commit's ith parent (p_i'), use the rewritten version of p_i.
              Otherwise:
                # p_i is one of the commits that's *not* slated for rewrite. So don't rewrite it
                For the new commit's ith parent (p_i'), use p_i, i.e. the old commit's ith parent.
          Second, actually create the new commit c':
            Go to p_1'. (i.e. "git checkout p_1'", p_1' being the "first parent" we want for our new commit)
            Merge in the other parent(s):
              For a typical two-parent merge, it's just "git merge p_2'".
              For an octopus merge, it's "git merge p_2' p_3' p_4' ...".
            Switch (i.e. "git reset") B_new to the current commit (i.e. HEAD), if it's not already there
      Change the label B to apply to this new branch, rather than the old one. (i.e. "git checkout B; git reset --hard B_new")
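
    The commit selection and the parent-rewriting rule can be tried out on a toy graph. The following Python sketch is a simplified model under stated assumptions, not git's actual code: the PARENTS data and all function names are hypothetical, the parent mapping is applied to every replayed commit (the outline above only spells it out for merges), and a merge-base parent is mapped to the commit that B_new starts from.

```python
# Toy graph (hypothetical data): commit -> parents.
# master is D, topic is F, and m merges E with C.
PARENTS = {
    "A": [], "B": ["A"], "C": ["B"], "D": ["C"],
    "E": ["A"], "m": ["E", "C"], "F": ["m"],
}

def ancestors(commit, parents):
    """Return `commit` plus every commit reachable from it via parent links."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen

def merge_bases(a, b, parents):
    """Common ancestors that are not ancestors of another common ancestor."""
    common = ancestors(a, parents) & ancestors(b, parents)
    return {c for c in common
            if not any(d != c and c in ancestors(d, parents) for d in common)}

def rebase_preserving_merges(upstream, branch, parents):
    bases = merge_bases(upstream, branch, parents)
    # Reachable from branch but not upstream, AND descended from a merge base.
    candidates = ancestors(branch, parents) - ancestors(upstream, parents)
    to_replay = {c for c in candidates if ancestors(c, parents) & bases}
    order = sorted(to_replay, key=lambda c: (len(ancestors(c, parents)), c))
    rewritten, new_parents = {}, dict(parents)
    for c in order:  # parents before children
        mapped = []
        for p in parents[c]:
            if p in bases:            # boundary commit: attach to the new base
                mapped.append(upstream)
            elif p in rewritten:      # already replayed: reuse the copy
                mapped.append(rewritten[p])
            else:                     # not slated for rewrite: keep as-is
                mapped.append(p)
        rewritten[c] = c + "'"
        new_parents[c + "'"] = mapped
    return order, rewritten.get(branch, upstream), new_parents

order, tip, graph = rebase_preserving_merges("D", "F", PARENTS)
print(order)
print(graph["m'"], graph["F'"])
```

    On this graph all three branches of the parent rule are exercised: E is kept unchanged, the merge base C is replaced by the new base D, and F' is attached to the rewritten m'.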
    

    Rebase with an --onto C argument should be very similar. Instead of starting commit playback at the HEAD of A, you start commit playback at the HEAD of C. (And use C_new instead of B_new.)

    Example 1

    For example, take commit graph

      B---C <-- master
     /                     
    A-------D------E----m----H <-- topic
             \         /
              F-------G
    

    m is a merge commit with parents E and G.

    Suppose we rebased topic (H) on top of master (C) using a normal, non-merge-preserving rebase. (For example, checkout topic; rebase master.) In that case, git would select the following commits for replay:

    • pick D
    • pick E
    • pick F
    • pick G
    • pick H

    and then update the commit graph like so:

      B---C <-- master
     /     \                
    A       D'---E'---F'---G'---H' <-- topic
    

    (D' is the replayed equivalent of D, etc.)

    Note that merge commit m is not selected for replay.

    Suppose we instead did a --preserve-merges rebase of H on top of C. (For example, checkout topic; rebase --preserve-merges master.) In this new case, git would select the following commits for replay:

    • pick D
    • pick E
    • pick F (onto D' in the 'subtopic' branch)
    • pick G (onto F' in the 'subtopic' branch)
    • pick Merge branch 'subtopic' into topic
    • pick H

    Now m was chosen for replay. Also note that merge parents E and G were picked for inclusion before merge commit m.

    Here is the resulting commit graph:

      B---C <-- master
     /     \
    A       D'-----E'----m'----H' <-- topic
             \          /
              F'-------G'
    

    Again, D' is a cherry-picked (i.e. recreated) version of D. Same for E', etc. Every commit not on master has been replayed. Both E and G (the merge parents of m) have been recreated as E' and G' to serve as the parents of m' (after rebase, the tree history still remains the same).

    Example 2

    Unlike with normal rebase, merge-preserving rebase can create multiple children of the upstream head.

    For example, consider:

      B---C <-- master
     /                     
    A-------D------E---m----H <-- topic
     \                 |
      ------- F-----G--/ 
    

    If we rebase H (topic) on top of C (master), then the commits chosen for rebase are:

    • pick D
    • pick E
    • pick F
    • pick G
    • pick m
    • pick H

    And the result is like so:

      B---C  <-- master
     /    | \                
    A     |  D'----E'---m'----H' <-- topic
           \            |
             F'----G'---/
    

    Example 3

    In the above examples, both the merge commit and its two parents are replayed commits, rather than the original parents that the original merge commit had. However, in other rebases a replayed merge commit can end up with parents that were already in the commit graph before the merge.

    For example, consider:

      B--C---D <-- master
     /    \                
    A---E--m------F <-- topic
    

    If we rebase topic onto master (preserving merges), then the commits to replay will be

    • pick merge commit m
    • pick F

    The rewritten commit graph will look like so:

      B--C--D <-- master
     /       \
    A-----E---m'--F' <-- topic
    

    Here replayed merge commit m' gets parents that pre-existed in the commit graph, namely D (the HEAD of master) and E (one of the parents of the original merge commit m).

    Example 4

    Merge-preserving rebase can get confused in certain "empty commit" cases. At least, this is true in some older versions of git (e.g. 1.7.8).

    Take this commit graph:

                       A--------B-----C-----m2---D <-- master
                        \        \         /
                          E--- F--\--G----/
                                \  \
                                 ---m1--H <--topic
    

    Note that both commit m1 and m2 should have incorporated all the changes from B and F.

    If we try to do git rebase --preserve-merges of H (topic) onto D (master), then the following commits are chosen for replay:

    • pick m1
    • pick H

    Note that the changes (B, F) united in m1 should already be incorporated into D. (Those changes should already be incorporated into m2, because m2 merges together the children of B and F.) Therefore, conceptually, replaying m1 on top of D should probably either be a no-op or create an empty commit (i.e. one where the diff between successive revisions is empty).

    Instead, however, git may reject the attempt to replay m1 on top of D. You can get an error like so:

    error: Commit 90caf85 is a merge but no -m option was given.
    fatal: cherry-pick failed
    

    It looks like one forgot to pass a flag to git, but the underlying problem is that git dislikes creating empty commits.

  • 2020-11-22 05:28

    Git 2.18 (Q2 2018) considerably improves on the --preserve-merges option by adding a new option.

    "git rebase" learned "--rebase-merges" to transplant the whole topology of commit graph elsewhere.

    (Note: Git 2.22, Q2 2019, actually deprecates --preserve-merges, and Git 2.25, Q1 2020, stops advertising it in the "git rebase --help" output)

    See commit 25cff9f, commit 7543f6f, commit 1131ec9, commit 7ccdf65, commit 537e7d6, commit a9be29c, commit 8f6aed7, commit 1644c73, commit d1e8b01, commit 4c68e7d, commit 9055e40, commit cb5206e, commit a01c2a5, commit 2f6b1d1, commit bf5c057 (25 Apr 2018) by Johannes Schindelin (dscho).
    See commit f431d73 (25 Apr 2018) by Stefan Beller (stefanbeller).
    See commit 2429335 (25 Apr 2018) by Phillip Wood (phillipwood).
    (Merged by Junio C Hamano -- gitster -- in commit 2c18e6a, 23 May 2018)

    pull: accept --rebase-merges to recreate the branch topology

    Similar to the preserve mode simply passing the --preserve-merges option to the rebase command, the merges mode simply passes the --rebase-merges option.

    This will allow users to conveniently rebase non-trivial commit topologies when pulling new commits, without flattening them.


    git rebase man page now has a full section dedicated to rebasing history with merges.

    Extract:

    There are legitimate reasons why a developer may want to recreate merge commits: to keep the branch structure (or "commit topology") when working on multiple, inter-related branches.

    In the following example, the developer works on a topic branch that refactors the way buttons are defined, and on another topic branch that uses that refactoring to implement a "Report a bug" button.
    The output of git log --graph --format=%s -5 may look like this:

    *   Merge branch 'report-a-bug'
    |\
    | * Add the feedback button
    * | Merge branch 'refactor-button'
    |\ \
    | |/
    | * Use the Button class for all buttons
    | * Extract a generic Button class from the DownloadButton one
    

    The developer might want to rebase those commits to a newer master while keeping the branch topology, for example when the first topic branch is expected to be integrated into master much earlier than the second one, say, to resolve merge conflicts with changes to the DownloadButton class that made it into master.

    This rebase can be performed using the --rebase-merges option.


    See commit 1644c73 for a small example:

    rebase-helper --make-script: introduce a flag to rebase merges

    The sequencer just learned new commands intended to recreate branch structure (similar in spirit to --preserve-merges, but with a substantially less-broken design).

    Let's allow the rebase--helper to generate todo lists making use of these commands, triggered by the new --rebase-merges option.
    For a commit topology like this (where the HEAD points to C):

    - A - B - C (HEAD)
        \   /
          D
    

    the generated todo list would look like this:

    # branch D
    pick 0123 A
    label branch-point
    pick 1234 D
    label D
    
    reset branch-point
    pick 2345 B
    merge -C 3456 D # C
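
    To see how label, reset and merge recreate the topology, here is a toy interpreter for the todo list above, written in Python. It is only an illustration and has nothing to do with git's real sequencer: the run_todo function, the "onto" placeholder commit, and the X' naming of recreated commits are assumptions.

```python
TODO = """\
# branch D
pick 0123 A
label branch-point
pick 1234 D
label D

reset branch-point
pick 2345 B
merge -C 3456 D # C
"""

def run_todo(todo, onto):
    """Toy model: execute the todo list and return (head, parents)."""
    parents, labels, head = {}, {}, onto
    for line in todo.splitlines():
        words = line.split()
        if not words or words[0].startswith("#"):
            continue                      # skip blank lines and comments
        verb = words[0]
        if verb == "pick":                # pick <hash> <subject>
            new = words[2] + "'"
            parents[new] = [head]         # replay on top of the current HEAD
            head = new
        elif verb == "label":             # remember the current HEAD by name
            labels[words[1]] = head
        elif verb == "reset":             # jump back to a labelled commit
            head = labels[words[1]]
        elif verb == "merge":             # merge -C <hash> <label> # <subject>
            new = words[5] + "'"
            parents[new] = [head, labels[words[3]]]
            head = new
        else:
            raise ValueError(f"unsupported command: {verb}")
    return head, parents

head, graph = run_todo(TODO, "onto")
print(head, graph[head])
```

    Because every parent is named explicitly through a label, reordering lines or adding a new label and merge changes the resulting graph in a predictable way.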
    

    What is the difference with --preserve-merges?
    Commit 8f6aed7 explains:

    Once upon a time, this here developer thought: wouldn't it be nice if, say, Git for Windows' patches on top of core Git could be represented as a thicket of branches, and be rebased on top of core Git in order to maintain a cherry-pick'able set of patch series?

    The original attempt to answer this was: git rebase --preserve-merges.

    However, that experiment was never intended as an interactive option, and it only piggy-backed on git rebase --interactive because that command's implementation looked already very, very familiar: it was designed by the same person who designed --preserve-merges: yours truly.

    And by "yours truly", the author refers to himself: Johannes Schindelin (dscho), who is the main reason (with a few other heroes -- Hannes, Steffen, Sebastian, ...) that we have Git For Windows (even though back in the day -- 2009 -- that was not easy).
    He has been working at Microsoft since Sept. 2015, which makes sense considering Microsoft now heavily uses Git and needs his services.
    That trend started in 2013 actually, with TFS. Since then, Microsoft manages the largest Git repository on the planet! And, in Oct. 2018, Microsoft acquired GitHub.

    You can see Johannes speak in this video for Git Merge 2018 in April 2018.

    Some time later, some other developer (I am looking at you, Andreas! ;-)) decided that it would be a good idea to allow --preserve-merges to be combined with --interactive (with caveats!) and the Git maintainer (well, the interim Git maintainer during Junio's absence, that is) agreed, and that is when the glamor of the --preserve-merges design started to fall apart rather quickly and unglamorously.

    Here Johannes is talking about Andreas Schwab from SUSE.
    You can see some of their discussions back in 2012.

    The reason? In --preserve-merges mode, the parents of a merge commit (or for that matter, of any commit) were not stated explicitly, but were implied by the commit name passed to the pick command.

    This made it impossible, for example, to reorder commits.
    Not to mention to move commits between branches or, deity forbid, to split topic branches into two.

    Alas, these shortcomings also prevented that mode (whose original purpose was to serve Git for Windows' needs, with the additional hope that it may be useful to others, too) from serving Git for Windows' needs.

    Five years later, when it became really untenable to have one unwieldy, big hodge-podge patch series of partly related, partly unrelated patches in Git for Windows that was rebased onto core Git's tags from time to time (earning the undeserved wrath of the developer of the ill-fated git-remote-hg series that first obsoleted Git for Windows' competing approach, only to be abandoned without maintainer later) was really untenable, the "Git garden shears" were born: a script, piggy-backing on top of the interactive rebase, that would first determine the branch topology of the patches to be rebased, create a pseudo todo list for further editing, transform the result into a real todo list (making heavy use of the exec command to "implement" the missing todo list commands) and finally recreate the patch series on top of the new base commit.

    (The Git garden shears script is referenced in this patch in commit 9055e40)

    That was in 2013.
    And it took about three weeks to come up with the design and implement it as an out-of-tree script. Needless to say, the implementation needed quite a few years to stabilize, all the while the design itself proved itself sound.

    With this patch, the goodness of the Git garden shears comes to git rebase -i itself.
    Passing the --rebase-merges option will generate a todo list that can be understood readily, and where it is obvious how to reorder commits.
    New branches can be introduced by inserting label commands and calling merge <label>.
    And once this mode will have become stable and universally accepted, we can deprecate the design mistake that was --preserve-merges.


    Git 2.19 (Q3 2018) improves the new --rebase-merges option by making it work with --exec.

    The "--exec" option to "git rebase --rebase-merges" placed the exec commands at wrong places, which has been corrected.

    See commit 1ace63b (09 Aug 2018), and commit f0880f7 (06 Aug 2018) by Johannes Schindelin (dscho).
    (Merged by Junio C Hamano -- gitster -- in commit 750eb11, 20 Aug 2018)

    rebase --exec: make it work with --rebase-merges

    The idea of --exec is to append an exec call after each pick.

    Since the introduction of fixup!/squash! commits, this idea was extended to apply to "pick, possibly followed by a fixup/squash chain", i.e. an exec would not be inserted between a pick and any of its corresponding fixup or squash lines.

    The current implementation uses a dirty trick to achieve that: it assumes that there are only pick/fixup/squash commands, and then inserts the exec lines before any pick but the first, and appends a final one.

    With the todo lists generated by git rebase --rebase-merges, this simple implementation shows its problems: it produces the exact wrong thing when there are label, reset and merge commands.

    Let's change the implementation to do exactly what we want: look for pick lines, skip any fixup/squash chains, and then insert the exec line. Lather, rinse, repeat.

    Note: we take pains to insert before comment lines whenever possible, as empty commits are represented by commented-out pick lines (and we want to insert a preceding pick's exec line before such a line, not afterward).

    While at it, also add exec lines after merge commands, because they are similar in spirit to pick commands: they add new commits.
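
    The insertion rule described in this commit message can be sketched in a few lines of Python. This is a simplified model, not the C implementation in git's sequencer: the add_exec function and the sample todo list are made up for illustration, and the todo list is treated as a plain list of strings.

```python
def add_exec(todo, cmd):
    """Insert `exec cmd` after each pick (plus its fixup/squash chain) and merge."""
    out = []
    pending = False  # True while an exec is still owed for the last pick/merge
    for line in todo:
        words = line.split()
        verb = words[0] if words else ""
        # Flush the owed exec before anything that is not part of the chain,
        # which also places it before comment lines and blank lines.
        if pending and verb not in ("fixup", "squash"):
            out.append(f"exec {cmd}")
            pending = False
        out.append(line)
        if verb in ("pick", "merge"):
            pending = True
    if pending:
        out.append(f"exec {cmd}")
    return out

todo = [
    "pick 1111 first",
    "fixup 2222 fixup! first",
    "pick 3333 second",
    "label topic",
    "merge -C 4444 topic # merge it",
]
for line in add_exec(todo, "make test"):
    print(line)
```

    Tracking a single "exec owed" flag avoids the old assumption that the list contains only pick, fixup and squash lines.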


    Git 2.22 (Q2 2019) fixes the usage of the refs/rewritten/ hierarchy to store a rebase's intermediate states, which inherently makes the hierarchy per worktree.

    See commit b9317d5, commit 90d31ff, commit 09e6564 (07 Mar 2019) by Nguyễn Thái Ngọc Duy (pclouds).
    (Merged by Junio C Hamano -- gitster -- in commit 917f2cd, 09 Apr 2019)

    Make sure refs/rewritten/ is per-worktree

    a9be29c (sequencer: make refs generated by the label command worktree-local, 2018-04-25, Git 2.18) adds refs/rewritten/ as per-worktree reference space.
    Unfortunately (my bad) there are a couple places that need update to make sure it's really per-worktree.

    • add_per_worktree_entries_to_dir() is updated to make sure ref listing look at per-worktree refs/rewritten/ instead of per-repo one.

    • common_list[] is updated so that git_path() returns the correct location. This includes "rev-parse --git-path".

    This mess is created by me.
    I started trying to fix it with the introduction of refs/worktree, where all refs will be per-worktree without special treatments.
    Unfortunate refs/rewritten came before refs/worktree so this is all we can do.


    With Git 2.24 (Q4 2019), "git rebase --rebase-merges" learned to drive different merge strategies and pass strategy specific options to them.

    See commit 476998d (04 Sep 2019) by Elijah Newren (newren).
    See commit e1fac53, commit a63f990, commit 5dcdd74, commit e145d99, commit 4e6023b, commit f67336d, commit a9c7107, commit b8c6f24, commit d51b771, commit c248d32, commit 8c1e240, commit 5efed0e, commit 68b54f6, commit 2e7bbac, commit 6180b20, commit d5b581f (31 Jul 2019) by Johannes Schindelin (dscho).
    (Merged by Junio C Hamano -- gitster -- in commit 917a319, 18 Sep 2019)


    With Git 2.25 (Q1 2020), the logic used to tell worktree local and repository global refs apart is fixed, to facilitate the preserve-merge.

    See commit f45f88b, commit c72fc40, commit 8a64881, commit 7cb8c92, commit e536b1f (21 Oct 2019) by SZEDER Gábor (szeder).
    (Merged by Junio C Hamano -- gitster -- in commit db806d7, 10 Nov 2019)

    path.c: don't call the match function without value in trie_find()

    Signed-off-by: SZEDER Gábor

    'logs/refs' is not a working tree-specific path, but since commit b9317d55a3 (Make sure refs/rewritten/ is per-worktree, 2019-03-07, v2.22.0-rc0) 'git rev-parse --git-path' has been returning a bogus path if a trailing '/' is present:

    $ git -C WT/ rev-parse --git-path logs/refs --git-path logs/refs/
    /home/szeder/src/git/.git/logs/refs
    /home/szeder/src/git/.git/worktrees/WT/logs/refs/
    

    We use a trie data structure to efficiently decide whether a path belongs to the common dir or is working tree-specific.

    As it happens b9317d55a3 triggered a bug that is as old as the trie implementation itself, added in 4e09cf2acf ("path: optimize common dir checking", 2015-08-31, Git v2.7.0-rc0 -- merge listed in batch #2).

    • According to the comment describing trie_find(), it should only call the given match function 'fn' for a "/-or-\0-terminated prefix of the key for which the trie contains a value".
      This is not true: there are three places where trie_find() calls the match function, but one of them is missing the check for value's existence.

    • b9317d55a3 added two new keys to the trie:

      • 'logs/refs/rewritten', and
      • 'logs/refs/worktree', next to the already existing 'logs/refs/bisect'.
        This resulted in a trie node with the path 'logs/refs/', which didn't exist before, and which doesn't have a value attached.
        A query for 'logs/refs/' finds this node and then hits that one callsite of the match function which doesn't check for the value's existence, and thus invokes the match function with NULL as value.
    • When the match function check_common() is invoked with a NULL value, it returns 0, which indicates that the queried path doesn't belong to the common directory, ultimately resulting the bogus path shown above.

    Add the missing condition to trie_find() so it will never invoke the match function with a non-existing value.

    check_common() will then no longer have to check that it got a non-NULL value, so remove that condition.

    I believe that there are no other paths that could cause similar bogus output.

    AFAICT the only other key resulting in the match function being called with a NULL value is 'co' (because of the keys 'common' and 'config').

    However, as they are not in a directory that belongs to the common directory the resulting working tree-specific path is expected.
