Does the order of Git merging matter?

不知归路 2020-12-29 05:33

Suppose I have two branches, A and B. Do the following properties hold?

  • Merging A into B conflicts if and
5 Answers
  • 2020-12-29 06:21

    Merge conflicts happen when two people change the same lines in the same file, or when one person deletes a file while the other modifies it.

    So basically,

    • If there are conflicts when you try to merge B into A, there will be conflicts when you try to merge A into B.

    • The result of merging A into B and of merging B into A has to be the same (if there are no conflicts).

    The question is purely logical. So if anyone thinks my logical answer is wrong or needs improvement, feel free to correct me or edit this.

  • 2020-12-29 06:27

    I would go as far as to say that if your two properties are not met, then you have found a bug in git merge.

    Rationale: Merging concurrently in all directions is the very purpose for which git was built. That is why git has used 3-way merges from the start: it is the only way to provide correct merge results. This 3-way merge is symmetric from a mathematical point of view: it basically computes a state R = (A - B) + (C - B) + B from the diverged states A and C and their base commit B. The only difference that comes from merging order should be the order of the parents of the merge commit.
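    As a rough illustration of that formula (not of git's actual implementation), the sketch below models a state as a tuple of integers, which is purely an assumption for the sake of the example; the function name three_way_merge is hypothetical. Swapping A and C gives the same result because addition commutes.

```python
def three_way_merge(base, a, c):
    """Apply R = (A - B) + (C - B) + B component by component."""
    return tuple((x - b) + (y - b) + b for b, x, y in zip(base, a, c))


base = (10, 20, 30)
a = (11, 20, 30)  # A changed only the first component
c = (10, 20, 35)  # C changed only the third component

merged_ac = three_way_merge(base, a, c)
merged_ca = three_way_merge(base, c, a)
print(merged_ac == merged_ca)  # the argument order does not matter
```

    Note that this toy model simply adds both deltas when A and C change the same component, whereas git reports a conflict in that situation.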


    Edit: If you are interested in more details, torek's answer is what you are looking for. It gives you all the technicalities about the different merge strategies, and points out where my answer is imprecise due to being written at a very high abstraction level.

  • 2020-12-29 06:30

    Consider A to be the master branch and B to be the sub branch.

    1. On A, create a readme.txt file containing "change 1", then stage it and commit it.
    2. Create the sub branch B, append "change 2" to readme.txt, and commit it.
    3. Switch back to master branch A. You will not see the changes made on sub branch B; to bring them in, merge B into A.
    4. While on master branch A, append "change 3" to readme.txt and commit it.
    5. Check out sub branch B, append "change 4" to readme.txt, and commit it.
    6. While on sub branch B, merge master branch A into B. This causes a merge conflict.

    On sub branch B, readme.txt does not contain "change 3", so "change 4" is not simply appended after it. Both branches changed the same place in the file, and merging the version containing "change 3" with the version containing "change 4" conflicts.

    In the above example, both properties hold.

  • 2020-12-29 06:33

    I do not know the algorithms by heart, but I believe the answer is very much "yes" to both questions. If you have found a counterexample, it would be very nice to see it. So far I am not aware of any.

    I'd check some ambiguous cases, for example a section of a file that one person duplicated right next to the original while another person modified it. Which of the copies should change? As there is no single correct answer, it could depend on minor details such as the order of the parents.

  • 2020-12-29 06:37

    cmaster's answer is correct, with caveats. Let's start by noting these items / assumptions:

    • There is always a single merge base commit. Let's call this commit B, for base.
    • The other two inputs are also single commits. Let's call them L for left / local (--ours) and R for right / remote (--theirs).

    The first assumption is not necessarily true. If there are multiple merge base candidates, it is up to the merge strategy to do something about this. The two standard two-head merge strategies are recursive and resolve. The resolve strategy simply picks one at (apparent) random. The recursive strategy merges the merge bases two at a time, and then uses the resulting commit as the merge base. The one chosen by resolve can be affected by the order of arguments to git merge-base and hence to git merge, so that's one caveat right there. Because the recursive strategy can do more than one merge, there's a second caveat here that is difficult to describe yet, but it applies only if there are more than two merge bases.
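    To see how more than one merge base can arise, here is a small sketch over a hypothetical "criss-cross" history. The commit names and the helpers ancestors and merge_bases are made up for illustration; the definition used (common ancestors that are not themselves ancestors of another common ancestor) is the usual one, but this is not how git implements it.

```python
def ancestors(parents, commit):
    """Return the set holding commit and every commit reachable from it."""
    seen = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def merge_bases(parents, left, right):
    """Common ancestors that are not ancestors of another common ancestor."""
    common = ancestors(parents, left) & ancestors(parents, right)
    return {
        c for c in common
        if not any(c != other and c in ancestors(parents, other) for other in common)
    }


# Hypothetical criss-cross history: each tip merged the other line's earlier commit.
parents = {
    "root": [],
    "x1": ["root"],
    "y1": ["root"],
    "x2": ["x1", "y1"],  # merge of y1 into the x line
    "y2": ["y1", "x1"],  # merge of x1 into the y line
}
print(sorted(merge_bases(parents, "x2", "y2")))  # two candidates: x1 and y1
```

    With two candidates like this, a strategy has to either pick one or combine them, which is where the order of arguments can start to matter.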

    The second assumption is much more true, but note that the merge code can run on a partially-modified work-tree. In this case all bets are off, since the work-tree does not match either L or R. A standard git merge will tell you that you must commit first, though, so normally this is not a problem.

    Merge strategies matter

    We already noted the issue with multiple merge bases. We're assuming a two-head merge as well.

    Octopus merges can deal with multiple heads. This also changes the merge base computation, but in general an octopus merge won't work in cases that have complicated merge issues and will just refuse to run where the order might matter. I would not push hard on it though; this is another case where the symmetry rule is likely to fail.

    The -s ours merge strategy completely ignores all other commits so merge order is obviously crucial here: the result is always L. (I am fairly sure that -s ours does not even bother computing a merge base B.)

    You can write your own strategy and do whatever you want. Here, you can make the order matter, as it does with -s ours.

    High level merging (with one merge base): file name changes

    Git now computes, in effect, two change-sets from these three snapshots:

    • L - B, or git diff --find-renames B L
    • R - B, or git diff --find-renames B R

    The rename detectors here are independent—by this I mean neither affects the other; both use the same rules though. The main issue here is that it's possible for the same file in B to be detected as renamed in both change-sets, in which case we get what I call a high level conflict, specifically a rename/rename conflict. (We can also get high level conflicts with rename/delete and several other cases.) For a rename/rename conflict, the final name that Git chooses is the name in L, not the name in R. So here, the order matters in terms of final file name. This does not affect the work-tree merged content.
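    A toy model of that naming rule, as described in this answer, might look like the following. The function merged_path is hypothetical, and the assumption that a rename made only on the R side is carried over is mine; real Git reports a rename/rename as a conflict rather than silently picking a name.

```python
def merged_path(base_path, left_path, right_path):
    """Pick the final path for a file under the simplified rule above."""
    if left_path != base_path:
        # L renamed the file (R may have too): the name in L wins.
        return left_path
    # Only R renamed the file, or nobody did.
    return right_path


# Both sides renamed a.txt differently: the result depends on which side is L.
print(merged_path("a.txt", "left.txt", "right.txt"))
print(merged_path("a.txt", "right.txt", "left.txt"))
```

    Swapping the two branches swaps which name ends up being used, so this is one place where order shows up.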

    Low level merging

    At this point we should take a small tour of Git's internals. We have now paired up files in B-vs-L and in B-vs-R, i.e., we know which files are "the same" files in each of the three commits. However, the way Git stores files and commits is interesting. From a logical point of view, Git has no deltas: each commit is a complete snapshot of all files. Each file, however, is just a pair of entities: a path name P and a hash ID H.

    In other words, at this point, there is no need to walk through all the commits leading from B to either L or R. We know that we have some file F, identified by up to three separate path names (and as noted above, Git will use the L path in most cases, but use the R path if there is only one rename in the B-vs-R side of the merge). The complete contents of all three files are available by direct lookup: H_B represents the base file content, H_L represents the left-side file, and H_R represents the right-side file.

    Two files match exactly if and only if their hashes match.[1] So at this point Git just compares the hash IDs. If all three match, the merged file is the same as the left and right and base files: there is no work. If L and R match, the merged file is the L or R content; the base is irrelevant as both sides made the same change. If B matches either L or R but not the other, the merged file is the non-matching hash. Git only has to do the low-level merge if there is a potential for a low-level merge conflict.
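    The hash comparison above can be sketched as follows. The names blob_hash and trivial_merge are hypothetical; the hashing follows Git's SHA-1 blob format (the word blob, the size, a NUL byte, then the data), but the rest is only a simplified model of the decision, not Git's code.

```python
import hashlib


def blob_hash(content: bytes) -> str:
    """Hash content as a Git blob: 'blob <size>', a NUL byte, then the data."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def trivial_merge(hb, hl, hr):
    """Return the winning hash, or None when a content-level merge is needed."""
    if hl == hr:
        return hl  # both sides agree (covers the all-three-equal case)
    if hb == hl:
        return hr  # only the right side changed
    if hb == hr:
        return hl  # only the left side changed
    return None    # both sides changed differently


base = blob_hash(b"hello\n")
left = blob_hash(b"hello\n")
right = blob_hash(b"hello, world\n")
print(trivial_merge(base, left, right) == trivial_merge(base, right, left))
```

    Every branch of trivial_merge gives the same answer when hl and hr are swapped, which is why this stage is symmetric.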

    So now, Git extracts the three contents and does the merge. This works on a line-by-line basis (with lines grouped together when multiple adjacent lines are changed):

    • If both left and right sides touched only different source lines, Git will take both changes. This is clearly symmetric.

    • If left and right touched the same source lines, Git will check whether the change itself is also the same. If so, Git will take one copy of the change. This, too, is clearly symmetric.

    • If left and right touched the same lines, but made different changes, Git will declare a merge conflict. The work-tree content will depend on the order of the changes, since the work-tree content has <<<<<<< HEAD ... ||||||| base ... ======= ... other >>>>>>> markers (the base section is optional, appearing if you choose diff3 style).

    The definition of the same lines is a little tricky. This does depend on the diff algorithm (which you may select), since some sections of some files may repeat. However, Git always uses a single algorithm for computing both L and R, so the order does not matter here.
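    The three cases above can be sketched with a deliberately simplified merge. It assumes all three versions have the same number of lines (lines are only modified in place, never inserted or deleted), so no diff algorithm is needed; merge_lines and the marker labels are illustrative, not Git's output format.

```python
def merge_lines(base, left, right):
    """Three-way merge of equal-length line lists; returns (lines, had_conflict)."""
    merged = []
    had_conflict = False
    for b, l, r in zip(base, left, right):
        if l == r:
            merged.append(l)  # untouched, or both sides made the same change
        elif l == b:
            merged.append(r)  # only the right side changed this line
        elif r == b:
            merged.append(l)  # only the left side changed this line
        else:
            had_conflict = True  # same line, different changes
            merged += ["<<<<<<< ours", l, "=======", r, ">>>>>>> theirs"]
    return merged, had_conflict


base = ["a", "b", "c"]
print(merge_lines(base, ["A", "b", "c"], ["a", "b", "C"]))  # clean, symmetric
print(merge_lines(base, ["a", "x", "c"], ["a", "y", "c"]))  # conflict on line 2
```

    Swapping left and right leaves the clean result unchanged; in the conflicting case only the order of the two sides between the markers changes.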


    [1] To put this another way, if you manage to produce a Doppelgänger file (one that has different content from, but the same hash as, some existing file), Git simply refuses to put that file into the repository. The shattered.it PDF is not such a file, because Git prefixes the file's data with the word blob and the size of the file, but the principle applies. Note that putting such a file into SVN breaks SVN (well, sort of).


    -X options are obviously asymmetric

    You can override merge conflict complaints using -X ours or -X theirs. These direct Git to resolve conflicts in favor of the L or R change respectively.
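    A minimal sketch of that behavior for a single changed region, assuming a hypothetical resolve_hunk helper: the favor argument stands in for -X ours / -X theirs and is consulted only when the two sides really conflict.

```python
def resolve_hunk(base, ours, theirs, favor=None):
    """Resolve one changed region; favor is None, "ours", or "theirs"."""
    if ours == theirs:
        return ours    # same change on both sides
    if ours == base:
        return theirs  # only their side changed
    if theirs == base:
        return ours    # only our side changed
    if favor == "ours":
        return ours    # conflict, resolved in favor of L
    if favor == "theirs":
        return theirs  # conflict, resolved in favor of R
    raise ValueError("merge conflict")


# With favor="ours", swapping the two sides changes the outcome.
print(resolve_hunk("b", "x", "y", favor="ours"))
print(resolve_hunk("b", "y", "x", favor="ours"))
```

    Unlike -s ours, this only affects the conflicting regions: changes made on one side alone are still taken from whichever side made them.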

    Merging makes a merge commit, which affects merge base computation

    This symmetry principle, even with the above caveats, is fine for a single merge. But once you have made a merge, the next merge you run will use the modified commit graph to compute the new merge base. If you have two merges that you intend to do, and you do them as:

    git merge one    (and fix conflicts and commit if needed)
    git merge two    (fix conflicts and commit if needed)
    

    then even if everything is symmetric in each merge, that does not mean that you will necessarily get the same result as if you run:

    git merge two
    git merge one
    

    Whichever merge runs first, you get a merge commit, and the second merge now finds a different merge base.

    This is particularly important if you do have conflicts that you must fix before finishing whichever merge goes first, since that also affects the L input to the second git merge command. It will use the first merge's snapshot as L, and the new (maybe different) merge base as B, for two of its three inputs.
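    The effect on the merge base can be illustrated with a small hypothetical history in which branches "one" and "two" both fork from a commit "x" that HEAD ("h") does not contain. The commit names and the helpers below are made up for this sketch and only model the commit graph, not file contents.

```python
def ancestors(parents, commit):
    """Return the set holding commit and every commit reachable from it."""
    seen = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def merge_bases(parents, left, right):
    """Common ancestors that are not ancestors of another common ancestor."""
    common = ancestors(parents, left) & ancestors(parents, right)
    return {
        c for c in common
        if not any(c != other and c in ancestors(parents, other) for other in common)
    }


parents = {
    "root": [],
    "h": ["root"],  # current HEAD
    "x": ["root"],
    "one": ["x"],
    "two": ["x"],
}
before = merge_bases(parents, "h", "two")

# Simulate `git merge one`: a new merge commit "m1" with parents (h, one).
parents["m1"] = ["h", "one"]
after = merge_bases(parents, "m1", "two")
print(before, after)  # the base for merging "two" moved from root to x
```

    So the second merge does not see the same three inputs it would have seen had it run first.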

    This is why I mentioned that -s recursive has potential order differences when working with multiple merge bases. Suppose there are three merge bases. Git will merge the first two (in whatever order they pop out of the merge base computation), commit the result (even if there are merge conflicts—it just commits the conflicts in this case), and then merge that commit with the third commit and commit the result. The final commit here is then the input B. Only if all parts of this process are symmetric will that final B result be order-insensitive. Most merges are symmetric, but we saw all the caveats above.
