Your question comes down to: "why isn't Git obeying my custom merge direction?" In fact, this problem can occur with any merge and any custom merge driver. The fact that this particular merge can be done as a fast-forward merely guarantees that you, in your particular case, will hit the problem.
The reason boils down to the fact that any custom .gitattributes merge driver, including merge=ours, is invoked only when Git believes there is "something to merge". This does not seem so bad until you realize what it takes for Git to have such a belief.
It's worth mentioning here, as a side-bar, Git's -s strategy argument to git merge. These strategies take over the whole process, including the "find the merge base" step and everything after it, so they can do their own thing, which can include ignoring .gitattributes entirely. Obviously, if a strategy ignores your .gitattributes, setting a custom merge driver or mode there won't help.
Therefore, we're looking only at the -s strategies that do use a merge base and two of what Git calls heads (which we'll label "ours" and "theirs"), and do use .gitattributes. There are three of those built in to Git (recursive, resolve, and subtree), but they all work the same here, with respect to what gets merged and what happens with custom merge drivers. (The other two built-in merge strategies, ours and octopus, either don't bother with a merge base and a "theirs" at all, or, in the case of octopus, have more than two heads, so that there is no clear notion of "ours" and "theirs".)
So, now that we have settled on the built-in merges that have one merge base commit and two head commits, we can look at what it means for Git to think, in its tiny little pre-programmed Gitty way, that there is something to merge.
The two heads are easier to define. One of them, the one we call "ours", is just HEAD itself. The other is whatever argument we pass to git merge:
git merge A
means "ours" is HEAD and "theirs" is the commit identified by A.
Here is your git log --all --decorate --oneline --graph output again (thanks, by the way, for including that; it's critical for most merges!):
* da6a750 (A) Further in A, okay for merging back into master
* bf27b58 Merge branch 'master' into A
|\
| * 86294d1 (HEAD -> master) Development on master
* | abe6b8a Welcome to branch A
|/
* 589517c First commit
So we can say that the two heads are commit 86294d1 (HEAD or master or just "ours") and commit da6a750 (A or just "theirs").
The merge base is the best common ancestor of the two heads: a commit you can reach from both heads (counting each head as reachable from itself) that is not an ancestor of some other such commit. Informally, start from both heads and work backwards in history, if needed, until you find a commit they have in common. So we start from da6a750, work backwards one step to bf27b58, then work backwards one more step to both 86294d1 and abe6b8a. Meanwhile, we start from 86294d1 and ... oh look, we've hit a common commit already! :-)
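That search can be sketched in a few lines of Python. This is a minimal illustration, assuming the commit graph is given as a hand-built PARENTS dictionary copied from the log above (the names PARENTS, ancestors and merge_bases are hypothetical). It computes the "best common ancestor" set by brute force, which is fine for five commits; Git's own git merge-base walks the graph far more efficiently.

```python
# Parent links for the five commits in the question's graph.
# bf27b58 is a merge commit, so it has two parents.
PARENTS = {
    "da6a750": ["bf27b58"],
    "bf27b58": ["abe6b8a", "86294d1"],
    "86294d1": ["589517c"],
    "abe6b8a": ["589517c"],
    "589517c": [],
}


def ancestors(commit, parents):
    """Every commit reachable from `commit` via parent links, including itself."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen


def merge_bases(a, b, parents):
    """Best common ancestors: common ancestors that are not ancestors of another one."""
    common = ancestors(a, parents) & ancestors(b, parents)
    return {
        c for c in common
        if not any(c != other and c in ancestors(other, parents) for other in common)
    }


print(merge_bases("86294d1", "da6a750", PARENTS))  # {'86294d1'}
```

589517c is also a common ancestor, but it is an ancestor of 86294d1, so it is not "best"; that leaves the "ours" head itself as the merge base.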
Since the merge base is one of the two heads, normally we'd get either a fast-forward or a complaint that there is nothing to merge. Because the merge base is the "ours" head, Git would pick the fast-forward. Using --no-ff tells Git: don't do that; go ahead and do a full-blown merge after all.
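The decision just described can be summarized as a tiny Python function. It is a simplified sketch (merge_outcome is a hypothetical name, and real git merge checks more things, such as a dirty work-tree), showing only how the position of the merge base and the --no-ff flag pick the outcome.

```python
def merge_outcome(base, ours, theirs, no_ff=False):
    """Classify a two-head merge by where the merge base sits (simplified sketch)."""
    if base == theirs:
        return "already up to date"  # their commit is already in our history
    if base == ours and not no_ff:
        return "fast-forward"        # just move our branch name to their commit
    return "real merge"              # build a new merge commit with two parents


print(merge_outcome("86294d1", "86294d1", "da6a750"))              # fast-forward
print(merge_outcome("86294d1", "86294d1", "da6a750", no_ff=True))  # real merge
```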
Now, the fact that the merge base is the "ours" commit guarantees we will have your problem, but we could have your problem even if the merge base were not the "ours" commit. Let's take a look at what's inside a commit, one level down, at what Git needs and does when it runs git diff and git merge. But first, let's think about what git merge is supposed to do.
As a general rule, the idea when running git merge is that we want to take two sets of work (things we did on our branch in our commits, and things "they", whoever they are, did on their branch in their commits) and produce a new commit that is the best of both worlds: one that takes any good stuff we did, plus any good stuff they did.
If we draw the graph horizontally instead of vertically, with older commits at the left and newer ones at the right, we can draw this:
          o--o--o--...--H   <-- ours
         /
...--o--B
         \
          o-----...-----T   <-- theirs
where each o is a commit, and so are B, H, and T. Commit B is the merge base, where the two forks in this graph rejoin in the "past" (leftward) direction. H is our (HEAD) commit and T is the head / tip commit of their branch. How, then, can we combine our work with their work?
Git's answer is to run two git diff commands:
git diff B H # find out what we did
git diff B T # find out what they did
Then it can combine these two diffs:
Wherever we added something—some lines of text—to some files, Git should make the final result have those added lines in those files. Wherever we deleted some lines of text in some files, it should make the final result have those lines deleted.
Because git diff expresses the differences as "delete this and add that" (even for differences that change this to that), that covers everything git diff says.
Likewise, wherever they added lines, Git should make the final result have the added lines. Wherever they deleted lines, Git should make the final result have those same deletions.
To take care of a very common case, if we and they made the exact same change—deleting the same original lines, and/or adding the same replacements—Git takes only one copy of this.
And of course, if there's a place where we both touched the same lines, but in different ways, Git just throws up its metaphorical hands, exclaims "Oy vey!", and declares a merge conflict.
(It's these merge conflicts that give us the most headaches, so most of the twisty knobs Git gives us are designed for dealing with those conflicts in some way. That's mostly true of .gitattributes merge attributes, too, though that's not directly relevant to our problem here.)
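The combining rules above can be sketched as a toy, line-based three-way merge. This is a minimal illustration in Python using the standard difflib module, not Git's actual algorithm: Git uses its own xdiff code, handles many more cases, and writes conflict markers into the file rather than raising an exception. The names changes and merge3 are hypothetical, and here any two changes that overlap or touch count as a conflict unless they are identical.

```python
import difflib


def changes(base, other, side):
    """Changed regions from base to other, as (start, end, replacement, side).

    Each tuple means base[start:end] was replaced by `replacement`.
    A pure insertion has start == end; a pure deletion has an empty replacement.
    """
    sm = difflib.SequenceMatcher(None, base, other, autojunk=False)
    return [(i1, i2, other[j1:j2], side)
            for tag, i1, i2, j1, j2 in sm.get_opcodes() if tag != "equal"]


def merge3(base, ours, theirs):
    """Toy line-based three-way merge; raises ValueError on a conflict."""
    todo = sorted(changes(base, ours, "ours") + changes(base, theirs, "theirs"),
                  key=lambda c: (c[0], c[1]))
    result, pos, i = [], 0, 0
    while i < len(todo):
        group = [todo[i]]
        start, end = todo[i][0], todo[i][1]
        i += 1
        # Any change that overlaps or touches this region joins the same group.
        while i < len(todo) and todo[i][0] <= end:
            group.append(todo[i])
            end = max(end, todo[i][1])
            i += 1
        if len(group) == 2 and group[0][:3] == group[1][:3]:
            group = group[:1]  # both sides made the identical change: keep one copy
        if len(group) > 1:
            raise ValueError(f"conflict near base line {start + 1}")
        result.extend(base[pos:start])  # unchanged base lines before this change
        result.extend(group[0][2])      # the changing side's replacement lines
        pos = end
    result.extend(base[pos:])
    return result


base = ["a", "b", "c", "d", "e"]
print(merge3(base, ["a", "B", "c", "d", "e"], ["a", "b", "c", "d", "E"]))
# ['a', 'B', 'c', 'd', 'E']
```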
Now, all this combining is a lot of work, so to make Git go fast, there's a short-cut.
git merge to git diff
We can look at any commit object, or indeed any Git object at all, with git cat-file -p:
$ git cat-file -p HEAD
tree 5bc304073b94505cd3f6716829c4cec5a7474762
parent 29257c2c82dca881c4cc65765392a32e46264fbe
author Chris Torek 1490287144 -0700
committer Chris Torek 1490297185 -0700
insert early footnote on Git branch creation
In the "about version control" chapter section that introduces
(I snipped the rest off here).
The more interesting part here is actually the tree, so let's view some of that:
$ git cat-file -p 5bc304073b94505cd3f6716829c4cec5a7474762
100644 blob 8d1519c435c4da5a65228785fa7ba7033fe011ff .gitignore
100644 blob 66c9d22a735ee9d8da7f7ed49599583aa642842f Makefile
100644 blob c9c824fa6668e45976c4fe8a10e4d5c25e272f0c about.tex
100644 blob 1757109f5aa921ecf9a8051180c25f09e1496c07 aboutvc.tex
(again I snipped things off here).
Each of those raw hash IDs for each blob object (i.e., stored file version) tells Git which version goes with this commit. (More precisely, that's the file version for this tree object, but this tree goes with this commit, so it amounts to the same thing.)
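To make the "one blob hash per path" idea concrete, here is a small Python sketch that turns a tree listing like the one above into a dictionary. It assumes Git's actual output format, in which a tab separates the hash from the file name (so names may contain spaces); parse_tree_listing is a hypothetical helper, not part of any Git API.

```python
def parse_tree_listing(text):
    """Map each file name to (mode, type, hash) from `git cat-file -p <tree>` output.

    Each line looks like "<mode> <type> <hash>\t<name>".
    """
    entries = {}
    for line in text.splitlines():
        if not line:
            continue
        meta, name = line.split("\t", 1)
        mode, kind, sha = meta.split()
        entries[name] = (mode, kind, sha)
    return entries


listing = ("100644 blob 66c9d22a735ee9d8da7f7ed49599583aa642842f\tMakefile\n"
           "100644 blob 1757109f5aa921ecf9a8051180c25f09e1496c07\taboutvc.tex\n")
print(parse_tree_listing(listing)["aboutvc.tex"][2])
# 1757109f5aa921ecf9a8051180c25f09e1496c07
```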
Git can, and in fact has to, extract these blob hash IDs for each of the three commits: the merge base, "ours", and "theirs". The hash IDs are how it will be able to diff the old and new versions of files like aboutvc.tex (in my case) or specific (in yours). But there is an interesting thing about these hash IDs: they're based entirely on the contents of the object.[1] If two files in two different commits are exactly, completely, 100% bit-for-bit identical, they have the same hash and are stored in the repository just once. This means that no matter how many commits have a copy of that particular version of that file, there's only one copy stored in the database.
[1] In fact, they are cryptographic hashes of the object contents, including the little type-and-size header Git sticks on the front of each object. That header is why the now-famous SHA-1 hash collision is not an immediate problem for Git.
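The header-plus-contents hashing can be reproduced with Python's hashlib. This sketch assumes a classic SHA-1 repository (not one created with the newer SHA-256 object format); git_blob_id is a hypothetical name, but for the same bytes it should produce the same ID that git hash-object reports.

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """SHA-1 object ID Git assigns to a file whose contents are `data`."""
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"  # type-and-size header
    return hashlib.sha1(header + data).hexdigest()


print(git_blob_id(b""))               # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(git_blob_id(b"hello world\n"))  # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
```

Because the ID depends only on the bytes, two commits holding an identical file necessarily list the same blob hash, which is what makes the comparison described next so cheap.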
This fast hash comparison (the fact that the same hash means "same version of that file") means that git diff and git merge can immediately and easily tell that there's no change to some file, from base to ours or from base to theirs ... and this is precisely where merge=ours goes wrong. Git looks at base-vs-ours and base-vs-theirs. One pair has the same hash. The other pair has a different hash.
At this point, Git simply assumes that the right answer, regardless of merge strategy or twisty-knob setting in .gitattributes, is to take the file from whichever head has a different hash. For most files, in most cases, that's the right answer. But if we have defined a custom merge driver, or set merge=ours, it might be the wrong answer.
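The per-file short-cut can be sketched in Python as follows. This is a simplified model, not Git's source: the hash strings "b1", "b2", "b3" are made-up placeholders, merge_one_path and keep_ours are hypothetical names, and keep_ours merely stands in for a merge=ours driver. Real Git also has to handle deletions, mode changes and renames.

```python
def merge_one_path(base, ours, theirs, driver):
    """Pick the result for one path from its three blob hashes (simplified sketch).

    `driver` stands in for a .gitattributes merge driver; it is called only
    when both sides changed the file.
    """
    if ours == theirs:
        return ours    # both sides agree (or neither changed anything)
    if base == ours:
        return theirs  # only "theirs" changed: taken without asking the driver
    if base == theirs:
        return ours    # only "ours" changed
    return driver(base, ours, theirs)


def keep_ours(base, ours, theirs):
    """Hypothetical stand-in for a merge=ours driver: always keep our version."""
    return ours


print(merge_one_path("b1", "b1", "b2", keep_ours))  # b2 (driver never consulted)
print(merge_one_path("b1", "b3", "b2", keep_ours))  # b3 (both changed, driver runs)
```

The first call is your situation: the driver says "keep ours", but it never gets a chance to say it.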
When the one head that's different is "theirs", and the custom merge direction is "keep ours", it's the wrong answer. That's true no matter what commit is chosen as the merge base, but when the merge base is HEAD (that is, our own commit), every hash in the base-to-ours comparison matches, and the result is always "their version of the file".
That, in fact, is why a fast-forward is possible in the first place: the final merged tree is always just their tree. Git, in effect, ignores all the custom directions in .gitattributes. That remains true even if you force a real merge rather than a fast-forward non-merge "merge".
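Applying the same short-cut to every path in a tree shows why: when the base tree equals our tree, every path falls into the "only theirs changed" case. The sketch below models trees as plain dictionaries from path to blob hash, with a missing path meaning the file doesn't exist; merge_trees, the file names and the hash strings are all hypothetical, and paths changed on both sides are merely reported rather than merged.

```python
def merge_trees(base, ours, theirs):
    """Merge three trees, each a dict of path -> blob hash (absent = no such file).

    Only the "one side unchanged" short-cut is modelled; paths changed on
    both sides are returned in a list as needing a real content merge.
    """
    merged, needs_merge = {}, []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t or b == t:
            result = o  # both agree, or only ours changed
        elif b == o:
            result = t  # only theirs changed: their version wins outright
        else:
            needs_merge.append(path)
            continue
        if result is not None:  # None means the file is deleted in the result
            merged[path] = result
    return merged, needs_merge


ours = {"config.ini": "c1", "main.c": "m1"}
theirs = {"config.ini": "c2", "main.c": "m2", "new.c": "n1"}
merged, pending = merge_trees(ours, ours, theirs)  # base tree == our tree
print(merged == theirs, pending)  # True []
```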
Perhaps Git should check for custom merge drivers or merge=ours directives, and disable this short-cut, at least for real (non-fast-forward) merges. But it doesn't, and therefore you will have this problem. You will also have it in other cases, where there's a real merge to be done but the file is modified only in the base-to-theirs comparison.
People often want to use merge=ours to make sure that configuration files stored on a branch are kept the way they are on that branch. This is nearly always the wrong overall strategy. Instead, configuration files should be omitted entirely from version control, or at least from the version control of this particular repository. Rather than committing, e.g., config.ini or config.php, commit a config.ini.sample or config.default.php or some such. Copy this configuration to the "real" config, or read it as a secondary source if the "real" configuration is missing or incomplete.
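The "read the sample as a fallback" approach might look like this in Python, using the standard configparser module. It is a sketch under assumptions: the file names config.ini and config.ini.sample come from the text above, while load_config and the "app" section with its keys are invented for the example. It relies on ConfigParser.read() skipping files it cannot open and on later files overriding earlier ones.

```python
import configparser
import os
import tempfile


def load_config(directory):
    """Load defaults from config.ini.sample, then let config.ini override them.

    ConfigParser.read() silently skips files it cannot open, and values read
    later replace values read earlier, so a missing or partial config.ini is fine.
    """
    parser = configparser.ConfigParser()
    parser.read([os.path.join(directory, "config.ini.sample"),
                 os.path.join(directory, "config.ini")])
    return parser


# Usage: the "app" section and its keys are made up for this example.
with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "config.ini.sample"), "w") as f:
        f.write("[app]\nmode = default\nport = 8080\n")
    print(load_config(d)["app"]["mode"])  # default (no config.ini yet)
    with open(os.path.join(d, "config.ini"), "w") as f:
        f.write("[app]\nmode = custom\n")
    cfg = load_config(d)
    print(cfg["app"]["mode"], cfg["app"]["port"])  # custom 8080
```

Only config.ini.sample would be committed; config.ini would be listed in .gitignore, so no merge ever touches it.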
This gives you a way to version configurations (sample and/or default ones) in general, without versioning the specific run-time configuration of someone using this repository as the place from which they run the software / app itself. Should the user wish to version-control her particular configuration, she can store that in a separate repository, and replace config.ini with (e.g.) a symbolic link to ../myconfigs/fooapp.ini, which is where she has her configurations versioned.
(A similar trick is to get the configuration from $HOME/.gitconfig or /usr/local/etc/fooapp.ini. That is, store the configuration separately in the first place. Again, if you want or need some sort of default configuration, you can keep that versioned with the software, but the user's own configuration is separate, and not under your own version control at all.)