I've worked myself into a situation that is not making sense to me. I'll try to describe it as best I can.
I have a development branch and I've merged master into it
This answer is long, because there is a lot going on here. The TL;DR summary, though, is probably that you want --full-history.
There are multiple separate issues here that need to be untangled:
The way we view commits, using git log -p or git show, often leads people down the wrong path in interpreting what Git stores.
The git log command sometimes lies to you (especially around merges), purely in the interest of not overwhelming you with useless information.
What git merge does can be a bit tricky. It's straightforward in principle, but most people don't get it right away.
Let's look first at Git's most common, ordinary commits. A commit, in Git, is a snapshot (of file contents by file names). You make one commit, then you change a few things and git add a changed file or two and make a second commit, and the second commit has all the same files as the first commit, except for the ones overwritten by the git add.
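To make the snapshot idea concrete, here is a minimal sketch in a throwaway repository. The file names a.txt and b.txt and the identity settings are placeholders I made up for illustration.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

# First commit: two files.
echo "one" > a.txt
echo "two" > b.txt
git add a.txt b.txt
git commit -q -m "first"

# Second commit: change and add only a.txt.
echo "one, edited" > a.txt
git add a.txt
git commit -q -m "second"

# The second commit still lists both files: it is a full snapshot.
files_in_second=$(git ls-tree -r --name-only HEAD)
echo "$files_in_second"

# b.txt is the very same stored object in both commits.
b_first=$(git rev-parse HEAD~1:b.txt)
b_second=$(git rev-parse HEAD:b.txt)
echo "$b_first"
echo "$b_second"
```

The second commit did not "store a change" to a.txt; it stores both files, and simply reuses the existing copy of b.txt.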
It's worth drawing these as parts of Git's commit graph. Note that each commit has its own unique hash ID (one of those impossible-to-remember strings like 93aefc or badf00d or cafedad), plus the ID of a parent (or previous) commit. The parent commit hash lets Git string these things together, in a backwards fashion:
... <-E <-F <-G ...
where each uppercase letter stands in for a hash ID, and the arrows cover the idea that each commit "points back" to its parent. Normally we don't need to draw in the internal arrows (they're not very interesting in the end) so I draw these as:
...--E--F--G <-- master
The name master, however, still deserves an arrow, because the commit to which this arrow points will change over time.
If we pick a commit like G and view it without using git log -p or git show, we will see every file in full, exactly as it is stored in the commit. In fact, that's what happens when we use git checkout to check it out: we extract all the files in full, into the work-tree, so that we can see and work on them. But when we view it with git log -p or git show, Git doesn't show us everything; it only shows us what changed.
To do this, Git extracts both the commit and its parent commit, and then runs a big git diff on the pair. Whatever is different between the parent F and the child G, that's what changed, so that's what git log -p or git show shows you.
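You can check this equivalence yourself. The sketch below uses a scratch repository with made-up file contents, and compares a hand-run diff of parent against child with the patch that git show prints. The --format= option just suppresses the commit header so that only the patch is left.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

echo "line 1" > file.txt
git add file.txt
git commit -q -m "F"

echo "line 2" >> file.txt
git add file.txt
git commit -q -m "G"

# Diff the parent against the child by hand ...
manual=$(git diff HEAD~1 HEAD)
# ... and ask git show for the patch part only.
shown=$(git show --format= HEAD)

if [ "$manual" = "$shown" ]; then
  echo "git show printed the parent-vs-child diff"
fi
```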
This is all well and good for ordinary, single-parent commits, but it doesn't work for merge commits. A merge commit is simply any commit with two (or more, but we won't worry about this case) parent commits. You get these by doing a successful git merge, and we might draw that like this. We start with the two branches (which fork off from some starting point):
       H--I   <-- development (HEAD)
      /
...--E--F--G   <-- master
and then we run git merge master.[1] Git now tries to combine the two branches. If it succeeds, it makes one new commit that has two parents:
       H--I--J   <-- development (HEAD)
      /     /
...--E--F--G   <-- master
The name development now points to the new merge commit J. The parenthesized (HEAD) here denotes that this is our current branch. That tells us which name gets moved: we make a new commit (including any new merge commit) and development is the branch name that changes to point to the new commit.
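Here is a rough sketch that builds this exact graph in a scratch repository and then asks Git for the parents of the merge. The commit helper and the one-file-per-commit layout are my own invention, purely so that each commit has something in it.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is named master
git config user.name "Example"
git config user.email "example@example.com"

# Hypothetical helper: create <name>.txt and commit it with subject <name>.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit E
git checkout -q -b development
commit H
commit I
git checkout -q master
commit F
commit G
git checkout -q development
git merge -q --no-edit master             # this creates merge commit J

# %P prints the parent hash IDs of a commit, separated by spaces.
set -- $(git log -1 --format=%P HEAD)
parent_count=$#
first_parent=$(git log -1 --format=%s HEAD^1)
second_parent=$(git log -1 --format=%s HEAD^2)
current_branch=$(git rev-parse --abbrev-ref HEAD)

echo "J has $parent_count parents: $first_parent and $second_parent"
echo "the name that moved: $current_branch"
```

The first parent is the commit that was current when we ran git merge (I), and the second is the tip of the branch we merged (G).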
If we don't worry about how the contents (the various committed files) of the merge commit are determined, this is all pretty straightforward. The merge commit is like any other commit: it has a complete snapshot, a bunch of files with contents. We check out the merge commit, and those contents get in our work-tree as usual.
But when we go to view the merge commit ... well, Git normally diffs a commit against its parent. The merge has two parents, one for each branch. Which one should Git diff against, to show you changes?
Here, git log and git show take different approaches. When you view the merge with git log -p, it shows no diff at all by default. It won't choose I-vs-J, and it won't choose G-vs-J either! It just shows nothing at all where the diff would be.
[1] In some Git workflows, merging from master into any other branch is discouraged. It can work, though, and since you did, let's run with it.
The git show command does something different and better. It runs two git diffs, one for I-vs-J and one for G-vs-J. It then tries to combine the two diffs, showing you only what changed in both. That is, where J is different from I but not in a particularly interesting way, Git suppresses the difference. Where J is different from G but not in a particularly interesting way, Git suppresses this difference as well. This is probably the most useful mode, so it's what git show shows. It's still quite imperfect, but nothing you can do here is perfect for all purposes.
You can make git log do this same thing by adding --cc to the git log -p options. Or, you can change how either git log or git show shows a merge commit by using -m (note one dash for -m, two for --cc, by the way).
The -m option tells Git that for viewing purposes, it should split the merge. Now Git compares I to J, to show you everything you brought in through merging G. Then Git compares G to the split-off extra version of J, to show you everything you brought in through merging I. The resulting diff is usually very large but (or because) it shows you everything.
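As a quick experiment, the sketch below rebuilds the example graph in a scratch repository and looks at merge commit J twice: once with plain git log -p, and once with -m added. The commit helper and file names are placeholders of mine; --format= hides the commit header so that only diff output remains.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is named master
git config user.name "Example"
git config user.email "example@example.com"

# Hypothetical helper: create <name>.txt and commit it with subject <name>.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit E
git checkout -q -b development
commit H
commit I
git checkout -q master
commit F
commit G
git checkout -q development
git merge -q --no-edit master             # merge commit J

plain=$(git log -1 -p --format= HEAD)     # default: no diff for a merge
split=$(git log -1 -p -m --format= HEAD)  # -m: one diff per parent

echo "without -m: ${#plain} characters of diff"
echo "with -m, files mentioned:"
echo "$split" | grep '^diff --git'
```

With -m, the I-vs-J diff lists the files that came from master (F.txt and G.txt here), and the G-vs-J diff lists the files that came from development (H.txt and I.txt).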
There are more ways to try to find what happened to some file, but we need to hold off a moment before getting to your:
git log -- path/to/file1
command. Just as we saw git log skipping merges, it may skip even more things here (but there are ways to stop Git from doing that).
Let's look at that pre-merge graph again:
       H--I   <-- development (HEAD)
      /
...--E--F--G   <-- master
Here, there are two commits on branch development that are not on branch master, and two commits on master that are not on development. Commit E (along with all earlier commits) is on both branches. Commit E is special, though: it's the most recent[2] commit that's on both branches. Commit E is what Git calls the merge base.
To perform a merge, Git effectively runs two git diff commands:
git diff E I
git diff E G
The first produces a set of changes to various files, which are "what we did on branch development". It is, in effect, the sum of H and I if they are treated as patches. The second produces a (probably different) set of changes to various files, "what we did on master", and as before it's effectively the sum of F and G as patches.
Git then tries to combine these two diffs. Wherever the two sets of changes are completely independent, Git takes both, applies them to the contents of commit E, and uses that as the result. Wherever the two change-sets touch the same line in the same file, Git tries to see if it can just take one copy of that change. If both fixed the spelling of a word on line 33 of file README, Git can just take one copy of the spelling fix. But wherever the two change-sets touch the same line of the same file, but make a different change, Git declares a "merge conflict", throws its metaphorical hands in the air, and makes you fix up the resulting mess.
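The two outcomes for a same-line change can be tried out with a small sketch like this one. The try_merge function is a made-up helper: it builds a fresh scratch repository, changes the single line of README one way on development and another way on master, and reports whether git merge master succeeded.

```shell
set -e

# Hypothetical helper: try_merge <README text on development> <README text on master>
try_merge() {
  cd "$(mktemp -d)"
  git init -q
  git symbolic-ref HEAD refs/heads/master   # make sure the first branch is named master
  git config user.name "Example"
  git config user.email "example@example.com"

  echo "teh quick fox" > README              # merge base E, with a typo
  git add README
  git commit -q -m E

  git checkout -q -b development
  echo "$1" > README
  git commit -q -a -m H

  git checkout -q master
  echo "$2" > README
  git commit -q -a -m F

  git checkout -q development
  if git merge -q --no-edit master >/dev/null 2>&1; then
    echo "clean"
  else
    echo "conflict"
  fi
}

# Both sides make the same spelling fix: Git takes one copy.
same_fix=$(try_merge "the quick fox" "the quick fox")
# Both sides change the same line, but differently: Git gives up.
different_fix=$(try_merge "the quick fox" "teh quick red fox")

echo "same fix on both sides:      $same_fix"
echo "different change, same line: $different_fix"
```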
If you (or whoever does the merge) want to, you can stop Git from committing the merge result even if Git thinks it all went swimmingly: git merge --no-commit master makes Git stop after combining everything. At this point, you can open work-tree files in your editor, change them, write them back, git add the changed files, and git commit the merge, putting something in the merge that did not come from any of the three inputs (the base and the two branch tips).
In any case, the key to understanding all of this is the concept of the merge base commit. If you sit down and draw the commit graph, the merge base is usually pretty obvious unless the graph gets way out of hand (which happens a lot, actually). You can also have Git find the merge base for you—or merge bases, plural, in some cases:
git merge-base --all master development
This prints out a hash ID. In our hypothetical case here, that would be the hash ID of commit E. Once you have that, you can run git diff manually, to see what happened to every file. Or you can run an individual-file git diff:
git diff E development -- path/to/file1
git diff E master -- path/to/file1
Note that if you replace the names master and development with the hash IDs of the commits that were current before you did a git merge, this works even after the merge. That will tell you what Git thought it should combine for path/to/file1. That, in turn, will tell you whether Git did not see the change, or whether whoever made the merge overrode Git, or handled a conflicting merge incorrectly.
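One way to get at those pre-merge hash IDs after the fact is through the merge commit itself: its first and second parents are the two tips that were merged. The sketch below assumes a small scratch repository of my own making, in which only master touched path/to/file1; in your repository you would put the hash ID of your actual merge in place of HEAD.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is named master
git config user.name "Example"
git config user.email "example@example.com"

mkdir -p path/to
echo "original" > path/to/file1
git add path/to/file1
git commit -q -m E

git checkout -q -b development
echo "dev work" > dev.txt
git add dev.txt
git commit -q -m I

git checkout -q master
echo "changed on master" > path/to/file1
git commit -q -a -m F

git checkout -q development
git merge -q --no-edit master             # merge commit J

merge=$(git rev-parse HEAD)
ours="$merge^1"      # tip of development just before the merge
theirs="$merge^2"    # tip of master just before the merge
base=$(git merge-base "$ours" "$theirs")
base_subject=$(git log -1 --format=%s "$base")

# What each side did to the file, as Git saw it when merging.
ours_diff=$(git diff "$base" "$ours" -- path/to/file1)
theirs_diff=$(git diff "$base" "$theirs" -- path/to/file1)

echo "merge base: $base_subject"
echo "development changed the file: ${ours_diff:-no}"
echo "$theirs_diff"
```

If one of these diffs shows the change and the merge commit does not contain it, the change was lost while making the merge.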
Once you have a merge, a subsequent merge will find a different merge base:
       H--I--J----K   <-- development
      /     /
...--E--F--G--L--M   <-- master
We look now at both branch tips and work our way backwards through history (in the leftward direction), following both forks of a merge like J, to find the first commit we can get to from both branch tips. Starting at K, we go back to J, then to both I and G. Starting at M, we go back to L, then to G. We find G to be on both branches, so commit G is the new merge base. Git will run:
git diff G K
git diff G M
to get the two change-sets to apply to merge-base commit G.
[2] "Most recent" here refers to commit graph order, rather than time stamps, although it's probably also the commit with the newest time stamp that is on both branches.
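You can confirm the new merge base with git merge-base. This sketch rebuilds the larger graph in a scratch repository; the commit helper and file names are placeholders of mine.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is named master
git config user.name "Example"
git config user.email "example@example.com"

# Hypothetical helper: create <name>.txt and commit it with subject <name>.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit E
git checkout -q -b development
commit H
commit I
git checkout -q master
commit F
commit G
git checkout -q development
git merge -q --no-edit master             # merge commit J
commit K
git checkout -q master
commit L
commit M

base=$(git merge-base master development)
base_subject=$(git log -1 --format=%s "$base")
echo "merge base for the next merge: $base_subject"
```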
We already saw that git log -p just skips right over merge commits. You don't see any diff at all, as if the merge were totally magic. But when you run:
git log -- path/to/file1
something else, even more insidious, happens. This is described, albeit rather opaquely, in the (long) git log documentation under the section titled History Simplification.
In our example above, suppose git log is walking from K backwards. It finds J, which is a merge. It then inspects both I and G, comparing each to J after excluding all but the one file you are looking at. That is, it's just comparing path/to/file1 in the three commits.
If one side of the merge doesn't show any change to path/to/file1, that means the merge result J was no different from the input (I or G). Git calls this "TREESAME". If the merge J, after being stripped down to this one file, matches I or G similarly stripped-down, then J is TREESAME to I or G (or perhaps both). In this case, Git picks the TREESAME parent (or any one of them, if there are several) and follows only that side of the history. Let's say it picks I (along the top row) rather than G (along the bottom).
What this means in our case is that if someone dropped the ball during a merge, losing a change that was supposed to come in to J from F, git log never shows it. The log command looks at K, then J, then looks at but drops G, and looks only at I, then H, then E, and then any earlier commits. It never looks at commit F at all! So we don't see the change to path/to/file1 from F.
The logic here is simple; I'll quote the git log documentation directly but add some emphasis:
[Default mode] Simplifies the history to the simplest history explaining the final state of the tree. Simplest because it prunes some side branches if the end result is the same ...
Since the changes in F were dropped, Git declares them to be irrelevant. You don't need to see them! We'll just ignore that side of the merge entirely!
You can defeat this completely with --full-history. That tells Git not to prune either side of a merge: it should look down both histories. This will find commit F for you. Adding -m -p should also find where the changes were dropped, since it will find all commits that touch the file:
git log -m -p --full-history -- path/to/file1
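To see the pruning in action, here is a sketch that reproduces the "dropped the ball" scenario in a scratch repository. Everything in it (the commit helper, the file names, the commit subjects) is made up for illustration. The merge is stopped with --no-commit, and master's change to file1 is deliberately thrown away before committing.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is named master
git config user.name "Example"
git config user.email "example@example.com"

# Hypothetical helper: commit <file> <content> <subject>
commit() { echo "$2" > "$1"; git add "$1"; git commit -q -m "$3"; }

commit file1 "base" commit-E
git checkout -q -b development
commit dev.txt "h" commit-H
commit dev.txt "i" commit-I
git checkout -q master
commit file1 "changed in F" commit-F
commit other.txt "g" commit-G
git checkout -q development

# Merge, but drop master's change to file1 before committing.
git merge -q --no-commit master
git checkout HEAD -- file1                # put back development's version
git commit -q -m commit-J

default_log=$(git log --format=%s -- file1)
full_log=$(git log --format=%s --full-history -- file1)

echo "default:"
echo "$default_log"
echo "with --full-history:"
echo "$full_log"
```

The default listing never mentions commit-F, because J matches I for this file and Git follows only that side. With --full-history, commit-F shows up, and adding -m -p on top of that shows the diffs on both sides of the merge.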
If the changes were there (in commit F in our example) and are no longer, there are only two ways they were lost:
They were reverted (either with git revert, or manually) in an ordinary commit. You would see this as a commit that touches path/to/file1 in the history even without -m; -p will show you the diff.
Or, they were lost by being dropped during a merge. You would see the merge even without -m, but not know for sure that whoever did the merge dropped the ball here; but -m -p will show both parent diffs, including the one that should have taken (but did not take) the change.