I'm writing a script that takes a path as a parameter and outputs Git commits for that path, similarly to what GitHub does when you click the History button in so…
You are indeed being bitten by History Simplification. Note that simplification is enabled by default when using any path names with git log. It is not enabled by default if you do not supply path names. Adding particular options, like --full-history or --simplify-*, changes how the simplification is done.
(You may also get bitten by the implied --follow from having log.follow set to true, but it's harder to see where that would occur for this particular case.)
The simplification works by doing very limited git diffs. Remember that as git log is walking through the commit graph, it is working on one commit C at a time. Each commit C has some set of parent commits. For an ordinary (non-merge) commit, there is just one parent, so for each file in C that is to be examined (based on the path names you gave), either that file in C is 100% identical to that file in its parent P, or it's different. That's easy for Git to tell, because a path that is 100% identical in both commits has the same blob hash in each commit's tree.
That's what the TREESAME expression in the documentation means: we take commit C's tree, remove all the paths that aren't being examined, leaving (in memory—none of this affects anything stored in the repository!) a skeleton tree attached to C that has the files that are being examined. Then we take the (single) parent P and do the same thing. The result is either matching—C and its parent P are TREESAME—or non-matching.
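To make the TREESAME idea concrete, here is a minimal Python sketch. It assumes, hypothetically, that a commit's tree can be modelled as a flat dict mapping each path to its blob hash; real Git trees are nested tree objects, and real pathspecs support far more than the plain prefix matching used here.

```python
from typing import Dict, Iterable

# Hypothetical flat model of a commit's tree: path -> blob hash.
Tree = Dict[str, str]


def strip_tree(tree: Tree, pathspecs: Iterable[str]) -> Tree:
    """Keep only the paths being examined (a toy stand-in for pathspec matching).

    "." selects every path; a directory name selects everything beneath it.
    """
    specs = list(pathspecs)

    def selected(path: str) -> bool:
        return any(
            spec == "." or path == spec or path.startswith(spec.rstrip("/") + "/")
            for spec in specs
        )

    return {path: blob for path, blob in tree.items() if selected(path)}


def treesame(child: Tree, parent: Tree, pathspecs: Iterable[str]) -> bool:
    """True when both stripped-down trees hold the same paths with the same blob hashes."""
    specs = list(pathspecs)
    return strip_tree(child, specs) == strip_tree(parent, specs)


parent = {"README": "a1f0", "src/main.c": "b2c4"}
child = {"README": "a1f0", "src/main.c": "c3d9"}
print(treesame(child, parent, ["README"]))  # True: README has the same blob hash
print(treesame(child, parent, ["src"]))     # False: src/main.c changed
print(treesame(child, parent, ["."]))       # False
```

Only hashes are compared, never file contents, which is why this test is so cheap.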
The commit is "interesting", and will be displayed, if it is not TREESAME to its parent. Even if it's not interesting, Git will still put the parent P into the graph-walk priority queue to examine later, because this is just an ordinary commit and Git must walk through it to construct a history. (There's some weirdness here with "parent rewriting" that I'm going to skip over, though it matters for --graph.)
At merges, however, things are different. Commit C still has its one tree as usual, but it has multiple parent commits Pi. Git does the same "strip down the tree" operation for each parent. When you're not using --full-history, Git then compares the stripped-down trees of C vs each Pi. The merge itself is included if it's not TREESAME to any parent. If it is TREESAME to at least one parent Pi, the merge tends to get excluded (depending on other options) and Git puts only that parent into the priority queue for walking through the graph. If C is TREESAME to several parents Pi, Pj, Pk, ..., Git by default keeps just one of them (in practice the first TREESAME parent in parent order, not a random one) and discards the rest.
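The default-mode rule described above can be sketched as one function, again using a hypothetical flat path-to-blob-hash model of each tree and exact file-name matching only. The same rule covers ordinary commits (one parent) and merges. When several parents are TREESAME, this sketch keeps the first one in parent order, which is how Git's revision walker behaves when every parent is part of the walk; it also ignores the "other options" that can change whether a merge is shown.

```python
from typing import Dict, List, Set, Tuple

Tree = Dict[str, str]  # hypothetical flat tree: path -> blob hash


def stripped(tree: Tree, paths: Set[str]) -> Tree:
    """Keep only the examined paths (exact file names, for simplicity)."""
    return {p: blob for p, blob in tree.items() if p in paths}


def default_step(child: Tree, parents: List[Tree],
                 paths: Set[str]) -> Tuple[bool, List[int]]:
    """Return (show this commit?, indices of parents to queue) in default mode."""
    mine = stripped(child, paths)
    if not parents:
        # A root commit is compared against an empty tree.
        return mine != {}, []
    for i, parent in enumerate(parents):
        if stripped(parent, paths) == mine:
            # TREESAME to this parent: hide the commit, follow only this parent.
            return False, [i]
    # Not TREESAME to any parent: show the commit and queue every parent.
    return True, list(range(len(parents)))


f = {"f"}
print(default_step({"f": "v2"}, [{"f": "v1"}], f))               # (True, [0])
print(default_step({"f": "v2"}, [{"f": "v1"}, {"f": "v2"}], f))  # (False, [1])
print(default_step({"f": "v3"}, [{"f": "v1"}, {"f": "v2"}], f))  # (True, [0, 1])
```

Note that an uninteresting ordinary commit still returns its parent index, matching the point that Git must walk through it anyway.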
Adding --full-history disables the discarding of all but one Pi, so Git now walks all the parents of the merge: both "sides" of the merge, or all arms if it's a multi-way octopus merge. It also changes which merges are displayed. Ignoring parent rewriting, a merge that is TREESAME to some of its parents but not to others is now shown; only a merge that is TREESAME to every parent is still left out.
The logic here is that if the file(s) you're looking at are the same in commit C and commit Pi, why then, you don't care that they're different in some other parent Po, because the file has its current form due to parent Pi rather than parent Po. This logic is correct if you think that the file(s) you are looking at are right, but falls apart if you think they are wrong and you are looking for the merge that lost the changes you wanted.
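To see how that logic can hide the change you are looking for, here is a sketch of the walk over a tiny hypothetical graph in which a merge M kept A's version of f and dropped B's change. Trees are modelled as hypothetical flat dicts of path to blob hash, and the sketch only tracks which commits get walked, not which are displayed. It also visits commits in plain FIFO order, whereas Git's priority queue orders them by commit date.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# Hypothetical graph: commit id -> (flat tree of path -> blob hash, parent ids).
Graph = Dict[str, Tuple[Dict[str, str], List[str]]]


def stripped(tree: Dict[str, str], paths: Set[str]) -> Dict[str, str]:
    return {p: blob for p, blob in tree.items() if p in paths}


def walked(graph: Graph, tip: str, paths: Set[str],
           full_history: bool = False) -> List[str]:
    """Return the commits visited, in (FIFO) visit order."""
    seen: Set[str] = set()
    order: List[str] = []
    queue = deque([tip])
    while queue:
        cid = queue.popleft()
        if cid in seen:
            continue
        seen.add(cid)
        order.append(cid)
        tree, parents = graph[cid]
        to_walk = parents
        if not full_history:
            mine = stripped(tree, paths)
            for p in parents:
                if stripped(graph[p][0], paths) == mine:
                    to_walk = [p]  # default mode: keep only the first TREESAME parent
                    break
        queue.extend(to_walk)
    return order


graph: Graph = {
    "I": ({"f": "1"}, []),
    "A": ({"f": "1"}, ["I"]),       # A leaves f alone
    "B": ({"f": "2"}, ["I"]),       # B changes f
    "M": ({"f": "1"}, ["A", "B"]),  # the merge keeps A's f, losing B's change
}
print(walked(graph, "M", {"f"}))                     # ['M', 'A', 'I']
print(walked(graph, "M", {"f"}, full_history=True))  # ['M', 'A', 'B', 'I']
```

In the default walk B is never visited, so its change to f cannot be reported; with full_history=True the walk reaches B, whose f differs from I's.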
--follow
(Since your path name is . and Git generally does not do directories at all, this shouldn't matter here: using a directory name really means all files anywhere under the directory, recursively. If you use a file name, though, it might matter. Remember that --follow is only obeyed if you're looking at exactly one file.)
The way --follow works, which is also the reason it only works for one path name (and why it shouldn't be a problem with . as the path), is this: as Git walks through the commit graph and tests each commit to decide whether it is interesting and should therefore be displayed, it does these git diffs on each commit vs its parent(s).
Unlike the TREESAME diff, the --follow test is a full diff. It's more expensive than the quick 100%-the-same check, at least for the more interesting problem cases, but it's limited to one file, which keeps it from being too costly. It also applies only to single-parent commits, though this comes after --first-parent (if you used that) strips away the other parents, or after -m (if you used that) splits a merge into multiple virtual commits that share the same tree, or after history simplification has picked just one parent to follow.[1] In any case, if the parent does not have a file with the (single) path name that you're logging, Git does a full diff of the parent and the child to see if it can find some renamed file in the parent. If it can find such a renamed file, Git first shows the child (because the file changed: at the very least, it was renamed) and then changes the path name it is looking for as it traverses to the child's parent.
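The rename check just described can be sketched as follows. This is a swapped-in approximation, not Git's algorithm: Git's rename detection uses its own similarity score (with a default rename threshold of 50%), whereas this sketch scores candidates with Python's difflib.SequenceMatcher.ratio(). Trees are hypothetical flat dicts mapping paths to file contents, and rename_source is a hypothetical helper name.

```python
import difflib
from typing import Dict, Optional

Tree = Dict[str, str]  # hypothetical flat tree: path -> file contents


def rename_source(child: Tree, parent: Tree, path: str,
                  threshold: float = 0.5) -> Optional[str]:
    """If `path` is missing from the parent, return the most similar vanished parent file."""
    if path in parent:
        return None  # the parent has the same path: no rename to look for
    target = child[path]
    best_path, best_score = None, threshold
    for old_path, contents in parent.items():
        if old_path in child:
            continue  # only files that vanished in the child count as rename sources here
        score = difflib.SequenceMatcher(None, contents, target).ratio()
        if score >= best_score:
            best_path, best_score = old_path, score
    return best_path


parent = {"path/to/old.name": "hello world\nline two\n", "other": "zzz"}
child = {"dir/sub/file.ext": "hello world\nline 2\n", "other": "zzz"}
print(rename_source(child, parent, "dir/sub/file.ext"))  # path/to/old.name
```

Because this compares one target against every vanished file, it is noticeably more work than the hash-only TREESAME test.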
That is, Git started out looking for dir/sub/file.ext, hit a commit C where the parent of C didn't have a dir/sub/file.ext, did a full-blown diff, and found a sufficiently similar file named path/to/old.name. So Git shows you commit C, saying R<percent> path/to/old.name -> dir/sub/file.ext, and then moves on to P, but now, instead of looking for changes to the path dir/sub/file.ext, it's looking for changes to the path path/to/old.name.
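Putting the pieces together for a linear history: the sketch below follows one path from the newest commit backwards and switches to the old name when it meets a rename. To stay short it only recognises 100%-identical renames instead of doing a similarity search, and its output lines merely imitate the R<percent> notation used above; they are not Git's actual output format. The history, commit ids, and follow_path name are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical linear history, newest first: (commit id, flat tree of path -> contents).
Commit = Tuple[str, Dict[str, str]]


def follow_path(history: List[Commit], path: str) -> List[str]:
    """Log one path through a linear history, switching names at exact renames."""
    log: List[str] = []
    # Pair each commit with its parent; the root's "parent" is an empty tree.
    for (cid, tree), (_, parent_tree) in zip(history, history[1:] + [("", {})]):
        if path not in tree:
            break  # nothing left to follow
        if path in parent_tree:
            if parent_tree[path] != tree[path]:
                log.append(f"{cid} M {path}")
            continue
        # Path missing in the parent: look for a vanished file with identical contents.
        old = next((p for p, c in parent_tree.items()
                    if c == tree[path] and p not in tree), None)
        if old is None:
            log.append(f"{cid} A {path}")
            break
        log.append(f"{cid} R100 {old} -> {path}")
        path = old  # from here on, look for changes to the old name
    return log


history: List[Commit] = [
    ("c3", {"dir/sub/file.ext": "v2"}),
    ("c2", {"dir/sub/file.ext": "v1"}),
    ("c1", {"path/to/old.name": "v1"}),
    ("c0", {"path/to/old.name": "v0"}),
]
for line in follow_path(history, "dir/sub/file.ext"):
    print(line)
# c3 M dir/sub/file.ext
# c2 R100 path/to/old.name -> dir/sub/file.ext
# c1 M path/to/old.name
# c0 A path/to/old.name
```

The single `path` variable is the whole trick, and also the limitation: there is only ever one name being tracked.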
This particular trick can't work well across merges: the file could be renamed in just one of the various arms of the merge, or in several arms, depending on who did the renaming and when. Git can only look for one path name; it doesn't keep looking for both names. Of course, supplying a path name turns on history simplification, so in general there aren't any merges to worry about after all. The merge case happens only if you use a flag like --full-history or --simplify-merges.
[1] Note that if History Simplification has picked one parent from a merge, it has picked a P that is TREESAME to C after stripping out all files except the one we care about. So by definition, the one file we're --following in C matches the same-named file in parent P, which means commit C will turn out to be uninteresting after all.