Need to get all file differences (added, modified, renamed) between two Git commits

Submitted by 喜你入骨 on 2019-12-08 03:03:23
torek

As in CodeWizard's answer, you can use the "user-friendly" (or porcelain) command git diff instead of git diff-tree, which is what Git calls a plumbing command, meant for use in scripts. You should, however, be aware of what this means.

Since porcelain commands are meant for humans, they try to present things in a human-readable fashion. This means they obey any setting that the particular human running them has set for himself or herself in the various configuration files. That includes the diff.renames and diff.renameLimit configurations. They may also modify their output to make it easier for eyeballs, yet harder for computer programs, to deal with. Worst of all, they may change their output from one Git version to another, if people seem to prefer some new default.

Plumbing commands, by contrast, are meant for scripts, so they behave in predictable ways, with output that neither changes over time nor depends on configuration items. That way, whatever you request is what you get: reliable output in a reliable form, so that if you write your own reliable code, it will not just work today, for one case; it will keep working in the future, for all cases where it can.¹

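In the end, what this means is that if you use git diff-tree and set the right flags, you will get more reliable output. If you use git diff, your rename detection depends on the user's diff.renames and diff.renameLimit settings, and on whatever defaults the installed Git version uses for them.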
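A minimal sketch of that difference, assuming a POSIX shell; the function names are hypothetical wrappers, and only the git commands and options themselves are real. The porcelain form follows diff.renames, `--no-renames` forces rename detection off for the porcelain, and the plumbing form ignores the setting entirely.

```sh
# Hypothetical wrappers; $1 and $2 are the two commits to compare.

# Porcelain: honors the user's diff.renames and diff.renameLimit settings,
# so two users can get different answers from the same command.
porcelain_changes() {
    git diff --name-status "$1" "$2"
}

# Porcelain with rename detection forced off, overriding the configuration.
porcelain_changes_no_renames() {
    git diff --no-renames --name-status "$1" "$2"
}

# Plumbing: ignores diff.renames and detects renames only when given -M.
# -r makes it recurse into subdirectories instead of stopping at the top level.
plumbing_changes() {
    git diff-tree -r --name-status "$1" "$2"
}
```

With diff.renames set to true, a plain `git mv O R` shows up as a single `R100` line from `porcelain_changes`, but as a `D` line for O plus an `A` line for R from the other two.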

As you discovered, the output from rename-detection is two pathnames, which is not something you can just pipe to an archiver. Archivers in general have issues with file deletion (this is, perhaps, one classic difference between archives and backups / snapshots; note that both of these are related to version control as well).
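If rename detection ends up enabled anyway (for instance because the porcelain git diff runs under a user's diff.renames setting), the two-pathname lines can still be reduced to one path per line before reaching an archiver. A minimal sketch, assuming tab-separated `--name-status` output produced without `-z` and pathnames that contain no tabs or newlines; `paths_to_collect` is a hypothetical name. It keeps A and M paths, keeps only the new name of R and C entries, and skips everything else, including D and the ambiguous T.

```sh
# Hypothetical filter: reads "git diff --name-status" lines on stdin and
# prints one pathname per line for the files worth archiving.
# Assumes tab-separated output (no -z) and no tabs/newlines in pathnames.
paths_to_collect() {
    tab=$(printf '\t')
    while IFS=$tab read -r status path1 path2; do
        case $status in
            A|M) printf '%s\n' "$path1" ;;   # added or modified: keep the path
            R*|C*) printf '%s\n' "$path2" ;; # renamed/copied: new name only
            *) ;;                            # D, T and anything else: skip
        esac
    done
}
```

Usage would be along the lines of `git diff --name-status SHA1 SHA2 | paths_to_collect`.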

Suppose your goal is a sort of union of all files. The diff says that a file named A was added, one named D was deleted, and file R was created by renaming the old name O (and perhaps also modifying it: note Git's similarity index number that comes after the letter R). You wish to collect file A, ignore file D, and collect file R while ignoring file O. Well, then, what you want is to not detect renames in the first place! If you do not detect renames (and git diff-tree does not, by default), this same diff will be presented as: add file A, delete file D, delete file O, and add file R. So a git diff-tree with a --diff-filter that includes A and M and excludes D suffices, given -r so that it recurses into subdirectories rather than reporting only top-level entries. It is less clear what to do with T, which is for a type change: from ordinary file to symbolic link, for instance, or from file to sub-repository commit hash (what Git calls a gitlink entry, for a submodule).
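A minimal sketch of that plumbing command wrapped in a shell function; `files_to_archive` is a hypothetical name and `$1`/`$2` stand for the two commits. Because rename detection is off, a renamed file appears only under its new name as an addition, and deletions are filtered out.

```sh
# Hypothetical helper: list files added or modified between two commits.
# Rename detection stays off (the diff-tree default), so a renamed file
# is reported as a deletion of the old name plus an addition of the new one.
files_to_archive() {
    # -r               recurse into subdirectories
    # --name-only      print just the pathnames
    # --diff-filter=AM keep Added and Modified entries, which drops Deleted
    git diff-tree -r --name-only --diff-filter=AM "$1" "$2"
}
```

To feed an archiver, one option (assuming GNU tar and a working tree checked out at the later commit; C1 and C2 are placeholders) is `git diff-tree -r -z --name-only --diff-filter=AM C1 C2 | tar -cf update.tar --null -T -`, where `-z` and `--null` keep unusual pathnames intact.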

Similarly, you don't want to enable copy detection: a C status, like R, presents a similarity index and a pair of pathnames. If you leave it disabled, you simply get the new pathname as an Added file.
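A sketch contrasting the two settings, with hypothetical function names and placeholder commits. Note that `-C` on its own only considers files modified in the same change as copy sources, so `--find-copies-harder` is added here to make an exact copy of an untouched file show up as C.

```sh
# Copy detection off (the diff-tree default): a copied file is simply "A".
plain_status() {
    git diff-tree -r --name-status "$1" "$2"
}

# Copy detection on: the copy is reported as "C<score>", then the source
# path, then the new path, which is a pair of pathnames, just like R.
copy_status() {
    git diff-tree -r -C --find-copies-harder --name-status "$1" "$2"
}
```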

Even if you do all this, you are still set up for a pitfall. Suppose that commit hash C1 has a file named problem, and a (presumably later) commit hash C2 has instead two files named problem/A and problem/B. This implies that the original file problem was deleted between these two points, because most systems (including Git itself) forbid having both a file named problem and a directory named problem holding various files. Given that each tar-archive itself is not a complete snapshot—you omit files that are unmodified between C1 and C2—your procedure for extracting these snapshots must necessarily be additive: extract earlier snapshot, then extract later snapshot atop earlier snapshot. This process will fail at the point where file problem is in the way of creating directory problem. Obviously, you can check for such problems and remove the problematic file (you can see now why I named the file problem :-) ), but more generally, since you are not storing "delete" directives in the first place, you won't know, in a future case where you are using these archives to rebuild a snapshot, that some files don't belong in that snapshot at all.
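The file-replaced-by-directory case can at least be detected before extracting. A minimal sketch, assuming POSIX awk and pathnames without tabs or newlines; `find_file_dir_conflicts` is a hypothetical name. It reads one diff-tree pass, remembers deleted paths, and reports any deleted path that has become a leading directory of an added path (problem, in the example).

```sh
# Hypothetical check: print every path that was a file at $1 and has become
# a directory at $2, i.e. a file that would block extraction of the update.
# Assumes pathnames contain no tabs or newlines.
find_file_dir_conflicts() {
    git diff-tree -r --name-status "$1" "$2" |
    awk 'BEGIN { FS = "\t" }
         $1 == "D" { gone[$2] = 1 }      # files removed between the commits
         $1 == "A" { added[$2] = 1 }     # files created between the commits
         END {
             for (p in added) {
                 q = p
                 # strip one trailing "/component" at a time: a/b/c, a/b, a
                 while (sub("/[^/]*$", "", q))
                     if (q in gone) print q
             }
         }' | sort -u
}
```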

(The classic solution to this problem is to prefix update-archives with some kind of manifest or directive. If you decide to use such a solution, then, depending on the kind of detail you want in the manifest-or-directive, you might want to do a first pass to detect exact renames and/or exact copies.)
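A minimal sketch of the simplest such directive: a list of deleted paths written next to each update archive and replayed before that archive is extracted. Assumptions: a POSIX shell, one pathname per line (git quotes unusual names in this output unless `-z` is used, and this sketch does not undo that quoting), and hypothetical function names. Directories emptied by the removals are left in place.

```sh
# Hypothetical helper: record which paths disappeared between two commits,
# so whoever replays the update archives knows what to remove.
write_delete_manifest() {
    git diff-tree -r --name-only --diff-filter=D "$1" "$2"
}

# Hypothetical helper: run in the directory being rebuilt, before extracting
# the matching update archive, so deleted files cannot get in the way.
apply_delete_manifest() {
    while IFS= read -r path; do
        rm -f -- "$path"
    done < "$1"
}
```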


¹ Obviously, newly added features can present problems for everyone, not just scripts and not just humans, but the Git folks do work hard on not creating unnecessary problems for scripts that rely on plumbing commands. Consider, for instance, the new impetus to push Git toward using some flavor of SHA-256 instead of, or in addition to, SHA-1. Since SHA-1 produces 160-bit hashes and SHA-256 produces 256-bit hashes, these must be represented as 40 and 64 hexadecimal digits respectively. Linus suggested abbreviating 256-bit hashes to 40 characters by default, to help out existing scripts that assume 40 characters, but I foresee some problems... :-)

Why not use these simple commands?

git diff --name-only SHA1 SHA2

# or, with --name-status, display the name and the status of each file
git diff --name-status SHA1 SHA2

# To display untracked files, use -u
git status -u

And in git you should rename files only with the git mv command.
