I am trying to produce a list of the files that were changed in a specific commit. The problem is that every file has the version number in a comment at the top of the file, so every file shows up as changed.
I found it easiest to use git difftool to launch an external diff tool, here diff with -I, which ignores changes whose lines all match <regex>:
git difftool -y -x "diff -I '<regex>'"
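diff -I <regex> ignores changes whose inserted and deleted lines all match <regex>. A hunk that also touches any other line is still reported. Below is a minimal sketch of that behaviour. It assumes the version comment contains the marker Version:, which is a hypothetical format, so adjust the regex to match your own header.

```sh
# Compare two files, ignoring hunks in which every changed line matches the
# (hypothetical) version-comment regex 'Version:'.
# Exit status 0 means no other differences remain; 1 means real changes.
diff_ignoring_version() {
    diff -I 'Version:' "$1" "$2"
}
```

git difftool runs the -x command with the old and new version of each file as its two arguments. The same filter therefore becomes git difftool -y -x "diff -I 'Version:'" <commit>^ <commit>.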
I found a solution. I can use this command:
git diff --numstat --minimal <commit> <commit> | sed '/^[1-]\s\+[1-]\s\+.*/d'
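It hides the files whose --numstat row reports exactly one added and one deleted line, i.e. a single modified line. This eliminates files whose only change was the version number in the comment. (The same sed expression also drops binary files, for which --numstat prints - instead of line counts.)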
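The sed command above relies on GNU sed's \s and \+ extensions. The same filter can be written in awk, which splits each --numstat row on its tab separators. Here is a sketch; the function name is hypothetical.

```sh
# Filter `git diff --numstat` output read from stdin.
# Each row is "<added><TAB><deleted><TAB><path>". Rows with exactly one
# added and one deleted line are dropped, as are binary rows whose counts are "-".
drop_single_line_changes() {
    awk -F '\t' '!(($1 == 1 && $2 == 1) || $1 == "-")'
}
```

Usage: git diff --numstat --minimal <commit> <commit> | drop_single_line_changes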
Comment-line changes alone can be counted (A) by running grep on the git diff output:
git diff -w | grep -c -E "(^[+-]\s*(\/)?\*)|(^[+-]\s*\/\/)"
All line changes can be counted (B) from the git diff --stat output:
git diff -w --stat
To get the non-comment source line (NCSL) change count, subtract (A) from (B).
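Explanation:
In the git diff output (in which whitespace changes are ignored), an added or removed line counts as a comment line when the text after its leading + or - is optional whitespace followed by /*, *, or //.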
NOTE: The comment line count can be slightly off because of the following assumptions, so the result should be taken as a ballpark figure.
1.) Source files are assumed to be C. Makefiles and shell scripts use '#' to denote comment lines, so if they are part of the diff set, their comment lines won't be counted.
2.) Git's convention for a changed line: a modified line is treated as the old line being deleted and a new line being inserted. It therefore looks like two lines changed, when in reality only one line was modified.
In the example below, the new definition of FOO looks like a two-line change:
$ git diff --stat -w abc.h
...
-#define FOO 7
+#define FOO 105
...
1 files changed, 1 insertions(+), 1 deletions(-)
$
3.) Valid comment lines that don't match the pattern, and valid source code lines that do, cause errors in the calculation.
In the example below, the "+ blah blah" line, which doesn't start with '*', won't be detected as a comment line:
+ /*
+ blah blah
+ *
+ */
In the example below, the line starting with "+ *ptr" is counted as a comment line because it starts with *, even though it is a valid source code line:
+ printf("\n %p",
+ *ptr);
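Steps (A) and (B) can also be done in a single pass over the diff. This is a sketch, not the answer's exact method. The function name count_ncsl is hypothetical. It uses [ \t] where the grep pattern uses \s, and it skips the ---/+++ file headers by counting only lines inside hunks. The caveats listed above still apply.

```sh
# Read a unified diff (e.g. from `git diff -w`) on stdin and print
# "<comment lines (A)> <all changed lines (B)> <non-comment lines (B - A)>".
count_ncsl() {
    awk '
        /^diff / { in_hunk = 0 }            # new file: header lines follow
        /^@@/    { in_hunk = 1; next }      # hunk header: body lines follow
        in_hunk && /^[+-]/ {
            total++                         # (B): every added/removed line
            if ($0 ~ /^[+-][ \t]*\/?\*/ || $0 ~ /^[+-][ \t]*\/\//)
                comments++                  # (A): looks like a C comment line
        }
        END { printf "%d %d %d\n", comments, total, total - comments }
    '
}
```

Usage: git diff -w | count_ncsl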
git diff -G <regex>
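Specify a regular expression that does not match your version number line. Because -G only shows files in which at least one added or removed line matches the regex, files whose only change is the version line drop out.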
Here is a solution that is working well for me. I've written up the solution and some additional missing documentation on the git (log|diff) -G<regex> option.
It basically uses the same approach as the previous answers, but specifically for comments that start with a * or a #, sometimes with a space before the *. It still needs to allow changes to #ifdef, #include, etc.
Lookahead and lookbehind do not seem to be supported by the -G option, nor does ? in general, and I have had problems using *, too. + seems to work well, though.
(Note, tested on Git v2.7.0)
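git diff -w -G'(^[^\*# /])|(^#\w)|(^\s+[^[:space:]\*#/])'
-w: ignore whitespace
-G: only show files with added/removed lines that match the following regex
(^[^\*# /]): any line that does not start with a star, a hash, a space, or a slash
(^#\w): any line that starts with # followed by a word character (e.g. #ifdef, #include)
(^\s+[^[:space:]\*#/]): any line that starts with whitespace followed by a character that is neither whitespace nor a comment character. (With [^\*#/] alone, \s+ could stop after one whitespace character and let the next whitespace character satisfy the bracket. Every line indented by two or more whitespace characters would then match.)
Basically, an SVN hook currently modifies every file going in and out, rewriting the multi-line comment blocks in every file. Now I can diff my changes against SVN without the FYI information that SVN drops into the comments.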
Technically, this allows Python and Bash comments like #TODO to show up in the diff. A C++ division operator at the start of a line could also be ignored:
a = b
/ c;
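Git matches the -G regex against each added or removed line without its leading + or -. Running grep -E on sample lines is therefore a rough way to check what a pattern selects. Below is a sketch with a hypothetical function name. grep -E is only an approximation of git's regex library, and \s and \w are GNU-style extensions. The third alternative is written as ^\s+[^[:space:]\*#/] rather than ^\s+[^\*#/]. In the latter, \s+ can stop after one whitespace character, and the bracket then matches the next whitespace character.

```sh
# Print the stdin lines (given without their +/- prefix) that the -G pattern
# would select. Approximation only: git uses its own regcomp/regexec.
G_REGEX='(^[^\*# /])|(^#\w)|(^\s+[^[:space:]\*#/])'
g_matches() {
    grep -E -- "$G_REGEX"
}
```

For example, the indented comment line "   * note" is not selected with this pattern. The ^\s+[^\*#/] variant would select it.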
Also, the documentation on -G in Git seemed pretty lacking, so the information here should help:
git diff -G<regex>
-G<regex>
Look for differences whose patch text contains added/removed lines that match <regex>.
To illustrate the difference between -S<regex> --pickaxe-regex and -G<regex>, consider a commit with the following diff in the same file:
+ return !regexec(regexp, two->ptr, 1, &regmatch, 0);
...
- hit = !regexec(regexp, mf2.ptr, 1, &regmatch, 0);
While git log -G"regexec\(regexp" will show this commit, git log -S"regexec\(regexp" --pickaxe-regex will not (because the number of occurrences of that string did not change).
See the pickaxe entry in gitdiffcore(7) for more information.
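The difference can be reproduced outside git with two small shell functions. This is a rough emulation, not git's implementation. pickaxe_s compares how many times the regex occurs before and after (the -S idea). pickaxe_g checks whether any added or removed line matches (the -G idea). Both function names are hypothetical.

```sh
# Rough emulation of the two pickaxe modes on two versions of one file.
# Usage: pickaxe_s OLD NEW REGEX  or  pickaxe_g OLD NEW REGEX (prints hit/miss).

# -S idea: hit if the number of regex occurrences differs between versions.
pickaxe_s() {
    old_n=$(grep -oE -- "$3" "$1" | wc -l | tr -d ' ')
    new_n=$(grep -oE -- "$3" "$2" | wc -l | tr -d ' ')
    if [ "$old_n" -ne "$new_n" ]; then echo hit; else echo miss; fi
}

# -G idea: hit if any removed ("< ") or added ("> ") line of the diff matches.
pickaxe_g() {
    if diff "$1" "$2" | grep -E '^[<>] ' | cut -c3- | grep -qE -- "$3"; then
        echo hit
    else
        echo miss
    fi
}
```

With the two regexec lines above saved as the old and new file, pickaxe_g reports hit and pickaxe_s reports miss for the regex regexec\(regexp. This matches the behaviour described for git log.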
(Note, tested on Git v2.7.0)
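-G uses a POSIX extended regular expression (git compiles it with REG_EXTENDED; see the source excerpt below).
?, *, !, {, and } regular expression syntax gave me problems. Note, however, that POSIX extended regular expressions do define ?, *, and {m,n}; ! has no special meaning.
Grouping with () works, and groups can be OR-ed with |.
\s, \W, etc. are supported.
^ and $ work.
Note that the -G option filters the files that will be diffed.
But if a file gets "diffed", the lines that were "excluded/included" before will all be shown in the diff.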
Only show file differences with at least one changed line that mentions foo:
git diff -G'foo'
Only show file differences with at least one changed line that does not start with a #:
git diff -G'^[^#]'
Show files that have differences mentioning FIXME or TODO:
git diff -G'(FIXME)|(TODO)'
See also git log -G, git grep, git log -S, --pickaxe-regex, and --pickaxe-all.
https://github.com/git/git/search?utf8=%E2%9C%93&q=regcomp&type=
https://github.com/git/git/blob/master/diffcore-pickaxe.c
if (opts & (DIFF_PICKAXE_REGEX | DIFF_PICKAXE_KIND_G)) {
    int cflags = REG_EXTENDED | REG_NEWLINE;
    if (DIFF_OPT_TST(o, PICKAXE_IGNORE_CASE))
        cflags |= REG_ICASE;
    regcomp_or_die(&regex, needle, cflags);
    regexp = &regex;
}
// and in the regcomp_or_die function
regcomp(regex, needle, cflags);
http://man7.org/linux/man-pages/man3/regexec.3.html
REG_EXTENDED
Use POSIX Extended Regular Expression syntax when interpreting
regex. If not set, POSIX Basic Regular Expression syntax is
used.
// ...
REG_NEWLINE
Match-any-character operators don't match a newline.
A nonmatching list ([^...]) not containing a newline does not
match a newline.
Match-beginning-of-line operator (^) matches the empty string
immediately after a newline, regardless of whether eflags, the
execution flags of regexec(), contains REG_NOTBOL.
Match-end-of-line operator ($) matches the empty string
immediately before a newline, regardless of whether eflags
contains REG_NOTEOL.
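Because REG_EXTENDED is set, ?, +, *, | and {m,n} are operators in a -G pattern. In a POSIX basic regular expression, by contrast, ?, + and | are ordinary characters. grep -E accepts the same extended syntax, so it is a handy way to try a pattern on sample lines first. Here is a sketch with hypothetical helper names. Note that grep's own -G flag means basic regular expressions and is unrelated to git diff -G.

```sh
# Print the stdin lines matching $1 as a POSIX extended RE (the -G flavour).
match_ere() { grep -E -- "$1"; }

# Print the stdin lines matching $1 as a POSIX basic RE, for comparison.
match_bre() { grep -G -- "$1"; }
```

Quoting the pattern in single quotes, as in git diff -G'colou?r', also keeps the shell from treating ? and * as glob characters.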
Perhaps a Bash script like this:
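#!/bin/bash
# Usage: <script> <commit> <commit>   (any revision arguments git diff accepts)
# Prints "<changed lines>\t<path>" sorted by count, skipping files whose
# only change is a single modified line (such as the version comment).
git diff --name-only "$@" | while IFS= read -r FPATH ; do
    # --numstat prints "<added>\t<deleted>\t<path>"; the sed deletes 1/1 rows
    # (and the "-" rows git prints for binary files).
    LINES_COUNT=$(git diff --numstat --textconv "$@" -- "$FPATH" \
        | sed '/^[1-]\s\+[1-]\s\+.*/d' | awk '{ n += $1 + $2 } END { print n + 0 }')
    if [ "$LINES_COUNT" -gt 0 ] ; then
        printf '%s\t%s\n' "$LINES_COUNT" "$FPATH"
    fi
done | sort -n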