It seems that it would be extremely handy to be able to filter a diff so that trivial changes are not displayed. I would like to write a regular expression which would be ru
$ git diff --help
-G<regex>
Look for differences whose added or removed line matches the given <regex>.
EDIT:
After some tests I've got something like
git diff -b -w --word-diff-regex='.*\[[^"]*\]'
Then I've got output like:
diff --git a/test.php b/test.php
index 62a2de0..b76891f 100644
--- a/test.php
+++ b/test.php
@@ -1,3 +1,5 @@
<?php
{+$my_array[my_key]+} = "test";
?>
diff --git a/test1.php b/test1.php
index 62a2de0..6102fed 100644
--- a/test1.php
+++ b/test1.php
@@ -1,3 +1,5 @@
<?php
some_other_stuff();
?>
Maybe this will help you. I found it at http://www.rhinocerus.net/forum/lang-lisp/659593-git-word-diff-regex-lisp-source.html and there is more information in that thread.
EDIT2:
git diff -G'\[[A-Za-z_]*\]' --pickaxe-regex
Normalize the input files in a first step, then compare the normalized files. This gives you most control over the process. E.g. you might want to only apply the regexp to non-HTML parts of the code, not inside of strings, not inside of comments (or ignore comments altogether). Computing a diff on the normalized code is the proper way to do such things; working with regexps on single lines is much more error-prone and at most a hack.
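As a rough illustration of the normalize-then-compare idea, here is a minimal Python sketch. The normalization rule (dropping quotes around an identifier used as an array key) and the names normalize and normalized_diff are assumptions made up for this example. A real normalizer would need to respect strings, comments and HTML parts, as described above.

```python
import difflib
import re

# Hypothetical normalization rule for this example: drop the quotes around an
# identifier used as an array key, so $a["k"] and $a['k'] both become $a[k].
_QUOTED_KEY = re.compile(r"""\[(["'])([A-Za-z_][A-Za-z0-9_]*)\1\]""")


def normalize(text):
    """Return text with quoted identifier keys rewritten as bare keys."""
    return _QUOTED_KEY.sub(r"[\2]", text)


def normalized_diff(old, new, name="file"):
    """Return the unified diff (a list of lines) of the normalized inputs."""
    old_lines = normalize(old).splitlines(keepends=True)
    new_lines = normalize(new).splitlines(keepends=True)
    return list(
        difflib.unified_diff(old_lines, new_lines, "a/" + name, "b/" + name)
    )


if __name__ == "__main__":
    old = "$my_array[my_key] = 1;\n$my_array[my_key] = 2;\n"
    new = '$my_array["my_key"] = 1;\n$my_array[my_key2] = 2;\n'
    # Only the my_key -> my_key2 change survives normalization.
    print("".join(normalized_diff(old, new, "test.php")))
```

Note that the diff is shown in terms of the normalized text, not the original source, which is the price of this approach.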
Some diff utilities, such as meld, allow hiding "insignificant" differences and come with a set of default patterns, e.g. to hide whitespace-only changes. This is pretty much what you want, I guess.
From my own git diff --help:
--word-diff-regex=<regex>
Use <regex> to decide what a word is, instead of considering runs of non-whitespace to be a word. Also implies --word-diff unless it was already enabled. Every non-overlapping match of the <regex> is considered a word. Anything between these matches is considered whitespace and ignored(!) for the purposes of finding differences. You may want to append |[^[:space:]] to your regular expression to make sure that it matches all non-whitespace characters. A match that contains a newline is silently truncated(!) at the newline. The regex can also be set via a diff driver or configuration option, see gitattributes(1) or git-config(1). Giving it explicitly overrides any diff driver or configuration setting. Diff drivers override configuration settings.
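To make the two "(!)" warnings more concrete, here is a small Python sketch that mimics the word-splitting rule quoted above. It is only an approximation: Python's re module is not the regex engine git uses, \S stands in for [^[:space:]], and the helper name words is made up for this example.

```python
import re


def words(line, word_regex):
    """Split a line into "words": every non-overlapping match of word_regex.

    Text between matches is dropped, mirroring what the manual says
    --word-diff-regex treats as whitespace.
    """
    return [m.group(0) for m in re.finditer(word_regex, line)]


old = "$my_array[my_key]"
new = '$my_array["my_key"]'

# Identifiers only: quotes and brackets fall between matches and vanish, so
# both lines yield the same word list and the quote change goes unnoticed.
ident_only = r"[A-Za-z_]+"
print(words(old, ident_only) == words(new, ident_only))

# With the catch-all alternative appended, every remaining non-space character
# becomes a word of its own, so the added quotes count as differences again.
with_fallback = r"[A-Za-z_]+|\S"
print(words(old, with_fallback))
print(words(new, with_fallback))
```

So a word regex that deliberately leaves out the quote characters is one way to hide quote-only changes, at the cost of hiding every other character it does not match.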
grepdiff (part of the patchutils package) can be used to filter the hunks in a diff.
$ git diff -U1 | grepdiff 'console' --output-matching=hunk
It shows only the hunks that match the given string "console".
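If patchutils is not available, the same idea can be sketched in a few lines of Python. This is not a reimplementation of grepdiff: the function name matching_hunks is made up, it assumes git-style input in which every file section starts with a "diff " line, and it matches the pattern against every line of a hunk body (context lines included), which may differ from what grepdiff does.

```python
import re


def matching_hunks(diff_text, pattern):
    """Keep only the hunks in which some body line matches pattern.

    File headers are kept only for files that retain at least one hunk.
    """
    regex = re.compile(pattern)
    out = []
    header = []  # header lines of the current file (diff, index, ---, +++)
    hunk = []    # lines of the hunk being collected, starting with "@@"
    header_emitted = False

    def flush():
        """Emit the collected hunk if it matches, then start afresh."""
        nonlocal header_emitted
        # line[1:] skips the leading " ", "+" or "-" marker of a body line.
        if hunk and any(regex.search(line[1:]) for line in hunk[1:]):
            if not header_emitted:
                out.extend(header)
                header_emitted = True
            out.extend(hunk)
        hunk.clear()

    for line in diff_text.splitlines(keepends=True):
        if line.startswith("diff "):
            flush()
            header[:] = [line]
            header_emitted = False
        elif line.startswith("@@"):
            flush()
            hunk.append(line)
        elif hunk:
            hunk.append(line)
        else:
            header.append(line)
    flush()
    return "".join(out)
```

You would feed it the text produced by git diff -U1 and print the returned string.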
There do not seem to be any options to Git's diff command that support what you want to do. However, you could use the GIT_EXTERNAL_DIFF environment variable and a custom script (or any executable created using your preferred scripting or programming language) to manipulate a patch.
I'll assume you are on Linux; if not, you could tweak this concept to suit your environment. Let's say you have a Git repo where HEAD has a file file05 that contains:
line 26662: $my_array[my_key]
And a file file06 that contains:
line 19768: $my_array[my_key]
line 19769: $my_array[my_key]
line 19770: $my_array[my_key]
line 19771: $my_array[my_key]
line 19772: $my_array[my_key]
line 19773: $my_array[my_key]
line 19775: $my_array[my_key]
line 19776: $my_array[my_key]
You change file05 to:
line 26662: $my_array["my_key"]
And you change file06 to:
line 19768: $my_array[my_key]
line 19769: $my_array["my_key"]
line 19770: $my_array[my_key]
line 19771: $my_array[my_key]
line 19772: $my_array[my_key]
line 19773: $my_array[my_key]
line 19775: $my_array[my_key2]
line 19776: $my_array[my_key]
Using the following shell script, let's call it mydiff.sh and place it somewhere that's in our PATH:
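#!/bin/bash
# Print the arguments Git passes to the external diff program.
echo "$@"
# Regenerate the patch in porcelain word-diff format and filter it with awk.
git diff-files --patch --word-diff=porcelain "${5}" | awk '
# Remember a removed run of text and the line number it appeared on.
/^-./ {rec = FNR; prev = substr($0, 2);}
# An added run directly after a removed one: compare them with the quotes stripped.
FNR == rec + 1 && /^\+./ {
    ln = substr($0, 2);
    gsub("\\[\"", "[", ln);
    gsub("\"\\]", "]", ln);
    if (prev == ln) {
        print " " ln;
    } else {
        print "-" prev;
        print "+" ln;
    }
}
# Print lines that are neither a remembered removal nor the line right after it.
FNR != rec && FNR != rec + 1 {print;}
'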
Executing the command:
GIT_EXTERNAL_DIFF=mydiff.sh git --no-pager diff
Will output:
file05 /tmp/r2aBca_file05 d86525edcf5ec0157366ea6c41bc6e4965b3be1e 100644 file05 0000000000000000000000000000000000000000 100644
index d86525e..c2180dc 100644
--- a/file05
+++ b/file05
@@ -1 +1 @@
line 26662:
$my_array[my_key]
~
file06 /tmp/2lgz7J_file06 d84a44f9a9aac6fb82e6ffb94db0eec5c575787d 100644 file06 0000000000000000000000000000000000000000 100644
index d84a44f..bc27446 100644
--- a/file06
+++ b/file06
@@ -1,8 +1,8 @@
line 19768: $my_array[my_key]
~
line 19769:
$my_array[my_key]
~
line 19770: $my_array[my_key]
~
line 19771: $my_array[my_key]
~
line 19772: $my_array[my_key]
~
line 19773: $my_array[my_key]
~
line 19775:
-$my_array[my_key]
+$my_array[my_key2]
~
line 19776: $my_array[my_key]
~
This output does not show changes for the added quotes in file05 and file06. The external diff script basically uses the Git diff-files command to create the patch and filters the output through a GNU awk script to manipulate it. This sample script does not handle all the different combinations of old and new files mentioned for GIT_EXTERNAL_DIFF, nor does it output a valid patch, but it should be enough to get you started.
You could use Perl regular expressions, Python difflib or whatever you're comfortable with to implement an external diff tool that suits your needs.
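For instance, a Python difflib variant of the external diff idea might look roughly like the sketch below. It compares normalized lines but prints the original ones. The normalization rule and the names key and filtered_changes are assumptions for illustration, and like the shell sample it neither covers every old/new file combination nor emits a valid patch.

```python
import difflib
import re
import sys

# Hypothetical rule: treat $a["k"] and $a['k'] like $a[k] when comparing.
QUOTED_KEY = re.compile(r"""\[(["'])(\w+)\1\]""")


def key(line):
    """Normalized form of a line, used only for comparison."""
    return QUOTED_KEY.sub(r"[\2]", line)


def filtered_changes(old_lines, new_lines):
    """Yield "-"/"+" prefixed original lines whose normalized form differs."""
    matcher = difflib.SequenceMatcher(
        None,
        [key(line) for line in old_lines],
        [key(line) for line in new_lines],
        autojunk=False,
    )
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        for line in old_lines[i1:i2]:
            yield "-" + line
        for line in new_lines[j1:j2]:
            yield "+" + line


def main(argv):
    # Git calls the program with seven arguments:
    # path old-file old-hex old-mode new-file new-hex new-mode
    path, old_file, new_file = argv[1], argv[2], argv[5]
    with open(old_file, errors="replace") as f:
        old_lines = f.readlines()
    with open(new_file, errors="replace") as f:
        new_lines = f.readlines()
    changes = list(filtered_changes(old_lines, new_lines))
    if changes:
        print("changes in " + path)
        sys.stdout.writelines(changes)


if __name__ == "__main__" and len(sys.argv) == 8:
    main(sys.argv)
```

Saved as an executable script, it would be run the same way as mydiff.sh, by pointing GIT_EXTERNAL_DIFF at it.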
If the goal is to minimize trivial differences, you might consider our SmartDifferencer tools.
These tools compare the language syntax, not the layout, so many trivial changes (layout, modified comments, even changed radix on numbers) are ignored and not reported. Each tool has a full language parser; there's a version for many languages, including PHP.
It won't handle the example $FOO[abc] as being "semantically identical" to $FOO["abc"], because they are not. If abc actually has a definition as a constant, then $FOO["abc"] is not semantically equivalent.