My problem:
I am a code reviewer, and I have the following situation in Git:
a.txt
Then a developer decided to split the content of
Is there an easy way to see:
- what came to b from a?
- what came to c from a?
- all extra changes apart from just moving stuff?
I don't think there's really any way to extract this information other than visually inspecting the diff. However, it looks like we may be able to detect split files using git diff along with the -C argument. For example, I start with a file that contains 38 lines, move 24 into one file and 14 into another, and delete the original. git diff --name-status just tells me that I have renamed one file and added another:
R060 lorem.txt fileA
A fileB
But if we modify our command line to detect copies:
git diff --name-status -C30 HEAD^
We get:
C060 lorem.txt fileA
R039 lorem.txt fileB
The -C30 argument says "consider a file a copy if it is at least 30% similar to another file included in the commit". Note that there is a corresponding -M option that controls rename detection; it defaults to 50%.
A policy or workflow that prevents problems like this would also help.
What exactly are you trying to prevent? There isn't really any way to distinguish "I split a file into two new files" from "I deleted a file and created two new files".
You could in theory prevent commits that both introduce new files and modify existing files. That would be relatively easy with a pre-receive hook, for example. But that's such a common situation, I'm not sure you'd want to do this in practice.
For the above, a pre-receive hook like the following might work:
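#!/bin/bash
# pre-receive: reject a push that both adds files and changes others.
# Note: only the tip commit of each pushed ref is inspected.
has_new=0
has_mod=0
while read -r old new ref; do
    while read -r type name; do
        if [ "$type" = "A" ]; then
            has_new=1   # a file was added
        else
            has_mod=1   # anything else: modified, deleted, renamed, ...
        fi
    done < <(git show --name-status --format='' "$new")
done
if [ "$has_new" = 1 ] && [ "$has_mod" = 1 ]; then
    echo "ERROR: commits may not both create and modify files" >&2
    exit 1
fi
exit 0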
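To try this check without pushing anything, the classification step can be pulled out into a function that reads name-status lines on stdin. This is only a sketch: creates_and_modifies is a hypothetical name, and has_new and has_mod are plain globals because POSIX sh has no local variables.

```shell
# Hypothetical helper: succeed (exit 0) when the name-status lines on stdin
# contain both an added file ("A") and at least one other kind of change.
creates_and_modifies() {
    has_new=0
    has_mod=0
    while read -r type name; do
        case "$type" in
            A)  has_new=1 ;;   # a file was added
            ?*) has_mod=1 ;;   # any other non-empty status: M, D, R060, ...
        esac
    done
    [ "$has_new" = 1 ] && [ "$has_mod" = 1 ]
}
```

For example, git show --name-status --format='' HEAD | creates_and_modifies && echo "would be rejected" runs the same test against the current commit.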
We could alternatively use our "split detection", discussed earlier, and implement something like:
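#!/bin/bash
# pre-receive: reject a push if any path shows up in the second column of
# the name-status output more than once when copy detection is enabled.
while read -r old new ref; do
    git diff --name-status -C30 "$old" "$new" |
        awk -F '\t' '
            {total[$2]++}    # $2 is the old name on R/C lines
            END {for (i in total) if (total[i] > 1) exit 1}
        '
    if [ $? -ne 0 ]; then
        echo "ERROR: detected a split file" >&2
        exit 1
    fi
done
exit 0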
This will exit with an error if any file shows up as the "old name" for a file more than once. Trying to push to a repository using this pre-receive hook, using the example given in the first part of this answer, gets me:
$ git push
Counting objects: 5, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (4/4), done.
Writing objects: 100% (5/5), 1.46 KiB | 1.46 MiB/s, done.
Total 5 (delta 0), reused 0 (delta 0)
remote: ERROR: detected a split file
To upstream
! [remote rejected] master -> master (pre-receive hook declined)
Maybe that helps? Without extensive testing I would worry about false positives with this solution.
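One likely source of false positives: the awk script counts the second column of every line, so a file that is modified (M lorem.txt) and also copied once (C060 lorem.txt fileA) is counted twice and rejected. A possible refinement, sketched below under the assumption that only R and C entries should count, restricts the tally to those lines. The name has_split_source is made up for illustration.

```shell
# Hypothetical helper: exit 0 when some path is the source of two or more
# rename (R) or copy (C) entries, exit 1 otherwise.
# Reads tab-separated `git diff --name-status -C<n>` output on stdin.
has_split_source() {
    awk -F '\t' '
        # Skip A/M/D lines: their second field is the path itself, not a source.
        $1 ~ /^[RC][0-9]*$/ { total[$2]++ }
        END {
            found = 0
            for (src in total) if (total[src] > 1) found = 1
            exit (found ? 0 : 1)
        }
    '
}
```

In the hook, the pipeline would become git diff --name-status -C30 "$old" "$new" | has_split_source, with the push rejected when it succeeds. I have not tested this beyond the simple cases above, so treat it as a starting point.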