Question
I'm using Git for Windows (and TortoiseGit).
My goal is to prevent commits which have at least one non-UTF-8 file among modified/added.
Enumerating modified/added files: I've found the following code
{ git diff --name-only ; git diff --name-only --staged ; }
Is this the best (correct and most concise) approach?
Searching for non-UTF-8 files: I've found the following code
{ git diff --name-only ; git diff --name-only --staged ; } | xargs -I {} bash -c "iconv -f utf-8 -t utf-16 {} &>/dev/null || echo {} - is non-UTF8!"
If I start Git Bash in my repository root folder, it works (each non-UTF-8 file is displayed). So I've renamed .git/hooks/pre-commit.sample to .git/hooks/pre-commit and copy-pasted the code above. After committing changes, nothing special is displayed inside the TortoiseGit commit GUI window, so it looks like the pre-commit hook is not working correctly.
Rejecting commit if there is any non-UTF-8 file: After displaying all non-UTF-8 files, the commit should be rejected. But I have no idea how to do this (return some exit code, but how?).
So any help is appreciated.
Answer 1:
So the answer is (thx to phd and great thx to torek for his useful notes):
git diff --name-only --staged --diff-filter d | xargs -I {} bash -c "iconv -f utf-8 -t utf-16 {} &>/dev/null || { echo {} - is non-UTF8!; exit 1; }"
This code iterates over all files changed in the commit except deleted ones (that is, added, modified, copied, and renamed files) and checks whether any of them is not valid UTF-8. Every such file is listed and the commit is aborted.
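On the "exit code, but how?" part: Git aborts the commit when the pre-commit hook exits with a non-zero status, and a hook script's status is that of its last command, here the pipeline ending in xargs. The following is only a minimal sketch of that mechanism. The file names are made up, and a stand-in check (any name containing "bad" fails) replaces iconv so it can be run anywhere; run_checks is a hypothetical helper name.

```shell
#!/bin/sh
# Stand-in for the iconv check: a name containing "bad" counts as non-UTF-8.
# The name is passed as $1 instead of being spliced into the script text,
# so unusual characters in a file name cannot alter the command.
run_checks() {
  printf '%s\n' good1.txt bad.txt good2.txt |
    xargs -I {} sh -c 'case "$1" in *bad*) echo "$1 - is non-UTF8!"; exit 1 ;; esac' _ {}
}

# xargs keeps going after a failed invocation, so every offending name is
# listed, and it then exits non-zero itself (123 with GNU xargs).
if run_checks; then
  echo "all files passed"
else
  echo "check failed with status $?"
fi
```

In a real hook the non-zero status of the final pipeline is all Git needs in order to reject the commit.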
Answer 2:
Your existing solution is probably sufficient. It's not 100% correct though: here are the remaining issues, all of which are minor ones that you can fix later (if ever) at your leisure:
- You need only the git diff ... --staged (or --cached), as what Git will commit is whatever files are in the index/staging-area, and git diff compares that with what's in the HEAD commit and tells you what's different there. If a copy of a file in the index differs from the copy of the file in HEAD, you should examine the index copy.
- Technically it would be better to use git diff-index --cached here so as to not obey any of the user's git diff configuration. That is, git diff-index is a plumbing command in Git, which means it's aimed at being used from other computer programs: it runs in a completely predictable manner based on arguments only, not on any git config settings. But if you're doing this for yourself, and you configure git diff such that it breaks your own use of git diff, well, that's your own fault. :-)
- You might also consider using a --diff-filter to exclude deleted files here. Otherwise your checker will always fail on deletion (as iconv won't be able to read the deleted file).
- Most significant: iconv will be reading the file from the work-tree. As I noted in the first bullet point, Git is going to commit what's staged, not what's in the work-tree.
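The first three points can be sketched as a single plumbing call. This is an illustration under assumptions, not a drop-in hook: list_staged_paths is a hypothetical name, and the fallback to the empty tree for a repository without any commit is borrowed from the stock pre-commit.sample.

```shell
#!/bin/sh
# Print the paths the next commit would add or change, one per line.
list_staged_paths() {
  # Before the first commit there is no HEAD, so compare against the
  # empty tree instead (same approach as pre-commit.sample).
  if git rev-parse --verify HEAD >/dev/null 2>&1; then
    against=HEAD
  else
    against=$(git hash-object -t tree /dev/null)
  fi
  # --cached          compare the index (what will be committed) with $against
  # --diff-filter=d   lowercase letter: leave out deleted paths
  git diff-index --cached --name-only --diff-filter=d "$against"
}
```

Calling list_staged_paths from the repository root prints only staged additions and changes. Work-tree edits that were never staged with git add do not show up.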
As an example—which may or may not be possible from within TortoiseGit—consider what happens if you do this:
$ git checkout master
$ printf '\300\300\300' > badfile # put bad non-UTF-8 crud into file
$ git add badfile # copy file into index
$ echo 'good data' > badfile # replace work-tree contents
$ git commit
This commit is going to commit the bad contents (the three bytes of \300 with no newline) that are in the index, but your pre-commit hook is going to run iconv -f utf-8 -t utf-16 over the contents of the work-tree file, reading good data, which is of course valid UTF-8.
To fix this, your pre-commit hook must extract the data from the index for each file that is to be committed. How you go about doing that is up to you. The simplest (but perhaps slowest) method is to just extract the entire index contents to a temporary work area using git checkout-index. A better method might be to turn each in-index (in-staging-area) path name into a valid index specifier (that is, path/to/file becomes :path/to/file) and use git cat-file -p $specifier | iconv ... to scan each. But all of these will be fairly inefficient, especially on Windows. For efficiency, you might want to write a Python script that uses git cat-file --batch to extract them all in one pass, and do the format-checking there.
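A rough sketch of the per-file git cat-file variant, for illustration only. It assumes an xargs that supports -0 (a GNU/BSD extension, not POSIX), check_staged_utf8 is a hypothetical name, and it spawns several processes per file, so it has exactly the inefficiency warned about above.

```shell
#!/bin/sh
# Check the staged (index) copy of every added or changed path.
check_staged_utf8() {
  # No HEAD before the first commit: compare against the empty tree.
  if git rev-parse --verify HEAD >/dev/null 2>&1; then
    against=HEAD
  else
    against=$(git hash-object -t tree /dev/null)
  fi
  # -z emits NUL-terminated, unquoted paths; xargs -0 splits on NUL.
  git diff-index --cached --name-only -z --diff-filter=d "$against" |
    xargs -0 -n 1 sh -c '
      [ "$#" -gt 0 ] || exit 0   # xargs may run once even with no input
      # ":path" names the index copy, not the work-tree file.
      git cat-file -p ":$1" | iconv -f utf-8 -t utf-16 >/dev/null 2>&1 ||
        { printf "%s - is non-UTF8!\n" "$1"; exit 1; }
    ' _
}

# Only act when this file is installed as .git/hooks/pre-commit;
# a non-zero exit status makes Git abort the commit.
case "${0##*/}" in
  pre-commit) check_staged_utf8 || exit 1 ;;
esac
```

With the badfile example above, this reports badfile even though the work-tree copy now holds good data, because the bytes piped to iconv come from the index.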
Source: https://stackoverflow.com/questions/55645733/git-pre-commit-hook-which-searches-non-utf-8-encodings-among-modified-added-file