Is there any way to make git gui display and show diffs for UTF-16 files?
I found some information, but this is mostly referring to the command l
I have been working on a much better solution with help from the msysGit people, and have come up with this clean/smudge filter. The filter uses the GNU file and iconv commands to determine the encoding of the file and to convert it to and from msysGit's internal UTF-8 format.
This type of Clean/Smudge Filter gives you much more flexibility. It should allow Git to treat your mixed-format files as UTF-8 text in most cases: diffs, merge, git-grep, as well as gitattributes properties like eol-conversion, ident-replacement, and built-in diff patterns.
The diff filter solution outlined below only works for diffs, and so is much more limited.
To set up this filter:
Add the following to ~/Git/etc/gitconfig:
[filter "mixedtext"]
clean = iconv -sc -f $(file -b --mime-encoding %f) -t utf-8
smudge = iconv -sc -f utf-8 -t $(file -b --mime-encoding %f)
required
Add a line to your global ~/Git/etc/gitattributes or local ~/.gitattributes to handle mixed format text, for example:
*.txt filter=mixedtext
I have used this on a directory with sql files in ANSI, UTF-16, and UTF-8 formats. It is working so far. Barring any surprises, this looks like the 20% effort that could cover 80% of all Windows text format problems.
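A quick way to see what the two filter directions do is to run the same iconv conversions by hand and check that a file survives the round trip. This is only a sketch: the helper names `to_utf8` and `from_utf8` are made up for illustration, and the encoding is fixed to utf-16le instead of being detected with `file` as the filter does.

```shell
# Hypothetical helpers mirroring the clean and smudge directions.
# $1 is the working-tree encoding, fixed here instead of detected with `file`.
to_utf8()   { iconv -sc -f "$1" -t utf-8; }
from_utf8() { iconv -sc -f utf-8 -t "$1"; }

workdir=$(mktemp -d)

# Build a small UTF-16LE sample, as a Windows editor might save it.
printf 'SELECT 1;\n' | iconv -f utf-8 -t utf-16le > "$workdir/original.sql"

# clean direction: working tree -> repository (UTF-8)
to_utf8 utf-16le < "$workdir/original.sql" > "$workdir/cleaned.sql"

# smudge direction: repository -> working tree (original encoding)
from_utf8 utf-16le < "$workdir/cleaned.sql" > "$workdir/restored.sql"

# Compare the bytes before and after the round trip.
if cmp -s "$workdir/original.sql" "$workdir/restored.sql"; then
    echo "round trip OK"
else
    echo "round trip changed the file"
fi
```

If the round trip changes a file, the filter would likely rewrite it on checkout too, so it is worth trying this on a few real files first.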
This method is for MSysGit 1.8.1, and is tested on Windows XP. I use Git Extensions 2.44, but since the changes are at the Git level, they should work for Git Gui as well. Steps:
Install GNU iconv.
Create the following script, name it astextutf16, and place it in the /bin directory of your Git installation (this is based on the existing astextplain script):
#!/bin/sh -e
# converts Windows Unicode (UTF-16 / UCS-2) to Git-friendly UTF-8
# notes:
# * requires Gnu iconv:
# http://gnuwin32.sourceforge.net/packages/libiconv.htm
# * this script must be placed in: ~/Git/bin
# * modify global ~/Git/etc/gitconfig or local ~/.git/config:
# [diff "astextutf16"]
# textconv = astextutf16
# * or, from command line:
# $ git config diff.astextutf16.textconv astextutf16
# * modify global ~/Git/etc/gitattributes or local ~/.gitattributes:
# *.txt diff=astextutf16
if test "$#" != 1 ; then
    echo "Usage: astextutf16 <file>" 1>&2
    exit 1
fi
# -f(rom) utf-16 -t(o) utf-8
"\Program Files\GnuWin32\bin\iconv.exe" -f utf-16 -t utf-8 "$1"
exit 0
Modify the global ~/Git/etc/gitconfig or your local ~/.git/config file, and add these lines:
[diff "astextutf16"]
textconv = astextutf16
Or, from command line:
$ git config diff.astextutf16.textconv astextutf16
Modify the global ~/Git/etc/gitattributes or your local ~/.gitattributes file, and map your extensions to be converted:
*.txt diff=astextutf16
Test. UTF-16 files should now be visible.
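The test step can be tried in a throwaway repository before touching a real one. The sketch below makes some assumptions: it calls iconv directly as the textconv command instead of going through the astextutf16 script, it assumes git and iconv are on the PATH, and the file name notes.txt and the demo identity are invented for the example.

```shell
# Scratch repository so nothing in a real checkout is touched.
repo=$(mktemp -d)
git init -q "$repo"

# Same wiring as the steps above, with iconv used directly as the converter
# (git appends the file name to the textconv command).
git -C "$repo" config diff.astextutf16.textconv "iconv -f utf-16 -t utf-8"
printf '*.txt diff=astextutf16\n' > "$repo/.gitattributes"

# Commit a UTF-16 file, then change it in the working tree.
printf 'one\n' | iconv -f utf-8 -t utf-16 > "$repo/notes.txt"
git -C "$repo" add .
git -C "$repo" -c user.name=demo -c user.email=demo@example.invalid commit -q -m "initial"
printf 'one\ntwo\n' | iconv -f utf-8 -t utf-16 > "$repo/notes.txt"

# Confirm the attribute is picked up, then look at the diff.
attr_output=$(git -C "$repo" check-attr diff -- notes.txt)
diff_output=$(git -C "$repo" diff)
printf '%s\n' "$attr_output"
printf '%s\n' "$diff_output"
```

With the textconv driver in place the diff should list the added line as text; without it, Git normally reports that the binary files differ.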
I ran into a similar issue.
I would like to improve on the accepted answer, since it has a small flaw. The problem I ran into was that if the file did not exist, I received this error:
conversion to cannot unsupported
I changed the commands so that a file is not required. It uses only stdin/stdout. This fixed the issue. My .git/config file now looks like this:
[filter "mixedtext"]
clean = "GITTMP=$(mktemp);TYPE=$( tee $GITTMP|file -b --mime-encoding - ); cat $GITTMP | iconv -sc -f $TYPE -t utf-8; rm -f $GITTMP"
smudge = "GITTMP=$(mktemp);TYPE=$( tee $GITTMP|file -b --mime-encoding - ); cat $GITTMP | iconv -sc -f utf-8 -t $TYPE; rm -f $GITTMP"
required = true
To create the entries in your .git/config file use these commands:
git config --replace-all filter.mixedtext.clean 'GITTMP=$(mktemp);TYPE=$( tee $GITTMP|file -b --mime-encoding - ); cat $GITTMP | iconv -sc -f $TYPE -t utf-8; rm -f $GITTMP'
git config --replace-all filter.mixedtext.smudge 'GITTMP=$(mktemp);TYPE=$( tee $GITTMP|file -b --mime-encoding - ); cat $GITTMP | iconv -sc -f utf-8 -t $TYPE; rm -f $GITTMP'
git config --replace-all filter.mixedtext.required true
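The one-line clean command is dense, so here is the same idea spelled out step by step as a shell function. It is a variation for reading purposes, not the exact command above: the name `clean_mixedtext` is made up, and it buffers stdin with `cat` and then runs `file` on the temp file, where the original uses `tee` and `file -`.

```shell
# Hypothetical, readable variant of the stdin-only clean command.
clean_mixedtext() {
    tmp=$(mktemp) || return 1
    cat > "$tmp"                             # buffer stdin so it can be read twice
    enc=$(file -b --mime-encoding "$tmp")    # e.g. us-ascii, utf-8, utf-16le
    iconv -sc -f "$enc" -t utf-8 < "$tmp"    # emit UTF-8 on stdout
    status=$?
    rm -f "$tmp"                             # always clean up the temp file
    return "$status"
}

# Usage: pipe any supported text through it.
printf 'plain ascii\n' | clean_mixedtext
```

Because nothing here depends on a path in the working tree, the function behaves the same whether or not the file exists on disk yet, which is the point of the change.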
My .gitattributes file looks like this:
*.txt filter=mixedtext
*.ps1 filter=mixedtext
*.sql filter=mixedtext
Specify only the files that might be an issue; otherwise the clean/smudge filter has to do more work (temp files).
We also bulk-converted the UTF-16LE files in Git to UTF-8, since UTF-8 is the more compact encoding for mostly-ASCII text and is the most widely supported. The same iconv command used in clean and smudge worked well for permanently converting the files.
The nice thing about the clean/smudge commands is that even if a file is checked in with, say, UTF-16le, the diff will still work.
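For the bulk conversion mentioned above, something along these lines could work. It is a sketch only: `convert_to_utf8` is a made-up helper, and it assumes you already know each file's current encoding (utf-16le in the usage comment) instead of detecting it.

```shell
# Hypothetical helper: convert one file to UTF-8 in place.
# $1 = current encoding (assumed known), $2 = path to the file.
convert_to_utf8() {
    tmp=$(mktemp) || return 1
    if iconv -f "$1" -t utf-8 "$2" > "$tmp"; then
        cat "$tmp" > "$2"    # overwrite the contents, keep the file's permissions
        rm -f "$tmp"
    else
        rm -f "$tmp"
        return 1             # leave the original untouched if conversion fails
    fi
}

# Usage sketch, assuming every .sql file in the directory is UTF-16LE:
# for f in *.sql; do convert_to_utf8 utf-16le "$f"; done
```

Run it on a clean working tree so the resulting changes can be reviewed, and reverted if needed, before committing.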