Question
I recently added a .gitattributes file to a C# repository with the following settings:
* text=auto
*.cs text diff=csharp
I renormalized the repository following these instructions from GitHub, and it seemed to work OK.
The problem is that when I check out some files (not all of them), I see lots of weird characters mixed in with the actual code. It seems to happen when git runs the files through the lf->crlf conversion specified by the .gitattributes file above.
According to Notepad++, the files that get messed up use UCS-2 Little Endian or UCS-2 Big Endian encoding. The files that seem to work OK are either ANSI or UTF-8 encoded.
For reference, my git version is 1.8.0.msysgit.0 and my OS is Windows 8.
Any ideas how I can fix this? Would changing the encoding of the files be enough?
Answer 1:
This happens if you use an encoding where every character is two bytes.
CRLF would then be encoded as \0\r\0\n.
Git thinks it's a single-byte encoding, so it turns that into \0\r\0\r\n.
This makes the next line one byte off, causing every other line to be full of Chinese characters (because the \0 becomes the low-order byte rather than the high-order byte).
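The effect can be reproduced in a few lines. The following is only a sketch, written as a C# top-level program (C# 9 or later) rather than taken from git itself: NaiveLfToCrLf is a hypothetical helper that mimics a byte-oriented LF to CRLF conversion, and the sample string is made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper (not git's actual code): inserts a single CR byte before
// every LF byte that is not already preceded by a CR byte.
static byte[] NaiveLfToCrLf(byte[] input)
{
    var output = new List<byte>(input.Length);
    for (int i = 0; i < input.Length; i++)
    {
        bool bareLf = input[i] == 0x0A && (i == 0 || input[i - 1] != 0x0D);
        if (bareLf)
            output.Add(0x0D);
        output.Add(input[i]);
    }
    return output.ToArray();
}

// "a\r\nbc" encoded as UTF-16 big endian: every character takes two bytes.
byte[] original = Encoding.BigEndianUnicode.GetBytes("a\r\nbc");
byte[] converted = NaiveLfToCrLf(original);

Console.WriteLine(BitConverter.ToString(original));
// prints 00-61-00-0D-00-0A-00-62-00-63
Console.WriteLine(BitConverter.ToString(converted));
// prints 00-61-00-0D-00-0D-0A-00-62-00-63

// After the inserted byte the pairs are shifted: 'b' (00 62) is now read as 62 00.
string garbled = Encoding.BigEndianUnicode.GetString(converted);
Console.WriteLine(((int)garbled[4]).ToString("X4"));
// prints 6200
```

U+6200 lies in the CJK Unified Ideographs range, which is why the shifted lines look like Chinese text.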
You can convert files to UTF-8 using this LINQPad script:
const string path = @"C:\...";
// Only files with these extensions are converted.
var extensions = new[] { ".html", ".js" };
foreach (var file in Directory.EnumerateFiles(path, "*", SearchOption.AllDirectories))
{
    if (!extensions.Contains(Path.GetExtension(file)))
        continue;
    // ReadAllLines picks the source encoding from the file's byte order mark, if present.
    File.WriteAllText(file, String.Join("\r\n", File.ReadAllLines(file)), new UTF8Encoding(encoderShouldEmitUTF8Identifier: true));
    file.Dump();
}
This will not fix files that are already broken; you can fix those by replacing \r\n with \n in a hex editor. I don't have a LINQPad script for that (since there's no simple Replace() method for byte[]s).
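A byte-level replacement along these lines could stand in for the hex editor. This is a minimal sketch, not part of the original answer: ReplaceCrLfWithLf is a hypothetical helper, and it assumes every 0D 0A byte pair in the broken file was introduced by the conversion. That can be wrong for text containing characters whose UTF-16 bytes happen to form that pair, so check the result before committing it.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: drops every CR byte (0D) that is immediately followed
// by an LF byte (0A), i.e. replaces the byte pair 0D 0A with 0A.
static byte[] ReplaceCrLfWithLf(byte[] input)
{
    var output = new List<byte>(input.Length);
    for (int i = 0; i < input.Length; i++)
    {
        bool crBeforeLf = input[i] == 0x0D && i + 1 < input.Length && input[i + 1] == 0x0A;
        if (!crBeforeLf)
            output.Add(input[i]);
    }
    return output.ToArray();
}

// The broken pattern described above: "a", a damaged CRLF, then "b" (big endian).
byte[] broken = { 0x00, 0x61, 0x00, 0x0D, 0x00, 0x0D, 0x0A, 0x00, 0x62 };
byte[] repaired = ReplaceCrLfWithLf(broken);

Console.WriteLine(BitConverter.ToString(repaired));
// prints 00-61-00-0D-00-0A-00-62
```

To apply it to a file, pass the result of File.ReadAllBytes through the helper and write it back with File.WriteAllBytes, ideally on a copy first since the file is overwritten.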
Answer 2:
To fix this, either convert the encoding of the files (UTF-8 should be OK) or disable the automatic line-ending conversion (git config core.autocrlf false, and the .gitattributes settings you have).
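Either way, it helps to know which files are affected. The sketch below lists the *.cs files under the current directory that start with a UTF-16 byte order mark. It is an illustration only: HasUtf16Bom is a hypothetical helper, and files saved without a byte order mark will not be detected.

```csharp
using System;
using System.IO;

// Hypothetical helper: true when the data starts with a UTF-16 byte order mark
// (FF FE for little endian, FE FF for big endian). Note that a UTF-32 little
// endian mark (FF FE 00 00) also starts with FF FE and would match as well.
static bool HasUtf16Bom(byte[] data)
{
    if (data.Length < 2)
        return false;
    bool littleEndian = data[0] == 0xFF && data[1] == 0xFE;
    bool bigEndian = data[0] == 0xFE && data[1] == 0xFF;
    return littleEndian || bigEndian;
}

// Scan the current directory tree and print every matching file.
foreach (var file in Directory.EnumerateFiles(".", "*.cs", SearchOption.AllDirectories))
{
    if (HasUtf16Bom(File.ReadAllBytes(file)))
        Console.WriteLine(file);
}
```

The files it prints are the candidates for converting to UTF-8 or for excluding from line-ending conversion.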
Source: https://stackoverflow.com/questions/13704936/how-to-stop-git-from-breaking-encoding-on-checkout