Question
Edit: git does not mess with character encoding. This question is still here to share knowledge and to help others avoid making the same mistake.
The context: My company uses an SVN repository. I'm using git-svn as a client to interact with this repository. All text files in the project are (and must be) encoded with the Windows default encoding (cp-....). I use git-extensions, and sometimes the command line, to drive git.
What I did: Over the last 3 days, I worked on a new feature and made a number of local commits. Finally, I squashed all these commits into a single one using an interactive rebase, then used git svn dcommit to push everything to the SVN repository as a single commit.
What happened then: A colleague told me that all accents were messed up in the files I had modified, and in the new files added by my commit. I had already committed text files with accents to the same repository with my installation of git + svn before, and this is the first time I have faced this issue.
My investigation: I opened the files with Notepad++ and tried the most common encodings (including the Windows default and UTF-8) to view them. None of them displayed the accents properly, and different accented characters are always rendered as the same sequence of strange glyphs.
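Switching encodings in an editor only changes how the bytes are displayed, so looking at the raw bytes can be more telling. The following is a minimal sketch in Python 3, not something used in the original investigation; `diagnose` is a hypothetical helper, and the checks are rough heuristics. It relies on one assumption worth stating: when every accented character shows up as the same glyph sequence, a likely suspect is the Unicode replacement character U+FFFD, which is stored in UTF-8 as the bytes EF BF BD.

```python
# UTF-8 encoding of U+FFFD, the Unicode replacement character: b'\xef\xbf\xbd'
REPLACEMENT_UTF8 = "\ufffd".encode("utf-8")


def diagnose(data: bytes) -> str:
    """Return a rough guess about the state of accented text in `data`."""
    if REPLACEMENT_UTF8 in data:
        # The original accented characters were already replaced and are lost.
        return "lossy: contains U+FFFD replacement characters"
    if all(b < 0x80 for b in data):
        return "ASCII only"
    try:
        data.decode("utf-8")
        return "valid UTF-8"
    except UnicodeDecodeError:
        return "not UTF-8 (possibly a single-byte code page such as cp1252)"


# "caf\u00e9" is "café"; the escapes keep this example independent of the
# encoding of the script file itself.
print(diagnose("caf\u00e9".encode("cp1252")))
print(diagnose("caf\u00e9".encode("utf-8")))
print(diagnose(b"caf\xef\xbf\xbd"))
```

To apply this to a real file, read it in binary mode (for example `open(path, "rb").read()`) so that Python does not decode anything before the check.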
The temporary workaround: I quickly created a revert commit with git-extensions and "dcommitted" it.
The question: My company's SVN repository is OK, but now I have the following two problems to solve:
- Understand what happened with the characters with accents
- Retrieve my work from the SVN history and commit it properly (if possible, without manually reviewing every accented character)
Can anybody provide some clues? (I'm rather new to git.)
Answer 1:
And now let's reveal the painful truth (painful for my ego, not for git users): I messed up the accents, not git.
I could have just deleted the question, which wrongly suggests that git can mess up accents. But considering the number of upvotes, I think that a lot of people make the same mistake I did, so I have chosen to answer my own question to set the record straight, and maybe help people in the same situation:
- Git does not touch any characters other than line breaks (which it may convert, depending on settings such as core.autocrlf).
- I broke the accents before committing, and I did not notice because I did not pay enough attention. It happened when I edited some of the files with Eclipse: Eclipse did not recognize the encoding, and on save every accented character was replaced by the same weird byte sequence. That's all.
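One plausible mechanism for this kind of damage, sketched here in Python 3, is an editor that opens a cp1252 file as UTF-8, substitutes U+FFFD for every byte it cannot decode, and then saves the result. This is an assumption that fits the symptoms described in the question, not a confirmed account of what Eclipse did; `save_with_wrong_encoding` is a hypothetical model of such an editor.

```python
def save_with_wrong_encoding(original: bytes) -> bytes:
    """Model an editor that assumes UTF-8 when opening and saving a file."""
    # Bytes that are not valid UTF-8 become U+FFFD (the replacement character).
    text = original.decode("utf-8", errors="replace")
    return text.encode("utf-8")


e_acute = "\u00e9".encode("cp1252")  # "é" stored as the single byte b'\xe9'
a_grave = "\u00e0".encode("cp1252")  # "à" stored as the single byte b'\xe0'

broken_e = save_with_wrong_encoding(e_acute)
broken_a = save_with_wrong_encoding(a_grave)

# Two different accented letters end up as the same three bytes.
print(broken_e, broken_a)
print(broken_e == broken_a)
```

Under this assumption the change is lossy: once both letters have collapsed to the same bytes, no choice of encoding in a viewer can bring the originals back, which would explain why switching encodings in Notepad++ did not help. The practical fix is to recover the files as they were before the bad save and re-apply the changes in an editor set to the correct encoding.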
Thanks again to Dmitry Pavlenko for giving me pointers on how to investigate this problem.
+1 to "git reflog"
Happy accent fixing ;=)
Source: https://stackoverflow.com/questions/10623498/what-can-cause-git-to-mess-with-character-encoding