Question
I'm trying to convert the CP1252 encoded string Çàïèñêè ýêñïåäèòîðà to UTF-8. I have tried this command:
iconv -c -f=WINDOWS-1252 -t=UTF-8 test.txt
No luck, getting some weird results:
ÊÀÇÀÃÃœ ÃÎÂÛÉ ÂÅÊ
I tried entering the same string (Çàïèñêè ýêñïåäèòîðà) here, and they are able to convert it without problems: http://www.artlebedev.ru/tools/decoder/
What is going wrong?
Answer 1:
When you convert the CP1252-encoded string Çàïèñêè ýêñïåäèòîðà to UTF-8 with the command
iconv.exe -f CP1252 -t UTF-8 test.txt >testout.txt
the source file test.txt is converted into the target file testout.txt, which contains the UTF-8 encoding of Çàïèñêè ýêñïåäèòîðà.
The same garbage you put in comes out the other end. iconv's behavior is correct and as expected.
What perplexes you is that you don't see the text you expect. That is because your 8-bit input string is actually encoded in the Windows-1251 (Cyrillic) code page.
→ So the correct code page is not CP1252 but CP1251 ←
The command
iconv.exe -f CP1251 -t UTF-8 test.txt >testout2.txt
converts the source file test.txt into the target file testout2.txt, which contains the UTF-8 encoding of Записки экспедитора, which is what your users expect to see.
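The diagnosis above can be reproduced without the original file. This is only a minimal sketch, assuming an iconv build that recognizes the names CP1251 and CP1252 and a shell that passes UTF-8 literals through unchanged. The temporary file and the variable names are made up for illustration and stand in for test.txt.

```shell
# Build a sample file that holds the Cyrillic text as Windows-1251 bytes
# (a hypothetical stand-in for the asker's test.txt).
tmp=$(mktemp)
printf '%s' 'Записки экспедитора' | iconv -f UTF-8 -t CP1251 > "$tmp"

# Reading those bytes with the wrong code page produces the mojibake...
wrong=$(iconv -f CP1252 -t UTF-8 "$tmp")
# ...while the right code page recovers the original text.
right=$(iconv -f CP1251 -t UTF-8 "$tmp")

echo "as CP1252: $wrong"   # as CP1252: Çàïèñêè ýêñïåäèòîðà
echo "as CP1251: $right"   # as CP1251: Записки экспедитора
rm -f "$tmp"
```

The bytes in the file never change. Only the -f argument decides which characters they are taken to mean.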
Answer 2:
You need to use this one:
$ echo "Çàïèñêè ýêñïåäèòîðà" | iconv -t latin1 | iconv -f cp1251
Записки экспедитора
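The pipeline above leaves out the first -f and the last -t, so it relies on the locale's encoding being UTF-8. A sketch with every encoding spelled out might look like the following. It swaps latin1 for CP1252, on the assumption that the mojibake came from a CP1252 misreading: Latin-1 has no printable characters in the 0x80-0x9F range, whereas CP1252 does (œ, for example). The variable names are illustrative only.

```shell
# The mojibake as it appears when stored or typed as UTF-8.
garbled='Çàïèñêè ýêñïåäèòîðà'

# Step 1: UTF-8 -> CP1252 turns each character back into its original single byte.
# Step 2: those bytes are reinterpreted as CP1251 and written out as UTF-8.
fixed=$(printf '%s' "$garbled" | iconv -f UTF-8 -t CP1252 | iconv -f CP1251 -t UTF-8)

echo "$fixed"   # Записки экспедитора
```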
Answer 3:
My solution:
iconv -f windows-1252 -t utf-8 in.file -o out.file
Answer 4:
If you're using Linux, you should use enconv:
./enconv.sh -d /home/foo/example/directory -e ".java" -f "iso-8859-1" -t "utf-8"
Answer 5:
iconv -f utf8 -t cp1252 file.php | iconv -f cp1251 -t utf8 > file-utf8.php
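This two-step conversion only fits a file in which the mojibake has already been saved as UTF-8. A file that still holds the raw 8-bit bytes needs the single CP1251 conversion instead. One rough way to tell the two cases apart is to ask iconv whether the file is valid UTF-8. In the sketch below, is_valid_utf8 is a hypothetical helper rather than anything iconv provides, and the check assumes iconv exits non-zero on an invalid input sequence.

```shell
# Hypothetical helper: succeeds only if the file decodes cleanly as UTF-8.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

raw=$(mktemp)      # raw CP1251 bytes: needs one conversion from CP1251
double=$(mktemp)   # mojibake saved as UTF-8: needs the two-step conversion
printf '%s' 'Записки экспедитора' | iconv -f UTF-8 -t CP1251 > "$raw"
printf '%s' 'Çàïèñêè ýêñïåäèòîðà' > "$double"

if is_valid_utf8 "$raw"; then raw_state=utf8; else raw_state=8bit; fi
if is_valid_utf8 "$double"; then double_state=utf8; else double_state=8bit; fi

echo "raw: $raw_state, double: $double_state"
rm -f "$raw" "$double"
```

A file that passes the check but still shows Çàïèñêè-style text is the double-encoded case. A file that fails it is most likely still in its original 8-bit code page.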
Answer 6:
Try the opposite:
iconv -c -f=UTF-8 -t=WINDOWS-1252 test.txt
Source: https://stackoverflow.com/questions/15422753/iconv-convert-from-cp1252-to-utf-8