iconv: Convert from CP1252 to UTF-8

冷暖自知 submitted on 2019-12-10 12:36:36

Question


I'm trying to convert the CP1252-encoded string Çàïèñêè ýêñïåäèòîðà to UTF-8. I have tried this command:

iconv -c -f=WINDOWS-1252 -t=UTF-8 test.txt

No luck, getting some weird results:

ÊÀÇÀÃÃœ ÃÎÂÛÉ ÂÅÊ

I tried entering the same string (Çàïèñêè ýêñïåäèòîðà) here, and they are able to convert it without problems: http://www.artlebedev.ru/tools/decoder/

What is going wrong?


Answer 1:


When you convert the CP1252-encoded string Çàïèñêè ýêñïåäèòîðà to UTF-8 with the command iconv.exe -f CP1252 -t UTF-8 test.txt >testout.txt, the source file test.txt is converted into the target file testout.txt, which contains the UTF-8 encoding of Çàïèñêè ýêñïåäèòîðà.

Garbage in, garbage out: iconv's behavior is correct and as expected.

What perplexes you is that you don't see what you expect, and that is because your input 8-bit string is actually encoded in the Windows-1251 (Cyrillic) code page.

→ So the correct code page is not CP1252 but CP1251

The command iconv.exe -f CP1251 -t UTF-8 test.txt >testout2.txt converts the source file test.txt into the target file testout2.txt, which contains the UTF-8 encoding of Записки экспедитора: the text your users expect to see.
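A minimal sketch of the check described above, assuming a POSIX shell with GNU iconv or libiconv (both accept the names CP1251 and CP1252). The byte values are derived from the CP1251 code table for Записки экспедитора, not read from the asker's actual file; the script writes them with printf octal escapes and then decodes the same bytes both ways, reusing the file names from the answer.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Raw CP1251 bytes of "Записки экспедитора" (0xC7 0xE0 0xEF ... in octal).
# CP1252 happens to display these very bytes as "Çàïèñêè ýêñïåäèòîðà".
printf '\307\340\357\350\361\352\350 \375\352\361\357\345\344\350\362\356\360\340' > test.txt

# Wrong assumption: decode the bytes as CP1252 -> Latin-looking mojibake.
iconv -f CP1252 -t UTF-8 test.txt > testout.txt

# Correct assumption: decode the same bytes as CP1251 -> readable Cyrillic.
iconv -f CP1251 -t UTF-8 test.txt > testout2.txt

cat testout.txt; echo
cat testout2.txt; echo
```

The first line printed is Çàïèñêè ýêñïåäèòîðà and the second is Записки экспедитора: identical input bytes, two different code pages.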




Answer 2:


You need to use this one:

$ echo "Çàïèñêè ýêñïåäèòîðà" | iconv -f utf-8 -t latin1 | iconv -f cp1251 -t utf-8
Записки экспедитора
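The same idea, sketched with explicit source and target encodings so it does not depend on the terminal's locale. fix_cp1251_mojibake is a hypothetical helper name, and the sketch assumes the input is UTF-8 text. It uses CP1252 rather than latin1 in the first step; both work for this string, but CP1252 also maps bytes 0x80-0x9F to printable characters (for example œ back to 0x9C), which latin1 cannot represent.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: repair text whose CP1251 bytes were decoded as CP1252.
# Reads UTF-8 on stdin, writes UTF-8 on stdout.
fix_cp1251_mojibake() {
  # Step 1: UTF-8 -> CP1252 turns each mojibake character back into its
  #         original single byte (Ç -> 0xC7, à -> 0xE0, ...).
  # Step 2: reinterpret those same bytes as CP1251 and re-encode as UTF-8.
  iconv -f UTF-8 -t CP1252 | iconv -f CP1251 -t UTF-8
}

echo "Çàïèñêè ýêñïåäèòîðà" | fix_cp1251_mojibake

# Hex dump of the intermediate bytes produced by step 1 alone.
printf '%s' "Çàïèñêè" | iconv -f UTF-8 -t CP1252 | od -An -tx1
```

The od line shows c7 e0 ef e8 f1 ea e8, which are the CP1251 bytes for Записки.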



Answer 3:


My solution:

iconv -f windows-1252 -t utf-8 in.file -o out.file



Answer 4:


If you're using Linux, you should use enconv:

./enconv.sh -d /home/foo/example/directory -e ".java" -f "iso-8859-1" -t "utf-8"
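The enconv.sh script itself is not shown, so its internals are unknown. As a rough stand-in, the sketch below does the same kind of batch conversion with only find and iconv; convert_tree and its argument order are hypothetical, chosen to mirror the -d/-e/-f/-t arguments in the command above. It assumes bash (for read -d '') and a find that supports -print0.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical sketch, not the enconv.sh script itself: convert every file
# ending in EXT under DIR from encoding FROM to encoding TO, in place.
convert_tree() {
  local dir=$1 ext=$2 from=$3 to=$4 f tmp
  find "$dir" -type f -name "*$ext" -print0 |
    while IFS= read -r -d '' f; do
      tmp=$(mktemp)
      # iconv cannot safely overwrite its own input, so convert into a
      # temporary file first, then copy back (keeps the original permissions).
      iconv -f "$from" -t "$to" "$f" > "$tmp"
      cat "$tmp" > "$f"
      rm -f "$tmp"
    done
}

# Example, mirroring the enconv.sh call above:
# convert_tree /home/foo/example/directory ".java" ISO-8859-1 UTF-8
```

If a file is not valid in the source encoding, iconv fails before the copy-back step, so that file is left unchanged and set -e stops the loop there.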



Answer 5:


iconv -f utf8 -t cp1252 file.php | iconv -f cp1251 -t utf8 > file-utf8.php



Answer 6:


Try the opposite:

  iconv -c -f UTF-8 -t WINDOWS-1252 test.txt


Source: https://stackoverflow.com/questions/15422753/iconv-convert-from-cp1252-to-utf-8
