UTF8 Python BOM [duplicate]

Submitted by 扶醉桌前 on 2019-11-30 19:55:54

Question


Possible Duplicate:
Write to utf-8 file in python

I have Unicode strings (with Japanese characters) that I want to write to a CSV file. However, the BOM does not seem to be written correctly; it shows up as the string "ï»¿" in the first line. As a result, Excel does not display the Japanese characters correctly. When the CSV is opened in Notepad++, the characters are displayed correctly.

import codecs

fileObj = codecs.open(filename,"w",'utf-8')
fileObj.write(codecs.BOM_UTF8)
c = u';'
for s in stringsToWrite:
   line = s.someUnicodeString
   fileObj.write(line)
fileObj.close()

Answer 1:


fileObj = codecs.open(filename,"w",'utf-8')

OK, you have a Unicode output stream.

fileObj.write(codecs.BOM_UTF8)

BOM_UTF8 is a sequence of bytes, not the Unicode string you would expect to write to a Unicode stream. Python will automatically convert the bytes to Unicode using some default encoding, which may not be the correct one. If that default encoding is Windows code page 1252 rather than UTF-8, you effectively double-encode the BOM, and it comes out as the UTF-8 encoding of "ï»¿".
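To make the double-encoding concrete, here is a minimal sketch that reproduces it by hand. It is written for Python 3, and the explicit `decode('cp1252')` step only stands in for whatever implicit conversion happens on the asker's system, which is an assumption; the variable names are illustrative.

```python
import codecs

# The three bytes of the UTF-8 BOM.
bom_bytes = codecs.BOM_UTF8                 # b'\xef\xbb\xbf'

# Misreading those bytes as code page 1252 yields three separate characters.
misread = bom_bytes.decode('cp1252')        # 'ï»¿'

# Encoding those three characters as UTF-8 gives six bytes instead of three.
double_encoded = misread.encode('utf-8')    # b'\xc3\xaf\xc2\xbb\xc2\xbf'

# The intended route: one character, U+FEFF, encodes to the three BOM bytes.
correct = u'\uFEFF'.encode('utf-8')

print(repr(misread), double_encoded, correct)
```

A file that starts with the six-byte sequence is not recognised as having a BOM, and a UTF-8 viewer shows "ï»¿" as literal text at the start of the first line.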

Suggest writing the BOM as the Unicode character it is instead:

fileObj.write(u'\uFEFF')
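As a rough sketch of that suggestion in a complete script (Python 3; `write_lines_with_bom`, the sample rows, and the temporary path are made up for illustration and are not from the question), the BOM is written as a text character and the stream encodes it:

```python
import io
import os
import tempfile

def write_lines_with_bom(path, lines):
    """Write text lines as UTF-8, preceded by U+FEFF written as a character."""
    # newline='' keeps the explicit '\r\n' line endings untouched.
    with io.open(path, 'w', encoding='utf-8', newline='') as file_obj:
        file_obj.write(u'\uFEFF')           # encoded by the stream to EF BB BF
        for line in lines:
            file_obj.write(line + u'\r\n')

demo_path = os.path.join(tempfile.mkdtemp(), 'demo.csv')
write_lines_with_bom(demo_path, [u'名前;値', u'東京;1'])

# Read the file back as raw bytes to confirm what was actually written.
with open(demo_path, 'rb') as raw:
    first_bytes = raw.read(3)

print(first_bytes)
```

Reading the first three bytes back in binary mode is a quick way to confirm that the file starts with EF BB BF rather than a double-encoded sequence.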

InternetSeriousBusiness wrote:

Isn't the UTF-8 BOM discouraged, anyway?

Yes, the UTF-8 faux-BOM is largely a disaster in most contexts, but it is needed to get Excel's charset guessing to pick up UTF-8. Unfortunately it doesn't work in Excel for Mac. Another possible approach might be to use UTF-16.
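Both routes can be sketched with Python 3's built-in codecs, which emit the BOM themselves: `utf-8-sig` writes the UTF-8 BOM once at the start of the file, and `utf-16` writes a UTF-16 BOM in the platform's byte order. The sample rows, file names, and the tab delimiter for the UTF-16 file are assumptions for illustration; whether a given Excel version opens either file cleanly is not verified here.

```python
import csv
import io
import os
import tempfile

rows = [[u'名前', u'値'], [u'東京', u'1']]
out_dir = tempfile.mkdtemp()

# 'utf-8-sig' prepends the UTF-8 BOM automatically; no manual write needed.
utf8_path = os.path.join(out_dir, 'utf8_sig.csv')
with io.open(utf8_path, 'w', encoding='utf-8-sig', newline='') as f:
    csv.writer(f, delimiter=';').writerows(rows)

# 'utf-16' also emits its own BOM; the tab delimiter is an assumption.
utf16_path = os.path.join(out_dir, 'utf16.txt')
with io.open(utf16_path, 'w', encoding='utf-16', newline='') as f:
    csv.writer(f, delimiter='\t').writerows(rows)

# Inspect the leading bytes of each file.
with open(utf8_path, 'rb') as raw:
    utf8_head = raw.read(3)
with open(utf16_path, 'rb') as raw:
    utf16_head = raw.read(2)

print(utf8_head, utf16_head)
```

Letting the codec write the BOM avoids mixing bytes and text in the same stream, which is what went wrong in the question's code.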




Answer 2:


The string you copied is the UTF-8 BOM, so your problem is not in your Python code but somewhere else.
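One way to settle whether the file itself is at fault is to look at its first bytes directly. The following is a small sketch for Python 3; `starts_with_utf8_bom` and the two demo files are hypothetical names introduced here, not part of the question.

```python
import codecs
import os
import tempfile

def starts_with_utf8_bom(path):
    """Return True if the file begins with exactly the bytes EF BB BF."""
    with open(path, 'rb') as raw:
        return raw.read(len(codecs.BOM_UTF8)) == codecs.BOM_UTF8

# Two throwaway files: one with a proper BOM, one without.
demo_dir = tempfile.mkdtemp()
with_bom = os.path.join(demo_dir, 'with_bom.csv')
without_bom = os.path.join(demo_dir, 'without_bom.csv')

with open(with_bom, 'wb') as f:
    f.write(codecs.BOM_UTF8 + b'a;b\r\n')
with open(without_bom, 'wb') as f:
    f.write(b'a;b\r\n')

print(starts_with_utf8_bom(with_bom), starts_with_utf8_bom(without_bom))
```

If the check returns True for the generated CSV, the BOM was written correctly and the "ï»¿" seen elsewhere comes from a tool reading the file with a non-UTF-8 code page.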



Source: https://stackoverflow.com/questions/12180376/utf8-python-bom
