UTF8 Python BOM [duplicate]

Submitted by 时间秒杀一切 on 2019-12-01 00:47:47
fileObj = codecs.open(filename,"w",'utf-8')

OK, you have a Unicode output stream.

fileObj.write(codecs.BOM_UTF8)

BOM_UTF8 is a sequence of bytes, not the Unicode string you are expected to write to a Unicode stream. Python 2 will implicitly convert the bytes to Unicode using its default encoding, which may not be the correct one. If that encoding is Windows code page 1252 rather than UTF-8, you will effectively double-encode the BOM, and it will come out as the UTF-8 encoding of ï»¿.
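To make the double-encoding concrete, here is a small sketch that reproduces the effect by hand. It does the misreading explicitly with cp1252 (the code page named above) instead of relying on any implicit conversion, so it behaves the same on any current Python; the variable names are only illustrative.

```python
import codecs

# The UTF-8 BOM as raw bytes: EF BB BF.
bom_bytes = codecs.BOM_UTF8

# Misreading those three bytes as Windows-1252 gives three separate
# characters (U+00EF, U+00BB, U+00BF) instead of the single U+FEFF.
misread = bom_bytes.decode("cp1252")

# Encoding those three characters as UTF-8 yields six bytes, which is
# what ends up at the start of the file instead of the real BOM.
double_encoded = misread.encode("utf-8")

print(repr(misread))
print(repr(double_encoded))
```

A reader that expects EF BB BF at the start of the file will not recognise those six bytes as a BOM and will show them as stray characters.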

Suggest writing the BOM as the Unicode character it is instead:

fileObj.write(u'\uFEFF')
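As a rough check of this suggestion, the sketch below writes U+FEFF through a codecs.open stream and then reads the file back as raw bytes. The file names and the write_with_bom helper are made up for the illustration. It also shows the standard "utf-8-sig" codec, which as far as I know emits the BOM for you on the first write.

```python
import codecs
import io
import os
import tempfile


def write_with_bom(path, text):
    """Write text as UTF-8, preceded by the BOM written as a character."""
    with codecs.open(path, "w", "utf-8") as file_obj:
        # U+FEFF is encoded by the stream as the bytes EF BB BF.
        file_obj.write(u"\uFEFF")
        file_obj.write(text)


tmp_dir = tempfile.mkdtemp()

# Approach 1: write the BOM explicitly as a Unicode character.
explicit_path = os.path.join(tmp_dir, "explicit_bom.txt")
write_with_bom(explicit_path, u"h\u00e9llo")
with open(explicit_path, "rb") as raw:
    explicit_bytes = raw.read()

# Approach 2: let the "utf-8-sig" codec prepend the BOM.
sig_path = os.path.join(tmp_dir, "sig_bom.txt")
with io.open(sig_path, "w", encoding="utf-8-sig") as file_obj:
    file_obj.write(u"h\u00e9llo")
with open(sig_path, "rb") as raw:
    sig_bytes = raw.read()

print(explicit_bytes.startswith(codecs.BOM_UTF8))
print(explicit_bytes == sig_bytes)
```

Both files should start with exactly one copy of EF BB BF, followed by the UTF-8 text.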

InternetSeriousBusiness wrote:

Isn't the UTF-8 BOM discouraged, anyway?

Yes, the UTF-8 faux-BOM is largely a disaster in most contexts, but it is needed to get Excel's charset guessing to pick up UTF-8. Unfortunately it doesn't work in Excel for Mac. Another possible approach might be to use UTF-16.
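If you want to try the UTF-16 route, a minimal Python 3 sketch might look like the following. The tab delimiter, the file name and the sample rows are assumptions made for the illustration, and whether a particular Excel version opens the result cleanly is something you would need to test yourself. The "utf-16" codec writes its own BOM, so none is added by hand.

```python
import codecs
import csv
import os
import tempfile

# Hypothetical sample data for the illustration.
rows = [["name", "city"], ["Zo\u00eb", "K\u00f6ln"]]

path = os.path.join(tempfile.mkdtemp(), "export_utf16.txt")

# newline="" lets the csv module control line endings itself.
with open(path, "w", encoding="utf-16", newline="") as file_obj:
    writer = csv.writer(file_obj, delimiter="\t")
    writer.writerows(rows)

with open(path, "rb") as raw:
    data = raw.read()

# The file starts with a UTF-16 BOM in the platform's byte order.
print(data[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))
```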

The string you copied is the UTF-8 BOM, so your problem is not in your Python code but somewhere else.
