Question:
How can I add a BOM (Unicode signature) when saving a file in Python?
file_old = open('old.txt', mode='r', encoding='utf-8')
file_new = open('new.txt', mode='w', encoding='utf-16-le')
file_new.write(file_old.read())
I need to convert the file to UTF-16-LE with a BOM. The script works fine, except that no BOM is written.
Answer 1:
Write it directly at the beginning of the file:
file_new.write('\ufeff')
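A minimal sketch of this approach in Python 3, for readers who want the whole round trip. The helper name convert_to_utf16le_bom and the temporary file paths are illustrative and do not come from the question. Because the output file is opened with 'utf-16-le', the '\ufeff' character is encoded as the bytes FF FE, which is the UTF-16-LE BOM.

```python
import os
import tempfile


def convert_to_utf16le_bom(src_path, dst_path):
    """Copy a UTF-8 text file to UTF-16-LE with a leading BOM (hypothetical helper)."""
    # newline='' on both sides keeps the original line endings untouched.
    with open(src_path, mode='r', encoding='utf-8', newline='') as src, \
         open(dst_path, mode='w', encoding='utf-16-le', newline='') as dst:
        dst.write('\ufeff')      # encoded as b'\xff\xfe' by the utf-16-le codec
        dst.write(src.read())


# Demo on throwaway files (paths are assumptions for illustration only).
tmp_dir = tempfile.mkdtemp()
src_file = os.path.join(tmp_dir, 'old.txt')
dst_file = os.path.join(tmp_dir, 'new.txt')

with open(src_file, mode='w', encoding='utf-8') as f:
    f.write('héllo')

convert_to_utf16le_bom(src_file, dst_file)

with open(dst_file, mode='rb') as f:
    raw = f.read()

print(raw[:2])
```

Reading the result back in binary mode is a simple way to check that the first two bytes are the BOM.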
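Answer 2:
It's better to use the constants from the codecs module.
import codecs
# In Python 3, codecs.BOM_UTF16_LE is a bytes object (b'\xff\xfe'),
# so f must be opened in binary mode ('wb').
f.write(codecs.BOM_UTF16_LE)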
Answer 3:
Why do you think you need to specifically make it UTF-16-LE? Just use 'utf16' as the encoding: Python will write the file in your platform's native byte order with the appropriate BOM, and all the consumer needs to be told is that the file is UTF-16. That's the whole point of having a BOM.
If the consumer is insisting that the file must be encoded in UTF16LE, then you don't need a BOM.
If the file is written the way that you specify, and the consumer opens it with UTF16LE encoding, they will get a \ufeff at the start of the file, which is a nuisance and needs to be ignored.
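The difference between the two codecs can be seen in a short, hedged sketch. It works on in-memory bytes rather than files, and the variable names are illustrative only.

```python
import codecs

text = 'Help me'

# 'utf-16' prepends a BOM in the machine's native byte order.
with_bom = text.encode('utf-16')
has_bom = with_bom[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)

# A 'utf-16' decoder reads the BOM, picks the byte order, and drops it.
round_trip = with_bom.decode('utf-16')

# An explicit 'utf-16-le' decoder does not strip the BOM: it comes back
# as a leading U+FEFF character that the consumer has to skip.
le_with_bom = codecs.BOM_UTF16_LE + text.encode('utf-16-le')
leaked = le_with_bom.decode('utf-16-le')

print(has_bom, repr(round_trip), repr(leaked))
```

A consumer that must read with 'utf-16-le' can remove the stray character with something like leaked.lstrip('\ufeff').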
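Answer 4:
I had a similar situation where a third-party app did not accept the file I generated unless it had a BOM.
In Python 2.7 the following did not work for me (without the u prefix, '\ufeff' is a plain byte string there, and \u is not an escape sequence in byte strings):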
write('\ufeff')
I had to substitute it with
write('\xff\xfe')
and that is the same as
write(codecs.BOM_UTF16_LE)
My final output file was written with the following code:
import codecs

mytext = u"Help me"
# Binary mode: the BOM and the encoded text are raw bytes, and text mode
# on Windows would translate any 0x0A byte and corrupt the UTF-16 data.
with open("c:\\temp\\myFile.txt", 'wb') as f:
    f.write(codecs.BOM_UTF16_LE)
    f.write(mytext.encode('utf-16-le'))
This answer may be useless for the original asker, but it may help someone like me who stumbles upon this issue.
Answer 5:
For UTF-8 with BOM you can use:
import codecs

def addUTF8Bom(filename):
    # Read the whole file, then rewrite it with a leading BOM
    f = codecs.open(filename, 'r', 'utf-8')
    content = f.read()
    f.close()
    f2 = codecs.open(filename, 'w', 'utf-8')
    f2.write(u'\ufeff')
    f2.write(content)
    f2.close()
Answer 6:
vitperov's answer, adapted for Python 3:
import codecs

def add_utf8_bom(filename):
    # Read the existing content first, then rewrite it behind a BOM
    with codecs.open(filename, 'r', 'utf-8') as f:
        content = f.read()
    with codecs.open(filename, 'w', 'utf-8') as f2:
        f2.write('\ufeff')
        f2.write(content)
    return
Answer 7:
Just choose the encoding with BOM:
import codecs

with codecs.open('outputfile.csv', 'w', 'utf-8-sig') as f:
    f.write(u'a,é')
(In Python 3 you can drop codecs and pass encoding='utf-8-sig' to the built-in open.)
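A minimal Python 3 sketch of the same idea without codecs.open, using the built-in open with encoding='utf-8-sig'. The temporary path is an assumption for illustration; codecs is imported only for the BOM_UTF8 constant used in the check.

```python
import codecs
import os
import tempfile

# Throwaway output location (illustrative, not from the original answer).
path = os.path.join(tempfile.mkdtemp(), 'outputfile.csv')

# 'utf-8-sig' writes the three-byte UTF-8 BOM before the first character.
with open(path, 'w', encoding='utf-8-sig', newline='') as f:
    f.write('a,é')

with open(path, 'rb') as f:
    raw = f.read()

# Reading with 'utf-8-sig' strips the BOM again if it is present.
with open(path, 'r', encoding='utf-8-sig') as f:
    text = f.read()

print(raw.startswith(codecs.BOM_UTF8), repr(text))
```

Reading with 'utf-8-sig' also accepts files that have no BOM, so it is a reasonable default when the input may come from either kind of producer.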