UTF-16 file seeking in Python. How?

北海茫月 2020-12-18 20:34

For some reason I cannot seek in my UTF-16 file. It produces 'UnicodeException: UTF-16 stream does not start with BOM'. My code:

f = codecs.open(ai_file, 'r


        
1 Answer
  • 2020-12-18 21:24

    Well, the error message is telling you why: it's not reading a byte order mark. The byte order mark is at the beginning of the file. Without having read the byte order mark, the UTF-16 decoder can't know what order the bytes are in. Apparently it does this lazily, the first time you read, instead of when you open the file -- or else it is assuming that the seek() is starting a new UTF-16 stream.
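
To make the byte-order point concrete, here is a minimal Python sketch. The helper name `detect_byte_order` is made up for illustration and is not part of any library; only the BOM constants come from the standard `codecs` module. It shows that the same two bytes decode to different characters depending on the assumed order, and how a leading BOM settles the question.

```python
import codecs

def detect_byte_order(first_two_bytes):
    """Hypothetical helper: map a leading BOM to an explicit codec name."""
    if first_two_bytes == codecs.BOM_UTF16_LE:   # bytes FF FE
        return "utf-16-le"
    if first_two_bytes == codecs.BOM_UTF16_BE:   # bytes FE FF
        return "utf-16-be"
    return None  # no BOM: the byte order has to come from somewhere else

raw = "A".encode("utf-16-le")      # the two bytes 0x41 0x00
as_le = raw.decode("utf-16-le")    # decodes back to "A"
as_be = raw.decode("utf-16-be")    # read as 0x4100, a different character
```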

    If your file doesn't have a BOM, that's definitely the problem and you should specify the byte order when opening the file (see #2 below). Otherwise, I see two potential solutions:

    1. Read the first two bytes of the file to get the BOM before you seek. You seem to say this didn't work, indicating that perhaps it's expecting a fresh UTF-16 stream after the seek, so:

    2. Specify the byte order explicitly by using utf-16-le or utf-16-be as the encoding when you open the file.
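
The two options above can be combined in a short sketch. It assumes a little-endian file and uses a hypothetical helper, `read_utf16_le_from`, that is not from the original question. The helper opens the file in binary mode, skips the BOM if there is one, seeks by byte offset, and decodes with the explicit `utf-16-le` codec, so no BOM is needed after the seek. For big-endian data, swap in `utf-16-be` and `codecs.BOM_UTF16_BE`.

```python
import codecs
import os
import tempfile

def read_utf16_le_from(path, unit_index):
    """Return the text from unit_index onward (hypothetical helper).

    unit_index counts 2-byte UTF-16 code units after any BOM. Characters
    outside the Basic Multilingual Plane occupy two units each.
    """
    with open(path, "rb") as f:
        # Option 1: look at the first two bytes to see whether a BOM is present.
        start = 2 if f.read(2) == codecs.BOM_UTF16_LE else 0
        f.seek(start + 2 * unit_index)
        # Option 2: name the byte order explicitly, so no BOM is expected here.
        return f.read().decode("utf-16-le")

# Build a small sample file with a BOM (assumed data, for illustration only).
sample_path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(sample_path, "wb") as out:
    out.write(codecs.BOM_UTF16_LE + "hello world".encode("utf-16-le"))

print(read_utf16_le_from(sample_path, 6))
```

Seeking in binary mode keeps the offsets as plain byte positions, which avoids relying on how a text-mode reader resets its decoder after `seek()`.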
