Python: Unicode source file adds spaces (actually null bytes) between characters

Posted by 匆匆过客 on 2019-12-01 02:13:20

Question


I am a newbie. However, I managed to extract some lines from a Unicode text file and write them to another file:

lines = InFile.readlines()
OutFile.writelines(lines[3:])

It works, but (I believe) because of an encoding issue a space is added between each character in the output file. Example of a result:

2 0 1 3 - 1 2 - 2 3 ; ; 3 6 0 . 3 7 
2 0 1 3 - 1 2 - 2 4 ; ; 0 . 0 0 

Lines in the source file:

2013-12-23;;360.37
2013-12-24;;0.00

If I save the source file as ANSI before running the script, I get the correct result. However, the source file is delivered automatically as Unicode by another program, so converting it by hand every time is not practical. I have read through a lot of other encoding/decoding questions, but I am completely lost and don't know how to fix this. Which is the correct command, and where in the script does it belong? Or am I completely wrong, and this has nothing to do with encoding?


Answer 1:


I'm fairly certain that your input file is UTF-16 encoded, and the spaces you're seeing are actually null bytes.
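A small sketch of why UTF-16 text can look as if it had a space between every character. The sample line comes from the question. Using `utf-16-le` (no byte-order mark) and `latin-1` to mimic the misread are assumptions made for illustration, not something the question confirms:

```python
# Sample line from the question, encoded as little-endian UTF-16 without a BOM.
sample = "2013-12-23;;360.37"
raw = sample.encode("utf-16-le")

# Each ASCII character becomes two bytes: the character itself plus a null byte.
print(raw[:8])  # b'2\x000\x001\x003\x00'

# Decoding with a single-byte codec keeps every null byte as its own character,
# which many editors display as a blank.
misread = raw.decode("latin-1")
print(len(sample), len(misread))  # 18 36

# Decoding with the matching codec recovers the original text.
assert raw.decode("utf-16-le") == sample
```

The doubled length is the telltale sign: the data is intact, it is only being interpreted with the wrong encoding.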

Try

with open("myfile.txt", "r", encoding="utf-16") as infile:
    lines = infile.readlines()

and see if the problem persists.
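For the whole task from the question (read the file, skip the first three lines, write the rest), something along these lines might work. It is only a sketch assuming Python 3; the function name `copy_tail`, its parameters, and the UTF-8 output encoding are hypothetical choices, not part of the original script:

```python
def copy_tail(src_path, dst_path, skip=3,
              src_encoding="utf-16", dst_encoding="utf-8"):
    """Copy every line after the first `skip` lines, re-encoding on the way.

    Hypothetical helper: the names and the default encodings are assumptions.
    """
    # With "utf-16", Python uses the byte-order mark at the start of the file
    # to pick the byte order and removes it from the decoded text.
    with open(src_path, "r", encoding=src_encoding) as infile:
        lines = infile.readlines()

    # An explicit output encoding avoids depending on the platform default.
    kept = lines[skip:]
    with open(dst_path, "w", encoding=dst_encoding) as outfile:
        outfile.writelines(kept)
    return len(kept)
```

Calling `copy_tail("myfile.txt", "out.txt")` would then write the lines without the null bytes. If the consumer of the output expects ANSI, passing a code page such as `dst_encoding="cp1252"` is one option, assuming all characters fit into it.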



Source: https://stackoverflow.com/questions/20249832/python-unicode-source-file-adds-spaces-actually-null-bytes-between-characters
