Ignore newline character in binary file with Python?

Deadly, submitted on 2019-12-12 05:35:30

Question


I open my file like so:

f = open("filename.ext", "rb") # ensure binary reading with b

My first line of data looks like this (when using f.readline()):

'\x04\x00\x00\x00\x12\x00\x00\x00\x04\x00\x00\x00\xb4\x00\x00\x00\x01\x00\x00\x00\x08\x00\x00\x00\x00\x00\x00\x00\x18\x00\x00\x00\x01\x00\x00\x00\x02\x00\x00\x00\x03\x00\x00\x00\x04\x00\x00\x00\x05\x00\x00\x00\x06\x00\x00\x00:\x00\x00\x00;\x00\x00\x00<\x00\x00\x007\x00\x00\x008\x00\x00\x009\x00\x00\x00\x07\x00\x00\x00\x08\x00\x00\x00\t\x00\x00\x00\n'

The thing is, I want to read this data four bytes at a time (f.read(4)). While debugging, I realized that when it reaches the end of the first line, it still reads the newline character \n, which is then used as the first byte of the next int I read. I don't want to simply use .splitlines() because some data could contain a \n and I don't want to corrupt it. I'm using Python 2.7.10, by the way. I also read that opening a binary file with the b flag "takes care" of newline/end-of-line characters; why is that not the case for me?

This is what happens in the console when the file's position is right before the newline character:

>>> d = f.read(4)
>>> d
'\n\x00\x00\x00'
>>> s = struct.unpack("i", d)
>>> s
(10,)

Answer 1:


(Following up on a discussion with the OP in chat)

It seems the file is in a binary format and the "newlines" are just misinterpreted values. This can happen when the integer 10 is written to the file, for example: its low byte is 0x0a, the same byte that is displayed as \n.

This doesn't mean that a newline was intended, and it probably wasn't. You can ignore the fact that it is printed as \n and simply treat it as data.
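
To illustrate the point, here is a minimal sketch, not the OP's actual file. It assumes little-endian 32-bit ints (the "<i" format is an assumption; the question uses the native "i"), and read_ints is a hypothetical helper name. The dump in the question ends with the values 7, 8, 9 just before the \n, so a following value of 10 would fit this explanation.

```python
import io
import struct

# The integer 10 packed as a little-endian 32-bit int (assumed byte order).
# Its low byte is 0x0a, which Python displays as '\n'.
ten_bytes = struct.pack("<i", 10)

# Three consecutive ints, with 10 in the middle.
payload = struct.pack("<3i", 9, 10, 11)


def read_ints(stream, fmt="<i"):
    """Hypothetical helper: read fixed-size records until the stream ends."""
    size = struct.calcsize(fmt)
    values = []
    while True:
        chunk = stream.read(size)
        if len(chunk) < size:
            break  # end of data (or a trailing partial record)
        values.append(struct.unpack(fmt, chunk)[0])
    return values


# Reading in 4-byte chunks recovers every value, including the 10.
values = read_ints(io.BytesIO(payload))

# readline() stops at the first 0x0a byte, cutting the data mid-record.
first_line = io.BytesIO(payload).readline()
```

In this sketch, readline() returns five bytes (the int 9 plus the first byte of the int 10), which is why a binary file is better read with fixed-size read() calls than line by line.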




Answer 2:


You should be able to strip out the bytes that represent the newline.

>>> d = f.read(4).replace(b'\x0d\x0a', b'')  # drop \r\n, i.e. bytes b'\x0d\x0a'
>>> diff = 4 - len(d)
>>> while diff > 0:  # top up to 4 bytes; you can probably make this more sophisticated
...     more = f.read(diff)
...     if not more:  # end of file: stop instead of looping forever
...         break
...     d += more.replace(b'\x0d\x0a', b'')
...     diff = 4 - len(d)
...
>>> s = struct.unpack("i", d)

This should give you an idea of how it would work. Be aware that this approach can throw off your data's byte alignment: any value that legitimately contains those bytes is altered as well.

If you really are seeing "\n" when you print d, then try .replace(b"\n", b"") instead.
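
The alignment caveat can be made concrete with a small sketch. It assumes little-endian 32-bit ints ("<i" is an assumption, not something the question states) and uses made-up sample values:

```python
import struct

# Three ints; the middle one is 10, whose low byte is 0x0a ('\n').
data = struct.pack("<3i", 9, 10, 11)

# Stripping every 0x0a byte removes part of a real value.
stripped = data.replace(b"\n", b"")

# Only whole 4-byte records can be unpacked, so drop the leftover bytes.
usable = len(stripped) - len(stripped) % 4
shifted = struct.unpack("<%di" % (usable // 4), stripped[:usable])
```

Here the 12 bytes shrink to 11, the value 10 disappears, and the bytes of 11 slide into the wrong position, so the second int decodes as 11 << 24 rather than 11. Everything after the removed byte is misread in the same way.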



Source: https://stackoverflow.com/questions/36896278/ignore-newline-character-in-binary-file-with-python
