Question
I am opening an extremely large binary file in Python 3.5, in file1.py:
    with open(pathname, 'rb') as file:
        for i, line in enumerate(file):
            # parsing here
However, I naturally get an error: because the file is opened in binary mode, each line is a bytes object, and the parsing inside the for loop then mixes str with bytes, which is where the code fails.
If I were reading all the lines in at once, I would do this:
    with open(fname, 'rb') as f:
        lines = [x.decode('utf8').strip() for x in f.readlines()]
However, I am using for index, lines in enumerate(file): instead. What is the correct approach in this case? Do I decode each object the iterator yields?
Here is the actual code I am running:
    with open(bam_path, 'rb') as file:
        for i, line in enumerate(file):
            line_data = pd.DataFrame({k.strip(): v.strip()
                                      for k, _, v in (e.partition(':')
                                                      for e in line.split('\t'))}, index=[i])
And here is the error:
    Traceback (most recent call last):
      File "file1.py", line 18, in <module>
        for e in line.split('\t'))}, index=[i])
    TypeError: a bytes-like object is required, not 'str'
Answer 1:
You could feed enumerate a generator that yields the decoded lines:
    for i, line in enumerate(l.decode(errors='ignore') for l in f):
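This yields every line in f after decoding it, one line at a time, so the file is never loaded into memory all at once. I've added errors='ignore' because opening the file in text mode ('r') failed with a decoding error about an invalid start byte; with 'ignore', bytes that cannot be decoded are silently dropped.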
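To make the generator approach concrete, here is a minimal sketch under some assumptions: the helper names iter_decoded_lines and parse_line and the sample data are hypothetical, and the real file is assumed to hold tab-separated key:value fields as in the question. It decodes lazily and only then applies the str-based parsing:

```python
import os
import tempfile


def iter_decoded_lines(path, encoding="utf-8", errors="ignore"):
    """Lazily yield each line of a file opened in binary mode as str."""
    with open(path, "rb") as f:
        for raw in f:
            # errors="ignore" drops any bytes that are not valid in `encoding`
            yield raw.decode(encoding, errors=errors)


def parse_line(line):
    """Split tab-separated 'key:value' fields into a dict of str -> str."""
    return {k.strip(): v.strip()
            for k, _, v in (e.partition(":") for e in line.split("\t"))}


# Hypothetical sample data; the second line contains 0xff, which is invalid UTF-8.
sample = b"name:read1\tflag:0\nname:read\xff2\tflag:16\n"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(sample)
    sample_path = tmp.name

records = [(i, parse_line(line))
           for i, line in enumerate(iter_decoded_lines(sample_path))]
os.remove(sample_path)
print(records)
```

Note that errors='ignore' discards data silently; if the file is not really text, the decoded lines may not mean much, so this is only appropriate when losing undecodable bytes is acceptable.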
As an aside, you could instead replace all string literals with bytes literals when operating on bytes, i.e. partition(b':') and split(b'\t'), and do your work using bytes (pretty sure pandas works fine with them).
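The bytes-only variant could look roughly like the following sketch. The function name parse_line_bytes and the sample line are hypothetical; the point is only that every separator is a bytes literal, so no decoding step is needed:

```python
def parse_line_bytes(line):
    """Parse tab-separated b'key:value' fields; keys and values stay bytes."""
    return {k.strip(): v.strip()
            for k, _, v in (e.partition(b":") for e in line.split(b"\t"))}


raw_line = b"name:read1\tflag:0\n"  # hypothetical sample line
fields = parse_line_bytes(raw_line)
print(fields)
```

The resulting keys and values are bytes objects, so anything downstream that expects str (column names, comparisons against string literals) would still need a decode at that point.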
Source: https://stackoverflow.com/questions/39861431/how-to-decode-binary-file-with-for-index-line-in-enumeratefile