Question
I need to read 64 KB chunks in a loop and process them, but stop 16 bytes before the end of the file: the last 16 bytes are tag metadata.
The file might be very large, so I can't read it all into RAM.
All the solutions I have found are a bit clumsy and/or unpythonic.
    with open('myfile', 'rb') as f:
        while True:
            block = f.read(65536)
            if not block:
                break
            process_block(block)
If 16 <= len(block) < 65536, it's easy: this is the last block, so useful_data = block[:-16] and tag = block[-16:].
If len(block) == 65536, it could mean three things: that the full block is useful data. Or that this 64 KB block is in fact the last block, so useful_data = block[:-16] and tag = block[-16:]. Or that this 64 KB block is followed by another block of only a few bytes (let's say 3 bytes), so in this case useful_data = block[:-13] and tag = block[-13:] + last_block[:3].
How to deal with this problem in a nicer way than distinguishing all these cases?
Note: the solution should work for a file opened with open(...), but also for an io.BytesIO() object, or for a remote file opened over SFTP (with pysftp).
I was thinking about getting the file object's size with
    f.seek(0, 2)
    length = f.tell()
    f.seek(0)
Then, after each block = f.read(65536), we can tell how far we are from the end with length - f.tell(), but again the full solution does not look very elegant.
Answer 1:
You can simply read min(65536, L - f.tell() - 16) bytes in every iteration.
Something like this:
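    from pathlib import Path

    L = Path('myfile').stat().st_size  # total file size in bytes
    with open('myfile', 'rb') as f:
        while True:
            # Never read into the final 16 bytes, which hold the tag.
            to_read_length = min(65536, L - f.tell() - 16)
            block = f.read(to_read_length)
            process_block(block)
            if f.tell() == L - 16:
                break
        tag = f.read(16)  # what remains is the 16-byte tag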
I did not run this, but I hope you get the gist of it.
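Path(...).stat() needs a filesystem path, so it does not cover the io.BytesIO case from the question. As a rough sketch, the same idea can be written against any seekable file object by measuring the length with seek/tell. The function name read_payload_and_tag and the parameters block_size and tag_size are illustrative, not from the original answer, and the final read assumes the stream returns all 16 bytes in one call, which holds for regular files and io.BytesIO but may not for every remote stream.

```python
import io


def read_payload_and_tag(f, process_block, block_size=65536, tag_size=16):
    """Pass everything except the trailing tag to process_block; return the tag.

    Hypothetical helper: f must be seekable and opened in binary mode.
    """
    # Measure the stream with seek/tell so no filesystem path is needed.
    f.seek(0, io.SEEK_END)
    length = f.tell()
    f.seek(0)
    if length < tag_size:
        raise ValueError("stream is shorter than the tag")

    remaining = length - tag_size  # payload bytes still to be read
    while remaining > 0:
        block = f.read(min(block_size, remaining))
        if not block:
            raise EOFError("stream ended before the expected payload length")
        process_block(block)
        remaining -= len(block)
    return f.read(tag_size)


# Usage with an in-memory stream: 100 payload bytes followed by a 16-byte tag.
chunks = []
tag = read_payload_and_tag(io.BytesIO(b"a" * 100 + b"T" * 16),
                           chunks.append, block_size=40)
```

Tracking remaining instead of calling f.tell() in every iteration keeps the loop condition simple and also avoids an endless loop if a read unexpectedly comes back empty.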
Answer 2:
The following method relies only on the fact that f.read() returns an empty bytes object at end of stream (EOS). It could thus be adapted for sockets simply by replacing f.read() with s.recv().
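    import random

    def read_all_but_last16(f):
        rand = random.Random()  # just for testing
        buf = b''
        while True:
            bytes_read = f.read(rand.randint(1, 40))  # just for testing
            # bytes_read = f.read(65536)
            buf += bytes_read
            if not bytes_read:
                break
            process_block(buf[:-16])  # everything except the held-back 16 bytes
            buf = buf[-16:]           # keep the last 16 bytes for the next round
        verify(buf[-16:])  # at EOS, the held-back bytes are the tag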
It works by always leaving 16 bytes at the end of buf until EOS, then finally processing the last 16. Note that if there aren't at least 17 bytes in buf, then buf[:-16] returns the empty bytes object.
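The hold-back idea can also be sketched without the random read sizes used for testing. In this variant, the name stream_payload_keep_tag and the parameters process_block, block_size and tag_size are illustrative assumptions. It skips the call to process_block while the buffer holds no more than the tag length, and it raises an error if the stream turns out to be shorter than the tag.

```python
import io


def stream_payload_keep_tag(f, process_block, block_size=65536, tag_size=16):
    """Process a stream block by block, holding back its last tag_size bytes.

    Hypothetical helper: only relies on f.read() returning b"" at end of stream,
    so f does not need to be seekable.
    """
    buf = b""
    while True:
        chunk = f.read(block_size)
        if not chunk:  # empty bytes object means end of stream
            break
        buf += chunk
        if len(buf) > tag_size:
            process_block(buf[:-tag_size])  # safe: at least tag_size bytes follow
            buf = buf[-tag_size:]           # carry the possible tag forward
    if len(buf) < tag_size:
        raise ValueError("stream is shorter than the tag")
    return buf


# Usage with deliberately small reads, so blocks straddle the tag boundary.
chunks = []
tag = stream_payload_keep_tag(io.BytesIO(b"a" * 100 + b"T" * 16),
                              chunks.append, block_size=7)
```

Because nothing is measured in advance, the buffer never grows beyond block_size + tag_size bytes, and the blocks handed to process_block may vary in size from one call to the next.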
Source: https://stackoverflow.com/questions/64959048/read-blocks-from-a-file-object-until-x-bytes-from-the-end