Question
I have a large (21 GB) file which I want to read into memory and then pass to a subroutine which processes the data transparently to me. I am on Python 2.6.6 on CentOS 6.5, so upgrading the operating system or Python is not an option. Currently I am using:
f = open(image_filename, "rb")
image_file_contents = f.read()
f.close()
transparent_subroutine(image_file_contents)
which is slow (about 15 minutes). Before I start reading the file I already know how big it is, because I call os.stat(image_filename).st_size, so I could pre-allocate some memory if that made sense.
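For reference, a minimal sketch of what such pre-allocation could look like on Python 2.6, filling a bytearray of the known size chunk by chunk (the helper name read_preallocated and the 16 MiB chunk size are illustrative, not from the original post):

import os

def read_preallocated(fn, n_chunk=16 * 1024**2):
    # Pre-allocate a bytearray of the known file size, then fill it
    # in fixed-size chunks instead of building one giant string.
    size = os.stat(fn).st_size
    buf = bytearray(size)
    off = 0
    f = open(fn, "rb")
    try:
        while off < size:
            chunk = f.read(min(n_chunk, size - off))
            if not chunk:
                break
            buf[off:off + len(chunk)] = chunk
            off += len(chunk)
    finally:
        f.close()
    return buf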
Thank you
Answer 1:
Following Dietrich's suggestion, I measured that this mmap technique is 20% faster than one big read for a 1.7 GB input file:
import mmap
from zlib import adler32 as compute_cc

def checksum_mmap(fn, n_chunk=1024**2):
    # Map the file read-only and privately, then checksum it in 1 MiB chunks.
    crc = 0
    with open(fn, "rb") as f:
        mm = mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ, flags=mmap.MAP_PRIVATE)
        while True:
            buf = mm.read(n_chunk)
            if not buf:
                break
            crc = compute_cc(buf, crc)
        mm.close()
    return crc
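A hypothetical call site, using the function name given above:

crc = checksum_mmap(image_filename)

Since an mmap object also supports a file-like read interface, one could pass mm itself to the processing routine instead of the checksum loop, provided that routine only needs sequential reads; that avoids ever materializing the full 21 GB in a single Python string.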
Source: https://stackoverflow.com/questions/25754837/what-is-the-most-efficient-way-to-read-a-large-binary-file-python