What is the most efficient way to read a large binary file in Python?

Submitted by 你离开我真会死。 on 2019-12-13 00:38:53

Question


I have a large (21 GB) file which I want to read into memory and then pass to a subroutine which processes the data transparently to me. I am on Python 2.6.6 on CentOS 6.5, so upgrading the operating system or Python is not an option. Currently, I am using

f = open(image_filename, "rb")
image_file_contents = f.read()
f.close()
transparent_subroutine(image_file_contents)

which is slow (~15 minutes). Before I start reading the file, I know how big it is, because I call os.stat(image_filename).st_size, so I could pre-allocate some memory if that made sense.
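For reference, a minimal sketch of that pre-allocation idea, assuming the subroutine can accept a bytearray rather than a str (read_preallocated and n_chunk are names invented for illustration):

import os

def read_preallocated(path, n_chunk=1024 * 1024):
    # Pre-allocate a buffer of the known file size, then fill it
    # in fixed-size chunks using slice assignment.
    size = os.stat(path).st_size
    buf = bytearray(size)
    offset = 0
    with open(path, "rb") as f:
        while offset < size:
            chunk = f.read(min(n_chunk, size - offset))
            if not chunk:  # unexpected EOF (e.g. the file shrank)
                break
            buf[offset:offset + len(chunk)] = chunk
            offset += len(chunk)
    return buf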

Thank you


Answer 1:


Following Dietrich's suggestion, I measured that this mmap technique is 20% faster than one big read for a 1.7 GB input file:

import mmap
from zlib import adler32 as compute_crc

def checksum(fn, n_chunk=1024**2):
    crc = 0
    with open(fn, "rb") as f:
        # Map the whole file read-only; length 0 means "the entire file".
        mm = mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ, flags=mmap.MAP_PRIVATE)
        while True:
            buf = mm.read(n_chunk)
            if not buf:
                break
            crc = compute_crc(buf, crc)
        mm.close()
    return crc
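Since mmap objects in Python 2 expose a buffer-like, string-like interface, the mapping could in principle be handed to the subroutine directly instead of being copied into a string first; a sketch, assuming transparent_subroutine accepts any buffer-like object:

import mmap

f = open(image_filename, "rb")
mm = mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ, flags=mmap.MAP_PRIVATE)
try:
    transparent_subroutine(mm)  # pages are faulted in lazily as they are touched
finally:
    mm.close()
    f.close()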


Source: https://stackoverflow.com/questions/25754837/what-is-the-most-efficient-way-to-read-a-large-binary-file-python
