How can I read large text files in Python, line by line, without loading them into memory?

臣服心动 2020-11-22 03:32

I need to read a large file, line by line. Let's say the file is larger than 5 GB and I need to read each line, but obviously I do not want to use readlines() bec

15 Answers
  •  粉色の甜心
    2020-11-22 04:01

    How about this? Divide your file into chunks and process it one chunk at a time. When you read a file, the operating system reads ahead and caches the data that follows, so reading a single line at a time does not make efficient use of what is already cached.

    Instead, load a whole chunk into memory, then do your processing on it.

    import os

    def chunks(fh, size=1024):
        """Yield (start, length) pairs for chunks that end on a line boundary."""
        while True:
            start = fh.tell()
            print(start)  # file object's current position from the start
            fh.seek(size, 1)  # offset from the current position --> 1
            data = fh.readline()  # extend the chunk to the end of the current line
            yield start, fh.tell() - start  # only offsets, so no list is held in memory
            if not data:  # nothing was read after the skip: end of file
                break

    fname = 'large_file.txt'  # hypothetical path; replace with your own file
    if os.path.isfile(fname):
        try:
            with open(fname, 'rb') as fh:
                for start, length in chunks(fh):
                    fh.seek(start)
                    data = fh.read(length)  # load one whole chunk into memory
                    print(data)
        except IOError as e:  # e.g. permission denied
            print("I/O error({0}): {1}".format(e.errno, e.strerror))
        except Exception as e1:  # handle other unexpected errors
            print("Unexpected error: {0}".format(e1))
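
    Once a chunk is in memory, the processing step can still work line by line. Here is a minimal sketch of that idea, not a definitive implementation: the name iter_lines_by_chunk and the 1 MiB default chunk size are assumptions made for illustration, and lines are yielded as bytes with their line endings stripped. Each read is extended to the next newline so that no line is split across two chunks.

```python
import os
import tempfile


def iter_lines_by_chunk(path, chunk_size=1024 * 1024):
    """Yield lines from `path`, reading about `chunk_size` bytes at a time.

    Hypothetical helper: only one chunk is held in memory at once, and
    each chunk is extended to the end of its last line before splitting.
    """
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                break  # end of file
            # Finish the line the chunk stopped in (b"" if already at EOF).
            chunk += fh.readline()
            for line in chunk.splitlines():
                yield line


if __name__ == "__main__":
    # Small demonstration with a temporary file and a tiny chunk size.
    with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
        tmp.write(b"alpha\nbeta\ngamma\n")
    try:
        for line in iter_lines_by_chunk(tmp.name, chunk_size=4):
            print(line)
    finally:
        os.remove(tmp.name)
```

    Memory use is bounded by the chunk size plus the length of the longest line, so a file with one extremely long line would still be read in a single piece.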
    
