Split large files using Python

有刺的猬 2020-12-28 21:25

I'm having trouble splitting large files (say, around 10GB). The basic idea is simply to read the lines and write every 40000 lines, say, into one file. But there are tw

5 answers
  •  小蘑菇 (OP)
    2020-12-28 21:49

    If there's nothing special about having a specific number of lines in each file, the readlines() method also accepts a size 'hint' parameter that behaves like this:

    If given an optional parameter sizehint, it reads that many bytes from the file and enough more to complete a line, and returns the lines from that. This is often used to allow efficient reading of a large file by lines, but without having to load the entire file in memory. Only complete lines will be returned.

    ...so you could write that code something like this:

    # Assume that an average line is about 80 chars long, and that we want about
    # 40K lines in each file.

    SIZE_HINT = 80 * 40000

    fileNumber = 0
    with open("inputFile.txt", "rt") as f:
        while True:
            buf = f.readlines(SIZE_HINT)
            if not buf:
                # We've read the entire file, so we're done.
                break
            # buf is a list of lines, so use writelines() rather than write().
            with open("outFile%d.txt" % fileNumber, "wt") as outFile:
                outFile.writelines(buf)
            fileNumber += 1
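
If each output file does need an exact number of lines (the question mentions 40000), one possible variation is to pull fixed-size groups of lines from the file iterator with itertools.islice. This is only a sketch: the function name split_by_line_count, its parameters, and the output file name pattern are made up for illustration and are not part of the answer above.

```python
from itertools import islice


def split_by_line_count(input_path, lines_per_file=40000, out_pattern="outFile%d.txt"):
    """Split input_path into files of at most lines_per_file lines each.

    Hypothetical helper for illustration. Returns the list of output
    file names that were written.
    """
    written = []
    with open(input_path, "rt") as src:
        while True:
            # islice pulls at most lines_per_file lines from the file iterator,
            # so only one chunk of lines is held in memory at a time.
            chunk = list(islice(src, lines_per_file))
            if not chunk:
                # The input is exhausted, so we're done.
                break
            out_name = out_pattern % len(written)
            with open(out_name, "wt") as dst:
                dst.writelines(chunk)
            written.append(out_name)
    return written
```

Calling split_by_line_count("inputFile.txt") would write outFile0.txt, outFile1.txt, and so on, with every file except possibly the last holding exactly 40000 lines. Unlike the size hint approach, memory use here depends on how long the lines are.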
    
