Python file open/close every time vs keeping it open until the process is finished

后端 未结 4 1640
暗喜
暗喜 2020-12-24 11:29

I have about 50 GB of text file and I am checking the first few characters each line and writing those to other files specified for that beginning text.

For example.

相关标签:
4条回答
  • 2020-12-24 11:50

    You should definitely try to open/close the file as little as possible

    Because even comparing with file read/write, file open/close is far more expensive

    Consider two code blocks:

    f=open('test1.txt', 'w')
    for i in range(1000):
        f.write('\n')
    f.close()
    

    and

    for i in range(1000):
        f=open('test2.txt', 'a')
        f.write('\n')
        f.close()
    

    The first one takes 0.025s while the second one takes 0.309s

    0 讨论(0)
  • 2020-12-24 12:01

    IO operations consume too much time. Open and close the file, also.

    It's much faster if you open both files(the input and the output), use a memory buffer with, let's say, 10MB size for your text processing and then write this to the output file. For example:

    file = {} # just initializing dicts
    filename = {}
    with open(file) as f:
        file['dog'] = None
        buffer = ''
        ...
        #maybe there is a loop here
        if writeflag:
            if file['dog'] == None:
                file['dog'] = open(filename['dog'], 'a')
            buffer += remline + '\n'
        if len(buffer) > 1024*1000*10: # 10MB of text
           files['dog'].write(buffer)
           buffer = ''
    
    for v in files.values():
        v.close()
    
    0 讨论(0)
  • 2020-12-24 12:02

    Use the with statement, it automatically closes the files for you, do all the operations inside the with block, so it'll keep the files open for you and will close the files once you're out of the with block.

    with open(inputfile)as f1, open('dog.txt','a') as f2,open('cat.txt') as f3:
       #do something here
    

    EDIT: If you know all the possible filenames to be used before the compilation of your code then using with is a better option and if you don't then you should use your approach but instead of closing the file you can flush the data to the file using writefile1.flush()

    0 讨论(0)
  • 2020-12-24 12:12

    Keep it open the whole time! Otherwise you tell the system that you are done writing all the time and it might decide to flush it onto the disk instead of buffering it. And for obvious reasons n disk writes are much more expensive than 1 disk write.

    If you want to append to the file and not overwrite it then yes, a is the correct mode.

    0 讨论(0)
提交回复
热议问题