How can I open multiple files (number of files unknown beforehand) using “with open” statement?

Posted by 本秂侑毒 on 2019-12-29 01:28:10

Question


I specifically need to use the with open statement to open the files, because I need to open a few hundred files at once and merge them using a K-way merge. I understand that ideally I should have kept K low, but I did not foresee this problem.

Starting from scratch is not an option now, as I have a deadline to meet. So at this point I need very fast I/O that does not hold a whole file, or a large portion of one, in memory (there are hundreds of files, each about 10 MB). I just need to read one line at a time for the K-way merge. Reducing memory usage is my primary focus right now.

I learned that with open is the most efficient technique, but I cannot understand how to open all the files together in a single with open statement. Excuse my beginner ignorance!

Update: This problem was solved. It turns out the issue was not about how I was opening the files at all. I found out that the excessive memory usage was due to inefficient garbage collection. I did not use with open at all. I used the regular f=open() and f.close(). Garbage collection saved the day.


Answer 1:


It's fairly easy to write your own context manager to handle this by using the contextlib.contextmanager function decorator to define "a factory function for with statement context managers", as the documentation states. For example:

from contextlib import contextmanager

@contextmanager
def multi_file_manager(files, mode='rt'):
    """ Open multiple files and make sure they all get closed. """
    files = [open(file, mode) for file in files]
    try:
        yield files
    finally:
        # Close every file even if the body of the with block raises.
        for file in files:
            file.close()

filenames = 'file1', 'file2', 'file3'

with multi_file_manager(filenames) as files:
    a = files[0].readline()
    b = files[2].readline()
    ...



Answer 2:


with open(...) as f: 
    # do stuff 

translates roughly to

f = open(...)
try:
    # do stuff
finally:
    f.close()

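In your case, I wouldn't use the with open syntax. If you have a list of filenames, then do something like this:

import os

# os.listdir returns bare names, so join them with the directory.
filenames = [os.path.join(file_directory, name) for name in os.listdir(file_directory)]
# Build a list: in Python 3, map() is lazy and would not open anything yet.
open_files = [open(name) for name in filenames]
# do stuff
for f in open_files:
    f.close()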

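If you really want to use the with open syntax, you can make your own context manager that accepts a list of filenames:

class MultipleFileManager(object):
    def __init__(self, files):
        self.files = files

    def __enter__(self):
        # Build a list: in Python 3, map() would return a one-shot iterator.
        self.open_files = [open(name) for name in self.files]
        return self.open_files

    def __exit__(self, exc_type, exc_value, traceback):
        # The with statement always passes these three arguments.
        for f in self.open_files:
            f.close()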

And then use it like this:

filenames = [os.path.join(file_directory, name) for name in os.listdir(file_directory)]
with MultipleFileManager(filenames) as files:
    for f in files:
        pass  # do stuff

The only advantage I see to using a context manager in this case is that you can't forget to close the files. But there is nothing wrong with closing the files manually. And remember, the OS will reclaim its resources when your program exits anyway.
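
If you do close the files manually, one way to keep that safe is to put the close calls in a finally clause, so they also run when something in between raises. The following is only a minimal sketch: the helper name read_first_lines and the temporary demo files are made up for illustration and are not part of the question.

```python
import os
import tempfile

def read_first_lines(filenames):
    """Hypothetical helper: return the first line of each file, then close them all."""
    open_files = []
    try:
        for name in filenames:
            open_files.append(open(name))
        return [f.readline() for f in open_files]
    finally:
        # Runs on success and on error, so every file opened so far gets closed.
        for f in open_files:
            f.close()

# Small demo with throwaway files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_paths = []
    for i in range(3):
        path = os.path.join(tmp_dir, "file%d.txt" % i)
        with open(path, "w") as out:
            out.write("line from file %d\n" % i)
        demo_paths.append(path)
    first_lines = read_first_lines(demo_paths)
```

Opening the files inside the try block means that a failure on, say, the tenth open still closes the nine files that were opened before it.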




Answer 3:


While not a solution for 2.7, I should note there is one good, correct solution for 3.3+, contextlib.ExitStack, which can be used to do this correctly (surprisingly difficult to get right when you roll your own) and nicely:

from contextlib import ExitStack

with open('source_dataset.txt') as src_file, ExitStack() as stack:
    files = [stack.enter_context(open(fname, 'w')) for fname in fname_list]
    # ... do stuff with src_file and the values in files ...
# ... src_file and all elements in stack cleaned up on block exit ...

Importantly, if any of the opens fails, all of the opens that succeeded prior to that point will be cleaned up deterministically; most naive solutions end up failing to clean up in that case, relying on the garbage collector at best, and in cases like lock acquisition where there is no object to collect, failing to ever release the lock.
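
To make the failure case concrete, here is a small sketch in which the second open fails on purpose. The file names and the opened list are assumptions made up for the demonstration; the point is only that the file that was opened successfully is closed by the time the exception leaves the with block.

```python
import os
import tempfile
from contextlib import ExitStack

opened = []  # keeps references so the files can be inspected afterwards

with tempfile.TemporaryDirectory() as tmp_dir:
    good_path = os.path.join(tmp_dir, "exists.txt")
    with open(good_path, "w") as out:
        out.write("hello\n")
    missing_path = os.path.join(tmp_dir, "missing.txt")  # deliberately never created

    try:
        with ExitStack() as stack:
            for path in (good_path, missing_path):
                # The second open() raises before enter_context is reached.
                opened.append(stack.enter_context(open(path)))
    except FileNotFoundError:
        failed = True
    else:
        failed = False

# The stack unwound on the way out, closing the one file that did open.
first_file_closed = opened[0].closed
```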

Posted here since this question was marked as the "original" for a duplicate that didn't specify Python version.
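
Tying this back to the K-way merge in the question, ExitStack combines naturally with heapq.merge from the standard library, which lazily merges already-sorted iterables. This is a sketch under assumptions, not the asker's actual code: it assumes every input file is already sorted, that every line ends with a newline, and the helper name merge_sorted_files and the sample data are invented for illustration.

```python
import heapq
import os
import tempfile
from contextlib import ExitStack

def merge_sorted_files(input_paths, output_path):
    """Hypothetical helper: K-way merge of already-sorted text files, line by line."""
    with ExitStack() as stack:
        inputs = [stack.enter_context(open(path)) for path in input_paths]
        output = stack.enter_context(open(output_path, "w"))
        # heapq.merge pulls lines lazily from each file object, so no file
        # is read into memory as a whole.
        output.writelines(heapq.merge(*inputs))

# Small demo with throwaway sorted files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    chunks = [["apple\n", "mango\n"], ["banana\n", "zebra\n"], ["cherry\n"]]
    paths = []
    for i, lines in enumerate(chunks):
        path = os.path.join(tmp_dir, "sorted%d.txt" % i)
        with open(path, "w") as out:
            out.writelines(lines)
        paths.append(path)
    merged_path = os.path.join(tmp_dir, "merged.txt")
    merge_sorted_files(paths, merged_path)
    with open(merged_path) as merged:
        merged_lines = merged.read().splitlines()
```

With several hundred inputs open at once, the per-process limit on open files may also become relevant, depending on the operating system and its configuration.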



Source: https://stackoverflow.com/questions/21680473/how-can-i-open-multiple-files-number-of-files-unknown-beforehand-using-with-o
