Threadsafe and fault-tolerant file writes

傲寒 2020-12-06 06:57

I have a long-running process which writes a lot of stuff to a file. The result should be everything or nothing, so I'm writing to a temporary file and rename it to the rea

4 answers
  • 2020-12-06 07:11

    You could use the lockfile module to lock the file while you are writing to it. Any subsequent attempt to lock it will block until the lock from the previous process/thread has been released.

    from lockfile import FileLock

    with FileLock(filename):
        # the lock is held for the whole block and released on exit
        with open(filename, 'a') as f:
            f.write('stuff')
    

    This way, you circumvent your concurrency issues and do not have to clean up any leftover file if an exception occurs.
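
    If installing the lockfile package is not an option, the same idea can be sketched with the standard library on POSIX systems. This is only an illustration: the helper names locked() and append_line() and the '.lock' suffix are made up for this example, fcntl.flock() is not available on Windows, and the lock is advisory, so it only serializes writers that take the same lock. It does not by itself give all-or-nothing writes.

```python
import fcntl
import os
from contextlib import contextmanager

@contextmanager
def locked(path):
    """Hold an exclusive advisory lock on path + '.lock' (hypothetical naming)."""
    fd = os.open(path + '.lock', os.O_CREAT | os.O_RDWR, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until other holders release it
        yield
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)

def append_line(path, line):
    """Append one line to path while holding the lock."""
    with locked(path):
        with open(path, 'a') as f:
            f.write(line + '\n')
```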

  • 2020-12-06 07:13

    To write all or nothing to a file reliably:

    import os
    from contextlib import contextmanager
    from tempfile   import NamedTemporaryFile
    
    if not hasattr(os, 'replace'):
        os.replace = os.rename #NOTE: it won't work for existing files on Windows
    
    @contextmanager
    def FaultTolerantFile(name, mode='w'):
        dirpath, filename = os.path.split(name)
        # use the same dir for os.replace() to work
        # delete=False: closing must not remove the tmp file before the rename;
        # it is removed explicitly below if writing fails
        f = NamedTemporaryFile(mode, dir=dirpath, prefix=filename,
                               suffix='.tmp', delete=False)
        try:
            yield f
            f.flush()             # libc -> OS
            os.fsync(f.fileno())  # OS -> disc (note: on OSX it is not enough)
        except BaseException:
            f.close()
            os.unlink(f.name)     # writing failed: discard the tmp file
            raise
        f.close()
        os.replace(f.name, name)  # the tmp file is kept if `replace()` fails
    

    See also Is rename() without fsync() safe? (mentioned by @Mihai Stan)
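
    The "not enough on OSX" note and the durability of the rename itself can be sketched as below. This is a hedged illustration, not part of the recipe above: sync_file() and replace_durably() are hypothetical helper names, fcntl.F_FULLFSYNC only exists on macOS, and syncing the containing directory after the rename is a POSIX-only step (opening a directory with os.open() fails on Windows).

```python
import os

try:
    import fcntl  # POSIX only
except ImportError:
    fcntl = None

def sync_file(f):
    """Flush f's buffers and ask the OS to push the data to the disk."""
    f.flush()
    if fcntl is not None and hasattr(fcntl, 'F_FULLFSYNC'):
        # macOS: also ask the drive to flush its own cache
        fcntl.fcntl(f.fileno(), fcntl.F_FULLFSYNC)
    else:
        os.fsync(f.fileno())

def replace_durably(tmp_path, final_path):
    """Rename tmp_path over final_path, then sync the directory entry."""
    os.replace(tmp_path, final_path)
    dir_fd = os.open(os.path.dirname(os.path.abspath(final_path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```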

    Usage

    with FaultTolerantFile('very_important_file') as file:
        file.write('either all ')
        file.write('or nothing is written')
    

    To implement missing os.replace() you could call MoveFileExW(src, dst, MOVEFILE_REPLACE_EXISTING) (via win32file or ctypes modules) on Windows.
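
    A rough ctypes sketch of that fallback follows. The function name replace() is hypothetical, the Windows branch is untested here and assumes the documented MoveFileExW signature, and on other platforms it simply falls back to os.rename(), which already overwrites the destination on POSIX.

```python
import ctypes
import os

MOVEFILE_REPLACE_EXISTING = 0x1  # flag value from the Windows API headers

def replace(src, dst):
    """Move src over dst, overwriting dst if it already exists."""
    if os.name == 'nt':
        kernel32 = ctypes.WinDLL('kernel32', use_last_error=True)
        kernel32.MoveFileExW.argtypes = [ctypes.c_wchar_p, ctypes.c_wchar_p,
                                         ctypes.c_uint32]
        kernel32.MoveFileExW.restype = ctypes.c_int
        if not kernel32.MoveFileExW(src, dst, MOVEFILE_REPLACE_EXISTING):
            raise ctypes.WinError(ctypes.get_last_error())
    else:
        os.rename(src, dst)  # POSIX rename() replaces dst atomically
```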

    In case of multiple threads you could call queue.put(data) from different threads and write to file in a dedicated thread:

    for data in iter(queue.get, None):
        file.write(data)
    

    queue.put(None) breaks the loop.
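
    Put together, the dedicated-writer pattern might look like the sketch below. The names writer() and demo() and the 'line N' payloads are invented for the example; only the iter(q.get, None) loop and the None sentinel come from the answer.

```python
import queue
import threading

def writer(q, path):
    """Drain q into the file at path until the None sentinel arrives."""
    with open(path, 'w') as f:
        for data in iter(q.get, None):
            f.write(data)

def demo(path, n_threads=4):
    """Start one writer thread and n_threads producers that each put one line."""
    q = queue.Queue()
    writer_thread = threading.Thread(target=writer, args=(q, path))
    writer_thread.start()
    producers = [threading.Thread(target=q.put, args=('line %d\n' % i,))
                 for i in range(n_threads)]
    for p in producers:
        p.start()
    for p in producers:
        p.join()
    q.put(None)  # sentinel: breaks the writer loop
    writer_thread.join()
```

    Only the writer thread ever touches the file, so no lock is needed around write(); the order of the lines depends on thread scheduling.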

    As an alternative you could use locks (threading, multiprocessing, filelock) to synchronize access:

    def write(self, data):
        with self.lock:
            self.file.write(data)
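
    For the threading case, the write() method above could live in a small wrapper such as this sketch. The class name LockedWriter is hypothetical; for multiple processes a multiprocessing or file-based lock would be needed instead of threading.Lock.

```python
import threading

class LockedWriter:
    """Wrap a file object so that concurrent write() calls do not interleave."""

    def __init__(self, file):
        self.file = file
        self.lock = threading.Lock()

    def write(self, data):
        # only one thread at a time may be inside the underlying write()
        with self.lock:
            self.file.write(data)
```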
    
  • 2020-12-06 07:22

    The with construct is useful for cleaning up on exit, but not for the commit/rollback system you want. A try/except/else block can be used for that.

    You also should use a standard way for creating the temporary file name, for example with the tempfile module.

    And remember to flush and fsync the file before renaming it.

    Below is the full modified code:

    import os, tempfile
    
    def begin_file(filepath):
        (filedir, filename) = os.path.split(filepath)
        # NamedTemporaryFile creates the file safely; mktemp() only picks a
        # name, which another process could grab before we open it
        return tempfile.NamedTemporaryFile('wb', prefix=filename + '_',
                                           dir=filedir or '.', delete=False)
    
    def commit_file(f, filepath):
        f.flush()             # Python buffers -> OS
        os.fsync(f.fileno())  # OS -> disk
        f.close()
        # os.replace() overwrites an existing file in one step, so there is
        # no window in which filepath does not exist
        os.replace(f.name, filepath)
    
    def rollback_file(f):
        tmppath = f.name
        f.close()
        os.unlink(tmppath)
    
    
    filepath = 'whatever'
    fp = begin_file(filepath)
    try:
        fp.write(b'stuff')
    except:
        rollback_file(fp)
        raise
    else:
        commit_file(fp, filepath)
    
  • 2020-12-06 07:30

    You can use Python's tempfile module to give you a temporary file name. It creates temporary files in a thread-safe manner, unlike a name made up from time.time(), which may come out identical in several threads that run at the same time.

    As suggested in a comment to your question, this can be coupled with the use of a context manager. You can get some ideas of how to implement what you want to do by looking at Python tempfile.py sources.

    The following code snippet may do what you want. It uses some of the internals of the objects returned from tempfile.

    • Creation of temporary files is thread safe.
    • Renaming of files upon successful completion is atomic, at least on Linux. There isn't a separate check between os.path.exists() and os.rename(), which could introduce a race condition. For an atomic rename on Linux the source and destination must be on the same file system, which is why this code places the temporary file in the same directory as the destination file.
    • The RenamedTemporaryFile class should behave like a NamedTemporaryFile for most purposes, except that when it is closed by the context manager without an error, the file is renamed to the final path.
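
    If the temporary file has to live in a different directory, the same-file-system requirement can be checked up front. The helper below is a hypothetical sketch, not part of the class that follows; it compares the st_dev field reported by os.stat(), which identifies the device a path resides on.

```python
import os

def same_filesystem(path_a, path_b):
    """Return True if both existing paths are on the same device."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev
```

    For example, same_filesystem(tmp_dir, os.path.dirname(os.path.abspath(final_path))) could be checked before choosing tmp_dir.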

    Sample:

    import tempfile
    import os
    
    class RenamedTemporaryFile(object):
        """
        A temporary file object which will be renamed to the specified
        path on exit.
        """
        def __init__(self, final_path, **kwargs):
            tmpfile_dir = kwargs.pop('dir', None)
    
            # Put temporary file in the same directory as the location for the
            # final file so that an atomic move into place can occur.
    
            if tmpfile_dir is None:
                tmpfile_dir = os.path.dirname(final_path)
    
            # Closing must not delete the file, otherwise there is nothing
            # left to rename; it is removed explicitly in __exit__ on error.
            kwargs['delete'] = False
            self.tmpfile = tempfile.NamedTemporaryFile(dir=tmpfile_dir, **kwargs)
            self.final_path = final_path
    
        def __getattr__(self, attr):
            """
            Delegate attribute access to the underlying temporary file object.
            """
            return getattr(self.tmpfile, attr)
    
        def __enter__(self):
            self.tmpfile.__enter__()
            return self
    
        def __exit__(self, exc_type, exc_val, exc_tb):
            # Close the temporary file first, then commit or roll back.
            result = self.tmpfile.__exit__(exc_type, exc_val, exc_tb)
            if exc_type is None:
                os.rename(self.tmpfile.name, self.final_path)
            else:
                os.unlink(self.tmpfile.name)
    
            return result
    

    You can then use it like this:

    # mode='w' opens the file in text mode; the default is binary ('w+b')
    with RenamedTemporaryFile('whatever', mode='w') as f:
        f.write('stuff')
    

    During writing, the contents go to a temporary file; on exit, the file is renamed to its final path. This code will probably need some tweaks, but the general idea should help you get started.
