A safe, atomic file-copy operation

忘了有多久 2021-02-05 12:24

I need to copy a file from one location to another, and I need to throw an exception (or at least somehow recognise) if the file already exists at the destination (no overwritin

2 answers
  • 2021-02-05 12:49

    There is no way to do this; file copy operations are never atomic, and there is no way to make them atomic.

    But you can write the file under a random, temporary name and then rename it. Rename operations have to be atomic. If the file already exists, the rename will fail and you'll get an error.

    [EDIT2] rename() is only atomic if you do it in the same file system. The safe way is to create the new file in the same folder as the destination.

    [EDIT] There is a lot of discussion whether rename is always atomic or not and about the overwrite behavior. So I dug up some resources.

    On Linux, if the destination exists and both source and destination are files, then the destination is silently overwritten (man page). So I was wrong there.

    But rename(2) still guarantees that either the original file or the new file remains valid if something goes wrong, so the operation is atomic in the sense that it can't corrupt data. It is not atomic in the sense of stopping two processes from doing the same rename at the same time, and you can't predict the result when they do. One will win, but you can't tell which.
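As a rough illustration of the temp-file-then-rename approach described above, here is a minimal Python sketch. The function name `copy_via_temp` is made up for this example. It assumes the destination directory is writable, and it does not by itself detect an existing destination on POSIX systems, where `os.rename` silently replaces an existing file (on Windows, `os.rename` raises `FileExistsError` instead).

```python
import os
import shutil
import tempfile


def copy_via_temp(src, dst):
    """Copy src to a temporary file next to dst, then rename it into place.

    Hypothetical helper for illustration only.
    """
    dst_dir = os.path.dirname(os.path.abspath(dst))
    # Create the temporary file in the destination directory so that the
    # rename stays within one file system.
    fd, tmp_path = tempfile.mkstemp(dir=dst_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp, open(src, "rb") as source:
            shutil.copyfileobj(source, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the data to disk before renaming
        # POSIX: silently replaces an existing dst.
        # Windows: raises FileExistsError if dst exists.
        os.rename(tmp_path, dst)
    except BaseException:
        # Something failed before the rename completed: remove the temp file.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

Readers of `dst` never see a half-written file with this pattern, because the name only appears once the contents are complete.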

    On Windows, if another process is currently writing the file, you get an error if you try to open it for writing, so one advantage for Windows, here.

    If your computer crashes while the operation is written to disk, the implementation of the file system will decide how much data gets corrupted. There is nothing an application could do about this. So stop whining already :-)

    There is also no other approach that works better or even just as well as this one.

    You could use file locking instead. But that would just make everything more complex and yield no additional advantages (besides being more complicated which some people do see as a huge advantage for some reason). And you'd add a lot of nice corner cases when your file is on a network drive.

    You could use open(2) with the flags O_CREAT | O_EXCL, which would make the function fail if the file already exists. But that wouldn't prevent a second process from deleting the file and writing its own copy.
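For reference, a minimal Python sketch of the exclusive-create variant. The name `copy_exclusive` is hypothetical. Python's `"x"` open mode requests exclusive creation, which corresponds to `O_CREAT | O_EXCL`. Note that the destination is visible to other processes while it is still being written, and nothing here stops another process from deleting it afterwards.

```python
import shutil


def copy_exclusive(src, dst):
    """Copy src to dst, failing with FileExistsError if dst already exists.

    Hypothetical helper for illustration only.
    """
    # Open the source first so a missing source does not leave an empty dst.
    with open(src, "rb") as source, open(dst, "xb") as out:
        # "x" = exclusive creation: the open fails if dst is already there.
        shutil.copyfileobj(source, out)
```

A second call with the same destination raises `FileExistsError`, which is the "recognise that it already exists" part of the question.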

    Or you could create a lock directory since creating directories has to be atomic as well. But that would not buy you much, either. You'd have to write the locking code yourself and make absolutely, 100% sure that you really, really always delete the lock directory in case of disaster - which you can't.
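A minimal sketch of the lock-directory idea in Python, mostly to show where it falls short. The name `lock_dir` is hypothetical. `os.mkdir` fails with `FileExistsError` if the directory already exists, so only one caller gets the lock. The `finally` clause covers ordinary exceptions, but not a killed process or a power failure, which is exactly the cleanup problem described above.

```python
import contextlib
import os


@contextlib.contextmanager
def lock_dir(path):
    """Hold a lock by creating a directory; hypothetical helper."""
    # Raises FileExistsError if someone else already holds the lock.
    os.mkdir(path)
    try:
        yield
    finally:
        # Runs on normal exit and on exceptions, but not if the process
        # is killed: a stale lock directory is then left behind.
        os.rmdir(path)
```

Usage would look like `with lock_dir(dst + ".lock"): ...`, with the copy done inside the block.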

  • 2021-02-05 12:51

    There is in fact a way to do this, atomically and safely, provided all actors do it the same way. It's an adaptation of the lock-free whack-a-mole algorithm, and not entirely trivial, so feel free to go with "no" as the general answer ;)

    What to do

    1. Check whether the file already exists. Stop if it does.
    2. Generate a unique ID
    3. Copy the source file to the target folder with a temporary name, say, <target>.<UUID>.tmp.
    4. Rename the copy to <target>-<UUID>.mole.tmp.
    5. Look for any other files matching the pattern <target>-*.mole.tmp.
      • If their UUID compares greater than yours, attempt to delete their file. (Don't worry if it's gone.)
      • If their UUID compares less than yours, attempt to delete your own. (Again, don't worry if it's gone.) From now on, treat their UUID as if it were your own.
    6. Check again to see if the destination file already exists. If so, attempt to delete your temporary file. (Don't worry if it's gone. Remember your UUID may have changed in step 5.)
    7. If you didn't already attempt to delete it in step 6, attempt to rename your temporary file to its final name, <target>. (Don't worry if it's gone, just jump back to step 5.)

    You're done!

    How it works

    Imagine each candidate source file is a mole coming out of its hole. Half-way out, it pauses and whacks any competing moles back into the ground, before checking no other mole has fully emerged. If you run this through in your head, you should see that only one mole will ever make it all the way out. To prevent this system from livelocking, we add a total ordering on which mole can whack which. Bam! A lock-free algorithm.

    Step 4 may look unnecessary: why not just use that name in the first place? However, another process may "adopt" your mole file in step 5, and make it the winner in step 7, so it's very important that you're not still writing out the contents! Renames on the same file system are atomic, so step 4 is safe.
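The steps under "What to do" can be sketched in Python roughly as follows. This is one reading of the description, not a vetted implementation. All function names are made up, and `max_rounds` is an added safety bound that is not part of the original steps. The return value is `True` when this call performed the final rename and `False` when the destination already existed. On POSIX the final `os.rename` would still replace a destination that appears between steps 6 and 7, and the sketch does not guard against that.

```python
import os
import shutil
import uuid


def _mole_path(target, uid):
    return f"{target}-{uid}.mole.tmp"


def _remove_quietly(path):
    """Delete path; a file that is already gone is not an error."""
    try:
        os.remove(path)
    except FileNotFoundError:
        pass


def _mole_ids(target):
    """Return the IDs of all <target>-*.mole.tmp files currently present."""
    directory = os.path.dirname(target) or "."
    prefix, suffix = os.path.basename(target) + "-", ".mole.tmp"
    return [name[len(prefix):-len(suffix)]
            for name in os.listdir(directory)
            if name.startswith(prefix) and name.endswith(suffix)]


def mole_copy(src, target, max_rounds=100):
    """Hypothetical sketch of steps 1-7; max_rounds is an added assumption."""
    if os.path.exists(target):                      # step 1
        return False
    uid = uuid.uuid4().hex                          # step 2
    tmp = f"{target}.{uid}.tmp"
    shutil.copyfile(src, tmp)                       # step 3
    os.rename(tmp, _mole_path(target, uid))         # step 4
    for _ in range(max_rounds):
        for other in _mole_ids(target):             # step 5
            if other > uid:
                _remove_quietly(_mole_path(target, other))
            elif other < uid:
                _remove_quietly(_mole_path(target, uid))
                uid = other                         # adopt their mole
        if os.path.exists(target):                  # step 6
            _remove_quietly(_mole_path(target, uid))
            return False
        try:
            os.rename(_mole_path(target, uid), target)  # step 7
            return True
        except FileNotFoundError:
            continue                                # mole gone: back to step 5
        except FileExistsError:
            # Windows refuses to rename onto an existing destination.
            _remove_quietly(_mole_path(target, uid))
            return False
    raise RuntimeError("gave up after max_rounds attempts")
```

Note that after adopting a smaller ID in step 5, the file that ends up at the destination is the other actor's copy, which is why the contents must be complete before step 4.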
