Are POSIX' read() and write() system calls atomic?

前端 未结 2 939
南方客
南方客 2020-12-06 14:32

I am trying to implement a database index based on the data structure (Blink tree) and algorithms suggested by Lehman and Yao in this paper. In page 2, the author

相关标签:
2条回答
  • 2020-12-06 14:55

    Answers:

    1. Concurrent reads to writes may see torn writes depending on OS, filing system, and what flags you opened the file with. A quick summary by flags, OS and filing system is below.

    2. You can lock byte ranges in a file before accessing them using fcntl() on POSIX or LockFile() on Windows.


    No O_DIRECT/FILE_FLAG_NO_BUFFERING:

    Microsoft Windows 10 with NTFS: update atomicity = 1 byte

    Linux 4.2.6 with ext4: update atomicity = 1 byte

    FreeBSD 10.2 with ZFS: update atomicity = at least 1Mb, probably infinite (*)

    O_DIRECT/FILE_FLAG_NO_BUFFERING:

    Microsoft Windows 10 with NTFS: update atomicity = up to 4096 bytes only if page aligned, otherwise 512 bytes if FILE_FLAG_WRITE_THROUGH off, else 64 bytes. Note that this atomicity is probably a feature of PCIe DMA rather than designed in (*).

    Linux 4.2.6 with ext4: update atomicity = at least 1Mb, probably infinite (*). Note that earlier Linuxes with ext4 definitely did not exceed 4096 bytes, XFS certainly used to have custom locking but it looks like recent Linux has finally fixed this.

    FreeBSD 10.2 with ZFS: update atomicity = at least 1Mb, probably infinite (*)


    You can see the raw empirical test results at https://github.com/BoostGSoC13/boost.afio/blob/master/fs_probe/fs_probe_results.yaml. The results were generated by a program written using asynchronous file i/o through on all platforms. Note we test for torn offsets only on 512 byte multiples, so I cannot say if a partial update of a 512 byte sector would tear during the read-modify-write cycle.

    0 讨论(0)
  • 2020-12-06 14:58

    I don't believe the text you cites implies anything of the sort. It doesn't even mention read() or write() or POSIX. In fact, read() and write() cannot be relied on to be atomic. The only thing POSIX says is that write() must be atomic if the size of the write is less than PIPE_BUF bytes, and even that only applies to pipes.

    I didn't read the context around the part of the paper you cited, but it sounds like the passage you cited is stating constraints which must be placed on an implementation in order for the algorithm to work correctly. In other words, it states that an implementation of this algorithm requires locking.

    How you do that locking is up to you (the implementor). If we are dealing with a regular file and multiple independent processes, you might try fcntl(F_SETLKW)-style locking. If your data structure is in memory and you are dealing with multiple threads in the same process, something else might be appropriate.

    0 讨论(0)
提交回复
热议问题