unbuffered I/O in Linux

忘掉有多难 2021-02-14 09:45

I'm writing lots and lots of data that will not be read again for weeks - as my program runs the amount of free memory on the machine (displayed with 'free' or 'top') drops

3 Answers
  • 2021-02-14 10:18

    The closest equivalent to the Windows flags you mention I can think of is to open your file with the open(2) flags O_DIRECT | O_SYNC:

       O_DIRECT (Since Linux 2.4.10)
              Try to minimize cache effects of the I/O to and from this file.  In
              general this will degrade performance, but it is useful in special
              situations, such as when applications do their own caching.  File I/O
              is done directly to/from user space buffers.  The O_DIRECT flag on its
              own makes an effort to transfer data synchronously, but does not
              give the guarantees of the O_SYNC flag that data and necessary metadata are
              transferred.  To guarantee synchronous I/O the O_SYNC must be used in
              addition to O_DIRECT.  See NOTES below for further discussion.
    
              A semantically similar (but deprecated) interface for block devices is
              described in raw(8).
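
    As a rough illustration of those two flags, here is a minimal Python sketch (Python's os module exposes the same open(2) flags on Linux). The name write_direct and the 4096-byte ALIGN value are assumptions made for this example, not something the man page prescribes, and some filesystems reject O_DIRECT outright with an OSError.

```python
import mmap
import os

ALIGN = 4096  # assumed alignment; the real requirement depends on the device and filesystem


def write_direct(path, data):
    """Write data to path with O_DIRECT | O_SYNC and return the byte count written."""
    if not data or len(data) % ALIGN:
        raise ValueError("length must be a non-zero multiple of ALIGN")
    # An anonymous mmap is page-aligned, so it can serve as the aligned user-space buffer.
    buf = mmap.mmap(-1, len(data))
    try:
        buf.write(data)
        flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_DIRECT | os.O_SYNC
        fd = os.open(path, flags, 0o644)
        try:
            return os.write(fd, buf)
        finally:
            os.close(fd)
    finally:
        buf.close()
```

    Calling write_direct("data.bin", b"x" * 4096) writes one aligned block; a payload such as b"hello" is rejected by the length check before any I/O happens.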
    

    Granted, while researching this flag to confirm it is what you want, I found this interesting piece arguing that unbuffered I/O is a bad idea, with Linus describing it as "brain damaged". According to it, you should use madvise() instead to tell the kernel how to cache pages. YMMV.
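
    One possible reading of the madvise() suggestion, sketched in Python (mmap.madvise needs Python 3.8+ on Linux). The function name write_with_madvise and the choice of MADV_SEQUENTIAL followed by MADV_DONTNEED are assumptions for this example; the linked piece may have had a different pattern in mind, and these calls are hints that the kernel is free to ignore.

```python
import mmap
import os


def write_with_madvise(path, data):
    """Write data through a shared file mapping and hint how the pages will be used."""
    if not data:
        raise ValueError("cannot map a zero-length file")
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.ftruncate(fd, len(data))  # the file must already span the mapped range
        with mmap.mmap(fd, len(data)) as mm:
            mm.madvise(mmap.MADV_SEQUENTIAL)  # hint: pages are touched in order
            mm[:] = data
            mm.flush()  # msync: write the dirty pages back to the file
            mm.madvise(mmap.MADV_DONTNEED)  # hint: no access expected in the near future
    finally:
        os.close(fd)
```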

  • 2021-02-14 10:21

    as my program runs the amount of free memory on the machine drops very quickly

    Why is this a problem? Free memory is memory that isn't serving any useful purpose. When it's used to cache data, at least there is a chance it will be useful.

    If one of your programs requests more memory, file caches will be the first thing to go. Linux knows that it can re-read that data from disk whenever it wants, so it will just reap the memory and give it a new use.
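
    To see how much of the "missing" memory is really reclaimable cache, you can compare the fields in /proc/meminfo. The parser below is a small sketch; the name parse_meminfo is made up for this example, and it takes the file's text as an argument so it can be tried on any sample.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (mostly in kB)."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info
```

    On a Linux box, parse_meminfo(open("/proc/meminfo").read()) returns entries such as "MemFree" and "Cached"; a large "Cached" value next to a small "MemFree" is the situation described above.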

    It's true that Linux by default waits around 30 seconds (this is what the value used to be anyhow) before flushing writes to disk. You can speed this up with a call to fsync(). But once the data has been written to disk, there's practically zero cost to keeping a cache of the data in memory.
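
    A minimal sketch of forcing the flush from Python (write_and_flush is a hypothetical helper name). Note the two steps: flush() empties Python's own userspace buffer, and os.fsync() then asks the kernel to write the dirty pages out.

```python
import os


def write_and_flush(path, data):
    """Write data and return only after the kernel reports it flushed to the device."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()  # userspace buffer -> kernel page cache
        os.fsync(f.fileno())  # kernel page cache -> disk
```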

    Seeing as you write to the file and don't read from it, Linux will probably guess that this data is the best to throw out, in preference to other cached data. So don't waste effort trying to optimise unless you've confirmed that it's a performance problem.

  • 2021-02-14 10:34

    You can use O_DIRECT, but in that case you need to do the block IO yourself: you must write in multiples of the FS block size and on block boundaries (this may not be strictly mandatory, but if you do not, performance will suck x1000 because every unaligned write will need a read first).
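
    The alignment arithmetic is simple enough to sketch. The helper names below are made up for this example, and using statvfs's f_bsize as "the FS block size" is an assumption; the alignment O_DIRECT actually needs can differ by device and filesystem.

```python
import os


def fs_block_size(path):
    """Return the filesystem block size reported by statvfs for path."""
    return os.statvfs(path).f_bsize


def align_up(n, block):
    """Round n up to the next multiple of block."""
    return -(-n // block) * block


def is_aligned(offset, length, block):
    """True if a write at offset of length bytes starts and ends on block boundaries."""
    return offset % block == 0 and length % block == 0
```

    For example, with a 4096-byte block a 5000-byte record has to be padded to align_up(5000, 4096), which is 8192 bytes, before it can be written as whole blocks.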

    Another, much less intrusive way of stopping your blocks from using up the OS cache without O_DIRECT is to use posix_fadvise(fd, offset, len, POSIX_FADV_DONTNEED). Under Linux 2.6 kernels which support it, this immediately discards (clean) blocks from the cache. Of course you need to call fdatasync() or similar first, otherwise the blocks may still be dirty and hence won't be cleared from the cache.

    It is probably a bad idea to call fdatasync() and posix_fadvise( ... POSIX_FADV_DONTNEED) after every write; instead, wait until you've written a reasonable amount (50M, 100M maybe).

    So in short:

    • After every significant chunk of writes,
    • call fdatasync() followed by posix_fadvise( ... POSIX_FADV_DONTNEED).
    • This will flush the data to disc and immediately remove it from the OS cache, leaving space for more important things.
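
    The steps above can be sketched in Python, whose os module wraps fdatasync() and posix_fadvise() on Linux. The class name CacheFriendlyWriter and the 64 MiB default threshold are assumptions chosen for this example (picked from the 50M-100M range mentioned earlier), and it advises on the whole file rather than only the newly written range to keep the code short.

```python
import os

CHUNK = 64 * 1024 * 1024  # hypothetical threshold between cache drops


class CacheFriendlyWriter:
    """Append-only writer that periodically syncs and drops its pages from the cache."""

    def __init__(self, path, flush_every=CHUNK):
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
        self.flush_every = flush_every
        self.pending = 0  # bytes written since the last cache drop

    def write(self, data):
        view = memoryview(data)
        total = view.nbytes
        while len(view):  # os.write may write fewer bytes than requested
            written = os.write(self.fd, view)
            view = view[written:]
        self.pending += total
        if self.pending >= self.flush_every:
            self.drop_cache()

    def drop_cache(self):
        os.fdatasync(self.fd)  # make the pages clean first
        # Offset 0 with length 0 means "from the start to the end of the file".
        os.posix_fadvise(self.fd, 0, 0, os.POSIX_FADV_DONTNEED)
        self.pending = 0

    def close(self):
        self.drop_cache()
        os.close(self.fd)
```

    A long-running logger would create one CacheFriendlyWriter, call write() for each record, and call close() on shutdown; only every flush_every bytes does it pay for a sync.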

    Some users have found that things like fast-growing log files can easily blow "more useful" stuff out of the disc cache, which reduces cache hits a lot on a box which needs to have a lot of read cache, but also writes logs quickly. This is the main motivation for this feature.

    However, like any optimisation

    a) You're not going to need it so

    b) Do not do it (yet)
