buffered asynchronous file I/O on linux

花落未央 2020-12-23 21:00

I am looking for the most efficient way to do asynchronous file I/O on linux.

The POSIX glibc implementation uses threads in userland.

The native aio kernel

3 Answers
  • 2020-12-23 21:13

    I don't think the Linux kernel implementation of asynchronous file I/O is really usable unless you also use O_DIRECT, sorry.

    There's more information about the current state of the world at https://github.com/littledan/linux-aio. It was updated in 2012 by someone who used to work at Google.

  • 2020-12-23 21:31

    The material seems old -- well, it is old -- because it has been around for a long time and, while by no means trivial, is well understood. A solution you can lift is published in W. Richard Stevens's superb and unparalleled book (read "bible"). The book is that rare treasure that is clear, concise, and complete; every page gives real and immediate value:

        Advanced Programming in the UNIX Environment

    Two other such, also by Stevens, are the first two volumes of his Unix Network Programming collection:

       Volume 1: The Sockets Networking API (with Fenner and Rudoff) and
       Volume 2: Interprocess Communications

    I can't imagine being without these three fundamental books; I'm dumbstruck when I find someone who hasn't heard of them.

    Another of Stevens's books, just as precious:

       TCP/IP Illustrated, Vol. 1: The Protocols

  • 2020-12-23 21:33

    Unless you want to write your own IO thread pool, the glibc implementation is an acceptable solution. It actually works surprisingly well for something that runs entirely in userland.
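
    For reference, a minimal sketch of what using the glibc (POSIX) AIO interface looks like. The helper name `posix_aio_read_at` is made up for illustration, and a real program would do useful work between `aio_read` and the wait instead of suspending right away. Older glibc versions need `-lrt` at link time.

```c
#include <aio.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not from the thread): read up to len bytes at
 * offset off from path using POSIX AIO.
 * Returns the number of bytes read, or -1 with errno set. */
ssize_t posix_aio_read_at(const char *path, void *buf, size_t len, off_t off)
{
    assert(path != NULL && buf != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct aiocb cb;
    memset(&cb, 0, sizeof cb);
    cb.aio_fildes = fd;
    cb.aio_buf = buf;
    cb.aio_nbytes = len;
    cb.aio_offset = off;
    /* No signal or callback on completion; we poll and suspend instead. */
    cb.aio_sigevent.sigev_notify = SIGEV_NONE;

    /* Queue the request. With glibc this hands it to a userland thread. */
    if (aio_read(&cb) != 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }

    /* The caller could do other work here. This sketch simply waits. */
    const struct aiocb *const list[1] = { &cb };
    while (aio_error(&cb) == EINPROGRESS)
        aio_suspend(list, 1, NULL);

    int err = aio_error(&cb);     /* 0 on success, else an errno value */
    ssize_t n = aio_return(&cb);  /* must be called exactly once */
    close(fd);

    if (err != 0) {
        errno = err;
        return -1;
    }
    return n;
}
```

    Calling `posix_aio_read_at(path, buf, sizeof buf, 0)` behaves like a `pread` that happens to be executed on another thread.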

    The kernel implementation does not work with buffered IO at all in my experience (though I've seen other people say the opposite!). That is fine if you want to read huge amounts of data via DMA, but of course it sucks big time if you plan to take advantage of the buffer cache.
    Also note that the kernel AIO calls may actually block. There is a limited-size command buffer, and large reads are broken up into several smaller ones. Once the queue is full, asynchronous commands run synchronously. Surprise. I ran into this problem a year or two ago and could not find an explanation. Asking around gave me the "yeah of course, that's how it works" answer.
    From what I've understood, the "official" interest in supporting buffered aio is not terribly great either, even though several working solutions seem to have been available for years. Some of the arguments that I've read were along the lines of "you don't want to use the buffers anyway" and "nobody needs that" and "most people don't even use epoll yet". So, well... meh.

    Getting epoll signalled by a completed async operation was another issue until recently, but by now this works fine via eventfd.
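
    A rough sketch of the eventfd route, under a few assumptions: the helper name `kaio_read_via_epoll` and the thin `sys_io_*` wrappers are made up for illustration (glibc does not wrap the native AIO syscalls, so they go through `syscall`), and the `extra_open_flags` parameter is only there so a caller can pass `O_DIRECT`. With `O_DIRECT` the buffer, length, and offset generally have to be block-aligned, which this sketch does not enforce.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/aio_abi.h>
#include <stdint.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Thin wrappers around the native AIO syscalls (illustrative names). */
static long sys_io_setup(unsigned nr, aio_context_t *ctx)
{ return syscall(SYS_io_setup, nr, ctx); }
static long sys_io_destroy(aio_context_t ctx)
{ return syscall(SYS_io_destroy, ctx); }
static long sys_io_submit(aio_context_t ctx, long n, struct iocb **list)
{ return syscall(SYS_io_submit, ctx, n, list); }
static long sys_io_getevents(aio_context_t ctx, long min_nr, long nr,
                             struct io_event *ev, struct timespec *ts)
{ return syscall(SYS_io_getevents, ctx, min_nr, nr, ev, ts); }

/* Hypothetical helper: read up to len bytes from the start of path with
 * kernel AIO and learn about completion through epoll on an eventfd.
 * Returns bytes read, or a negative value on failure. */
long kaio_read_via_epoll(const char *path, void *buf, size_t len,
                         int extra_open_flags)
{
    assert(path != NULL && buf != NULL);

    long result = -1;
    aio_context_t ctx = 0;
    struct iocb cb;
    struct iocb *list[1] = { &cb };
    struct epoll_event want, got;
    struct io_event ev;
    uint64_t completed = 0;
    int n;

    int fd = open(path, O_RDONLY | extra_open_flags);
    int efd = eventfd(0, EFD_NONBLOCK);
    int epfd = epoll_create1(0);
    if (fd < 0 || efd < 0 || epfd < 0 || sys_io_setup(8, &ctx) < 0)
        goto out;

    memset(&want, 0, sizeof want);
    want.events = EPOLLIN;
    want.data.fd = efd;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, efd, &want) < 0)
        goto out;

    memset(&cb, 0, sizeof cb);
    cb.aio_lio_opcode = IOCB_CMD_PREAD;
    cb.aio_fildes = (uint32_t)fd;
    cb.aio_buf = (uint64_t)(uintptr_t)buf;
    cb.aio_nbytes = len;
    cb.aio_offset = 0;
    cb.aio_flags = IOCB_FLAG_RESFD;   /* signal completion on aio_resfd */
    cb.aio_resfd = (uint32_t)efd;

    if (sys_io_submit(ctx, 1, list) != 1)
        goto out;

    /* An event loop would watch this eventfd next to its sockets. */
    do {
        n = epoll_wait(epfd, &got, 1, -1);
    } while (n < 0 && errno == EINTR);
    if (n != 1)
        goto out;

    /* The eventfd counter holds the number of completed requests. */
    if (read(efd, &completed, sizeof completed) != (ssize_t)sizeof completed)
        goto out;
    if (sys_io_getevents(ctx, 1, 1, &ev, NULL) != 1)
        goto out;
    result = (long)ev.res;            /* byte count, or negative errno */

out:
    if (ctx != 0)
        sys_io_destroy(ctx);
    if (epfd >= 0)
        close(epfd);
    if (efd >= 0)
        close(efd);
    if (fd >= 0)
        close(fd);
    return result;
}
```

    Passing `0` for `extra_open_flags` goes through the buffer cache, in which case the submit call itself may do the read synchronously, as described above; the eventfd is still signalled either way.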

    Note that the glibc implementation will actually spawn threads on demand inside __aio_enqueue_request. It is probably no big deal, since spawning threads is not that terribly expensive any more, but one should be aware of it. If your understanding of starting an asynchronous operation is "returns immediately", then that assumption may not be true, because it may be spawning some threads first.

    EDIT:
    As a sidenote, under Windows there exists a very similar situation to the one in the glibc AIO implementation where the "returns immediately" assumption of queuing an asynchronous operation is not true.
    If all the data that you wanted to read is in the buffer cache, Windows will decide to run the request synchronously instead, because it will finish immediately anyway. This is well-documented, and admittedly sounds great, too. Except that when there are a few megabytes to copy, or when another thread has page faults or does IO concurrently (thus competing for the lock), "immediately" can be a surprisingly long time -- I've seen "immediate" times of 2-5 milliseconds. That is no problem in most situations, but under the constraint of a 16.66ms frame time, for example, you probably don't want to risk blocking for 5ms at random times. Thus, the naive assumption of "can do async IO from my render thread no problem, because async doesn't block" is flawed.
