Consider an application that is CPU bound, but also has high-performance I/O requirements.
I'm comparing Linux file I/O to Windows, and I can't see how epoll will help a Linux program at all. The kernel will tell me that the file descriptor is "ready for reading," but I still have to call blocking read() to get my data, and if I want to read megabytes, it's pretty clear that that will block.
On Windows, I can create a file handle with OVERLAPPED set, and then use non-blocking I/O, and get notified when the I/O completes, and use the data from that completion function. I need to spend no application-level wall-clock time waiting for data, which means I can precisely tune my number of threads to my number of cores, and get 100% efficient CPU utilization.
If I have to emulate asynchronous I/O on Linux, then I have to allocate some number of threads to do this, and those threads will spend a little bit of time doing CPU things, and a lot of time blocking for I/O, plus there will be overhead in the messaging to/from those threads. Thus, I will either over-subscribe or under-utilize my CPU cores.
I looked at mmap() + madvise() (WILLNEED) as a "poor man's async I/O" but it still doesn't get all the way there, because I can't get a notification when it's done -- I have to "guess" and if I guess "wrong" I will end up blocking on memory access, waiting for data to come from disk.
Linux seems to have the starts of async I/O in io_submit, and it seems to also have a user-space POSIX aio implementation, but it's been that way for a while, and I know of nobody who would vouch for these systems for critical, high-performance applications.
The Windows model works roughly like this:
1. Issue an asynchronous operation.
2. Tie the asynchronous operation to a particular I/O completion port.
3. Wait on operations to complete on that port.
4. When the I/O is complete, the thread waiting on the port unblocks, and returns a reference to the pending I/O operation.
Steps 1/2 are typically done as a single thing. Steps 3/4 are typically done with a pool of worker threads, not (necessarily) the same thread as issues the I/O. This model is somewhat similar to the model provided by boost::asio, except boost::asio doesn't actually give you asynchronous block-based (disk) I/O.
The difference from epoll on Linux is that when the wait in step 4 returns, no I/O has happened yet. epoll effectively hoists step 1 to come after step 4, which is "backwards" if you already know exactly what you need.
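For contrast, the readiness model can be sketched as follows. This is only an illustration, using a hypothetical helper `wait_then_read` and a descriptor type that epoll supports (a pipe or socket). The wait tells you that a read would not block; the data is only transferred by the read() call that follows.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: waits until `fd` is readable, then reads up to `cap`
 * bytes. Returns the number of bytes read, or -1 on error or timeout. */
ssize_t wait_then_read(int fd, void *buf, size_t cap, int timeout_ms)
{
    assert(buf != NULL);

    int ep = epoll_create1(0);
    if (ep < 0)
        return -1;

    struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };
    if (epoll_ctl(ep, EPOLL_CTL_ADD, fd, &ev) < 0) {
        close(ep);
        return -1;
    }

    struct epoll_event out;
    int n = epoll_wait(ep, &out, 1, timeout_ms); /* readiness only, no data moved */
    close(ep);
    if (n != 1)
        return -1;

    return read(fd, buf, cap);  /* the actual transfer happens here */
}
```

Note that on Linux a regular file cannot be registered with epoll at all (epoll_ctl() fails with EPERM), so this pattern does not even apply to disk files.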
Having programmed a large number of embedded, desktop, and server operating systems, I can say that this model of asynchronous I/O is very natural for certain kinds of programs. It is also very high-throughput and low-overhead. I think this is one of the remaining real shortcomings of the Linux I/O model, at the API level.
The real answer, which was indirectly pointed to by Peter Teoh, is based on io_setup() and io_submit(). Specifically, the "aio_" functions indicated by Peter are part of the glibc user-level emulation based on threads, which is not an efficient implementation. The real answer is in:
io_submit(2)
io_setup(2)
io_cancel(2)
io_destroy(2)
io_getevents(2)
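As a rough sketch of how these calls fit together, the following issues a single read through the raw system calls. The `raw_*` wrappers and `kaio_read_once` are names invented for this example. It assumes a Linux system with <linux/aio_abi.h> available, and for brevity it opens the file without O_DIRECT, in which case the submit step typically completes the read synchronously.

```c
#include <assert.h>
#include <fcntl.h>
#include <linux/aio_abi.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Thin wrappers (hypothetical names) over the raw system calls. */
static long raw_io_setup(long nr, aio_context_t *ctx)
{ return syscall(SYS_io_setup, nr, ctx); }
static long raw_io_submit(aio_context_t ctx, long nr, struct iocb **list)
{ return syscall(SYS_io_submit, ctx, nr, list); }
static long raw_io_getevents(aio_context_t ctx, long min_nr, long nr,
                             struct io_event *events)
{ return syscall(SYS_io_getevents, ctx, min_nr, nr, events, NULL); }
static long raw_io_destroy(aio_context_t ctx)
{ return syscall(SYS_io_destroy, ctx); }

/* Reads up to `len` bytes from offset 0 of `path` with one kernel AIO
 * request. Returns the number of bytes read, or -1 on error. */
long kaio_read_once(const char *path, void *buf, size_t len)
{
    assert(buf != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    aio_context_t ctx = 0;              /* must be zero before io_setup */
    if (raw_io_setup(1, &ctx) < 0) { close(fd); return -1; }

    struct iocb cb;
    memset(&cb, 0, sizeof cb);
    cb.aio_fildes     = (uint32_t)fd;
    cb.aio_lio_opcode = IOCB_CMD_PREAD;
    cb.aio_buf        = (uint64_t)(uintptr_t)buf;
    cb.aio_nbytes     = len;
    cb.aio_offset     = 0;

    struct iocb *list[1] = { &cb };
    long result = -1;
    if (raw_io_submit(ctx, 1, list) == 1) {
        /* ... other work could overlap with the I/O here ... */
        struct io_event ev;
        if (raw_io_getevents(ctx, 1, 1, &ev) == 1 && ev.res >= 0)
            result = (long)ev.res;      /* ev.res is bytes read or -errno */
    }

    raw_io_destroy(ctx);
    close(fd);
    return result;
}
```

A real program would keep one context alive, submit many requests, and reap completions from worker threads, which is roughly the shape of the completion-port model described in the question.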
Note that the man page, dated 2012-08, says that this implementation has not yet matured to the point where it can replace the glibc user-space emulation:
http://man7.org/linux/man-pages/man7/aio.7.html
"this implementation hasn't yet matured to the point where the POSIX AIO implementation can be completely reimplemented using the kernel system calls."
So, according to the latest kernel documentation I can find, Linux does not yet have a mature, kernel-based asynchronous I/O model. And, if I assume that the documented model is actually mature, it still doesn't support partial I/O in the sense of recv() vs read().
As explained in:
http://code.google.com/p/kernel/wiki/AIOUserGuide
and here:
http://www.ibm.com/developerworks/library/l-async/
Linux does provide asynchronous block I/O at the kernel level. The APIs are as follows:
aio_read Request an asynchronous read operation
aio_error Check the status of an asynchronous request
aio_return Get the return status of a completed asynchronous request
aio_write Request an asynchronous write operation
aio_suspend Suspend the calling process until one or more asynchronous requests have completed (or failed)
aio_cancel Cancel an asynchronous I/O request
lio_listio Initiate a list of I/O operations
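A minimal sketch of how the calls in this list are typically combined is shown below. The helper name `posix_aio_read_once` is invented for the example. These are the POSIX `aio_*` functions declared in <aio.h>, which another answer in this thread describes as glibc's thread-based user-level emulation; on older glibc versions the program may need to be linked with -lrt.

```c
#include <aio.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: reads up to `len` bytes from offset 0 of `path` with
 * one POSIX AIO request. Returns the bytes read, or -1 on error. */
long posix_aio_read_once(const char *path, void *buf, size_t len)
{
    assert(buf != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct aiocb cb;
    memset(&cb, 0, sizeof cb);
    cb.aio_fildes = fd;
    cb.aio_buf    = buf;
    cb.aio_nbytes = len;
    cb.aio_offset = 0;

    if (aio_read(&cb) < 0) {            /* queue the request */
        close(fd);
        return -1;
    }

    /* ... other work could run here while the request is in flight ... */

    const struct aiocb *const list[1] = { &cb };
    while (aio_error(&cb) == EINPROGRESS)
        aio_suspend(list, 1, NULL);     /* sleep until the request finishes */

    /* aio_return must be called exactly once per completed request. */
    long n = (aio_error(&cb) == 0) ? (long)aio_return(&cb) : -1;
    close(fd);
    return n;
}
```

Instead of aio_suspend(), completion can also be polled with aio_error() alone, or delivered via the notification settings in the `aio_sigevent` field.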
And if you ask who the users of these APIs are, it is the kernel itself. Just a small subset is shown here:
./drivers/net/tun.c (for network tunnelling):
static ssize_t tun_chr_aio_read(struct kiocb *iocb, const struct iovec *iv,
./drivers/usb/gadget/inode.c:
ep_aio_read(struct kiocb *iocb, const struct iovec *iov,
./net/socket.c (general socket programming):
static ssize_t sock_aio_read(struct kiocb *iocb, const struct iovec *iov,
./mm/filemap.c (mmap of files):
generic_file_aio_read(struct kiocb *iocb, const struct iovec *iov,
./mm/shmem.c:
static ssize_t shmem_file_aio_read(struct kiocb *iocb,
etc.
At the userspace level, there is also the io_submit() etc. API (from glibc), but the following article offers an alternative to using glibc:
http://www.fsl.cs.sunysb.edu/~vass/linux-aio.txt
It implements functions like io_setup() as direct syscalls (bypassing the glibc dependency), so a kernel mapping with the same "__NR_io_setup" signature should exist. Upon searching the kernel source at:
http://lxr.free-electrons.com/source/include/linux/syscalls.h#L474 (the URL applies to version 3.13, the latest at the time of writing) you are greeted with the direct implementation of these io_*() APIs in the kernel:
474 asmlinkage long sys_io_setup(unsigned nr_reqs, aio_context_t __user *ctx);
475 asmlinkage long sys_io_destroy(aio_context_t ctx);
476 asmlinkage long sys_io_getevents(aio_context_t ctx_id,
481 asmlinkage long sys_io_submit(aio_context_t, long,
483 asmlinkage long sys_io_cancel(aio_context_t ctx_id, struct iocb __user *iocb,
Later versions of glibc should make this use of "syscall()" to call sys_io_setup() unnecessary, but without such a version you can always make these calls yourself, provided your kernel is recent enough to support "sys_io_setup()".
Of course, there are other userspace options for asynchronous I/O (e.g., using signals?):
http://personal.denison.edu/~bressoud/cs375-s13/supplements/linux_altIO.pdf
or perhaps:
What is the status of POSIX asynchronous I/O (AIO)?
"io_submit" and friends are still not available in glibc (see the io_submit man pages), which I have verified on my Ubuntu 14.04 system. Note also that this API is Linux-specific.
Others, such as libuv, libev, and libevent, also provide asynchronous APIs:
http://nikhilm.github.io/uvbook/filesystem.html#reading-writing-files
http://software.schmorp.de/pkg/libev.html
All of these APIs aim to be portable across BSD, Linux, Mac OS X, and even Windows.
In terms of performance I have not seen any numbers, but I suspect libuv may be the fastest, because it is so lightweight.
(2019) If you're using a 5.1 or later kernel, you can use the io_uring interface for file-like I/O and get excellent asynchronous operation.
Compared to the existing libaio/KAIO interface, io_uring has the following advantages:
- Works with buffered AND direct I/O
- Easier to use
- Can optionally work in a polled manner
- Less bookkeeping space overhead per I/O
- Lower CPU overhead due to less userspace/kernel syscall context switches (a big deal these days due to the impact of spectre/meltdown mitigations)
- Doesn't become blocking each time the stars aren't perfectly aligned
Compared to glibc's POSIX aio, io_uring has the following advantages:
- Much faster and more efficient (the lower overhead benefits from above apply even moreso here)
- Interface is kernel backed and DOESN'T use a userspace thread pool
- glibc's POSIX aio can't have more than one I/O in flight on a single file descriptor, whereas io_uring most certainly can!
The "Efficient IO with io_uring" document goes into far more detail as to io_uring's benefits and usage.
I'm not quite sure "support partial I/O in the sense of recv() vs read()" makes so much sense for file-based I/O. Ideally you're asking for read I/O in sizes the disk can actually do so the buffer is either completely ready or not ready at all (or the benefit of doing re-assembly yourself isn't worth the effort because the disk is so fast).
Obviously at the time of writing the io_uring interface is very new but hopefully it will usher in a better asynchronous file-based I/O story for Linux.
Source: https://stackoverflow.com/questions/13407542/is-there-really-no-asynchronous-block-i-o-on-linux