Assuming the following for...
Output:
The file is opened...
Data is 'streamed' to disk. The data in memory is in a large contiguous buffer. It is
Are there generally accepted guidelines for achieving the fastest possible sequential file I/O in C++?
Rule 0: Measure. Use all available profiling tools and get to know them. It's almost a commandment in programming that if you didn't measure it you don't know how fast it is, and for I/O this is even more true. Make sure to test under actual work conditions if you possibly can. A process that has no competition for the I/O system can be over-optimized, fine-tuned for conditions that don't exist under real loads.
Use mapped memory instead of writing to files. This isn't always faster, but it gives you the opportunity to optimize the I/O in an operating-system-specific but relatively portable way, by avoiding unnecessary copying and taking advantage of the OS's knowledge of how the disk is actually being used. ("Portable" if you use a wrapper rather than an OS-specific API call.)
Try to linearize your output as much as possible. Having to jump around memory to find the buffers to write can have noticeable effects under optimized conditions, because cache lines, paging and other memory subsystem issues will start to matter. If you have lots of buffers, look into support for scatter-gather I/O, which tries to do that linearizing for you.
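As a rough illustration of the scatter-gather idea, here is a minimal sketch using the POSIX writev() call, which hands the kernel a list of separate buffers in one system call. The function name write_gathered and the use of std::string chunks are my own choices for the example, not something from the question, and the sketch does not handle short writes or more than IOV_MAX buffers.

```cpp
#include <sys/uio.h>   // writev, struct iovec (POSIX)
#include <fcntl.h>     // open
#include <unistd.h>    // close
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: write several separate in-memory buffers to `path`
// with a single writev() call. Returns the number of bytes written, or -1
// on error. Simplifications: a short write is not retried, and the number
// of chunks is assumed to be below IOV_MAX.
ssize_t write_gathered(const char* path, const std::vector<std::string>& chunks) {
    assert(path != nullptr);
    int fd = ::open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return -1;

    // Describe each buffer to the kernel; no data is copied in user space.
    std::vector<iovec> iov;
    iov.reserve(chunks.size());
    for (const std::string& c : chunks) {
        iovec v;
        v.iov_base = const_cast<char*>(c.data());  // writev only reads from it
        v.iov_len = c.size();
        iov.push_back(v);
    }

    ssize_t n = ::writev(fd, iov.data(), static_cast<int>(iov.size()));
    ::close(fd);
    return n;
}
```

Called as write_gathered("out.bin", {"abc", "defg", "hi"}), this should produce one 9-byte file without first concatenating the pieces in memory.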
Some possible considerations:
- Guidelines for choosing the optimal buffer size
Page size for starters, but be ready to tune from there.
- Will a portable library like boost::asio be too abstracted to expose the intricacies of a specific platform, or can they be assumed to be optimal?
Don't assume it's optimal. It depends on how thoroughly the library gets exercised on your platform, and how much effort the developers put into making it fast. Having said that, a portable I/O library can be very fast, because fast abstractions exist on most systems, and it's usually possible to come up with a general API that covers a lot of the bases. Boost.Asio is, to the best of my limited knowledge, fairly fine-tuned for the particular platform it is on: there's a whole family of OS and OS-variant specific APIs for fast async I/O (e.g. epoll, /dev/epoll, kqueue, Windows overlapped I/O), and Asio wraps them all.
- Is asynchronous I/O always preferable to synchronous? What if the application is not otherwise CPU-bound?
Asynchronous I/O isn't faster in a raw sense than synchronous I/O. What asynchronous I/O does is ensure that your code is not wasting time waiting for the I/O to complete. It is faster in a general way than the other method of not wasting that time, namely using threads, because it will call back into your code when I/O is ready and not before. There are no false starts or concerns with idle threads needing to be terminated.
As you noted above it all depends on the machine / system / libraries that you are using. A fast solution on one system may be slow on another.
A general guideline, though, would be to write in chunks that are as large as possible.
Typically writing a byte at a time is the slowest.
The best way to know for sure is to code a few different ways and profile them.
General advice is to turn off buffering and read/write in large chunks, but not too large: with an oversized chunk you waste time waiting for the whole I/O to complete when you could already be munching away at the first megabyte. It's trivial to find the sweet spot with this approach, because there's only one knob to turn: the chunk size.
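To make the "one knob" concrete, here is a small sketch that times unbuffered writes at a few candidate chunk sizes and reports the fastest. The names time_chunked_write and fastest_chunk are made up for this example. Note the assumptions: it measures wall-clock time up to fclose(), so on most systems it is mostly timing the OS page cache rather than the disk, and a single run per size is noisy. Treat it as a starting point for real measurements under real load, not as a benchmark.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical helper: write `data` to `path` in pieces of `chunk` bytes
// with stdio buffering turned off. Returns elapsed seconds, or -1.0 on error.
double time_chunked_write(const char* path, const std::vector<char>& data,
                          std::size_t chunk) {
    assert(chunk > 0);
    std::FILE* f = std::fopen(path, "wb");
    if (!f) return -1.0;
    std::setvbuf(f, nullptr, _IONBF, 0);  // no stdio buffer in between

    const auto start = std::chrono::steady_clock::now();
    bool ok = true;
    for (std::size_t off = 0; ok && off < data.size(); off += chunk) {
        const std::size_t n = std::min(chunk, data.size() - off);
        ok = std::fwrite(data.data() + off, 1, n, f) == n;
    }
    if (std::fclose(f) != 0) ok = false;
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return ok ? elapsed.count() : -1.0;
}

// Try each candidate chunk size once and return the fastest one
// (0 if every attempt failed).
std::size_t fastest_chunk(const char* path, const std::vector<char>& data,
                          const std::vector<std::size_t>& candidates) {
    std::size_t best = 0;
    double best_time = -1.0;
    for (std::size_t c : candidates) {
        const double t = time_chunked_write(path, data, c);
        if (t >= 0.0 && (best_time < 0.0 || t < best_time)) {
            best_time = t;
            best = c;
        }
    }
    return best;
}
```

In practice you would repeat each measurement several times, use a data size much larger than RAM caches, and run it on the target machine while it is doing its normal work.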
Beyond that, for input, mmap()ing the file shared and read-only is (if not the fastest, then) the most efficient way. Call madvise() if your platform has it, to tell the kernel how you will traverse the file, so it can do readahead and throw out the pages afterwards again quickly.
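A minimal sketch of that input path on a POSIX system might look like the following. The function name sum_file_mmap and the byte-summing "work" are placeholders of mine; the point is only the open / mmap / madvise / munmap sequence. It assumes the file fits in the address space and treats the madvise() hint as optional.

```cpp
#include <sys/mman.h>   // mmap, madvise, munmap
#include <sys/stat.h>   // fstat
#include <fcntl.h>      // open
#include <unistd.h>     // close
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: add up all bytes of a file by mapping it shared and
// read-only, then walking it front to back. Returns false on error.
bool sum_file_mmap(const char* path, std::uint64_t& sum) {
    assert(path != nullptr);
    sum = 0;
    int fd = ::open(path, O_RDONLY);
    if (fd < 0) return false;

    struct stat st;
    if (::fstat(fd, &st) != 0) { ::close(fd); return false; }
    const std::size_t len = static_cast<std::size_t>(st.st_size);
    if (len == 0) { ::close(fd); return true; }  // mmap rejects length 0

    void* p = ::mmap(nullptr, len, PROT_READ, MAP_SHARED, fd, 0);
    ::close(fd);  // the mapping stays valid after the descriptor is closed
    if (p == MAP_FAILED) return false;

    // Tell the kernel we will read sequentially; this is only a hint,
    // so a failure here is ignored.
    ::madvise(p, len, MADV_SEQUENTIAL);

    const unsigned char* bytes = static_cast<const unsigned char*>(p);
    for (std::size_t i = 0; i < len; ++i) sum += bytes[i];

    ::munmap(p, len);
    return true;
}
```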
For output, if you already have a buffer, consider underpinning it with a file (also with mmap()), so you don't have to copy the data in userspace.
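One way to read "underpinning the buffer with a file" is to size the file up front, map it writable and shared, and produce the data directly into the mapping. The sketch below does that on POSIX; generate_into_file and its i % 251 fill pattern are invented for the example, and it assumes you know the output size in advance.

```cpp
#include <sys/mman.h>   // mmap, msync, munmap
#include <fcntl.h>      // open
#include <unistd.h>     // ftruncate, close
#include <cassert>
#include <cstddef>

// Hypothetical helper: create `path` with `len` bytes and fill it through a
// shared, writable mapping, so the "buffer" the program writes into is the
// file's own pages. Returns false on error.
bool generate_into_file(const char* path, std::size_t len) {
    assert(len > 0);  // mmap() rejects zero-length mappings
    int fd = ::open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return false;

    // The file must already have its final size before it is mapped.
    if (::ftruncate(fd, static_cast<off_t>(len)) != 0) {
        ::close(fd);
        return false;
    }

    void* p = ::mmap(nullptr, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    ::close(fd);  // the mapping keeps the file open internally
    if (p == MAP_FAILED) return false;

    // Stand-in for the real producer: write the data in place.
    unsigned char* out = static_cast<unsigned char*>(p);
    for (std::size_t i = 0; i < len; ++i) {
        out[i] = static_cast<unsigned char>(i % 251);
    }

    bool ok = ::msync(p, len, MS_SYNC) == 0;  // flush dirty pages to the file
    if (::munmap(p, len) != 0) ok = false;
    return ok;
}
```

Whether the MS_SYNC flush is wanted depends on your durability needs; without it the kernel writes the pages back on its own schedule.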
If mmap() is not to your liking, then there's fadvise() (spelled posix_fadvise() in POSIX), and, for the really tough ones, async file I/O.
(All of the above is POSIX, Windows names may be different).
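For the non-mmap route, a sketch with posix_fadvise() and plain read() calls could look like this. The helper name count_bytes_sequential and the 1 MiB default chunk are assumptions for illustration; the chunk size is exactly the value you would tune by measurement. The advice call is only a hint, so its return value is ignored here, and EINTR is not handled.

```cpp
#include <fcntl.h>      // open, posix_fadvise
#include <unistd.h>     // read, close
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: read a file front to back in `chunk`-sized pieces
// after advising the kernel of sequential access. Returns the number of
// bytes read, or -1 on error.
long long count_bytes_sequential(const char* path, std::size_t chunk = 1 << 20) {
    assert(chunk > 0);
    int fd = ::open(path, O_RDONLY);
    if (fd < 0) return -1;

    // Offset 0 with length 0 means "the whole file".
    ::posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);

    std::vector<char> buf(chunk);
    long long total = 0;
    for (;;) {
        const ssize_t n = ::read(fd, buf.data(), buf.size());
        if (n < 0) { ::close(fd); return -1; }
        if (n == 0) break;  // end of file
        total += n;         // real code would process buf[0..n) here
    }
    ::close(fd);
    return total;
}
```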
On Linux, buffered reads and writes speed things up a lot, increasingly so with larger buffer sizes, but the returns diminish and you generally want to use BUFSIZ (defined by stdio.h), as larger buffer sizes won't help much.
mmaping provides the fastest access to files, but the mmap call itself is rather expensive. For small files (16KiB) read and write system calls win (see https://stackoverflow.com/a/39196499/1084774 for the numbers on reading through read and mmap).
You will get the absolute fastest performance by using CreateFile and ReadFile. Open the file with FILE_FLAG_SEQUENTIAL_SCAN.
Read with a buffer size that is a power of two. Only benchmarking can determine this number. I have seen it to be 8K once. Another time I found it to be 8M! This varies wildly.
It depends on the size of the CPU cache, on the efficiency of OS read-ahead and on the overhead associated with doing many small writes.
Memory mapping is not the fastest way. It has more overhead because you can't control the block size and the OS needs to fault in all pages.
For Windows, you'll want to make sure you use FILE_FLAG_SEQUENTIAL_SCAN in your CreateFile() call, if you opt to use the platform-specific Windows API. This will optimize caching for the I/O. As far as buffer sizes go, a buffer size that is a multiple of the disk sector size is typically advised. 8K is a nice starting point with little to be gained from going larger.
This article discusses the comparison between async and sync on Windows.
http://msdn.microsoft.com/en-us/library/aa365683(VS.85).aspx