After providing the same program which reads a random generated input file and echoes the same string it read to an output. The only difference is that on one side I\'m providin
Timing my application with an input of 10Mb in size and echoing it to /dev/null, and making sure the file in not cached, I've found that libc's frwite is faster by a LARGE scale when using very small buffers (1 byte in case).
fwrite
works on streams, which are buffered. Therefore many small buffers will be faster because it won't run a costly system call until the buffer fills up (or you flush it or close the stream). On the other hand, small buffers being sent to write
will run a costly system call for each buffer - that's where you're losing the speed. With a 1024 byte stream buffer, and writing 1 byte buffers, you're looking at 1024 write
calls for each kilobyte, rather than 1024 fwrite
calls turning into one write
- see the difference?
For big buffers the difference will be small, because there will be less buffering, and therefore a more consistent number of system calls between fwrite
and write
.
In other words, fwrite(3)
is just a library routine that collects up output into chunks, and then calls write(2)
. Now, write(2)
, is a system call which traps into the kernel. That's where the I/O actually happens. There is some overhead for simply calling into the kernel, and then there is the time it takes to actually write something. If you use large buffers, you will find that write(2)
is faster because it eventually has to be called anyway, and if you are writing one or more times per fwrite then the fwrite buffering overhead is just that: more overhead.
If you want to read more about it, you can have a look at this document, which explains standard I/O streams.