I have the following question regarding C file I/O.
At a physical level (harddrive), is it valid to assume that every fread(n_blocks, size, length,FILE fp)
From an application programmer's point of view, the exact process of reading the blocks is nondeterministic. It comes down to the disk scheduler, which orders the access operations of multiple processes issuing requests at the same time. There are several scheduling algorithms for this, but a model as simple as "1 random seek, n sequential seeks" is not realistic at all. In the end, neither the C standard nor the C++ standard defines any of this, for good reason.
No, it's not. You can't even assume that an fread will trigger physical I/O. Your OS can do a lot with I/O requests, including caching the results and reordering, coalescing, or splitting reads (and sometimes even writes).
If there is a lot of I/O going on, you can't count on getting sequential reads either, depending on what size buffer you (and possibly the I/O stream library) use. Some operating systems provide ways to "hint" that you will be reading sequentially on a file descriptor (or mmap-ed region), which could help.
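As a rough illustration of such a hint, here is a minimal sketch that assumes a POSIX system providing posix_fadvise (Linux does; not every OS does). The helper name open_sequential is made up for this example. The hint is purely advisory, so the kernel may ignore it.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* exposes fileno() and posix_fadvise() */
#endif
#include <fcntl.h>
#include <stdio.h>

/* Hypothetical helper: open `path` for reading and tell the kernel that the
 * file will be read sequentially. Returns the stream, or NULL if fopen fails.
 * A failed hint is reported but does not prevent reading. */
FILE *open_sequential(const char *path)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;

    /* Offset 0 with length 0 means "the whole file".
     * posix_fadvise returns an error number directly (it does not set errno). */
    int err = posix_fadvise(fileno(fp), 0, 0, POSIX_FADV_SEQUENTIAL);
    if (err != 0)
        fprintf(stderr, "posix_fadvise failed (error %d), reading without hint\n", err);

    return fp;
}
```

After this call you use fread on the returned stream as usual; whether the hint changes anything depends on the kernel, the filesystem, and the current I/O load.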
As many explained, caching (perhaps at several levels) has to be taken into account.
Perhaps you want to know how to accelerate or tune it from your C code. This is highly operating system specific.
On recent Linux systems, you could use readahead, madvise (with mmap), and other system calls.
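For the madvise route, a minimal sketch might look like the following. It assumes Linux with glibc; the function name count_newlines_mmap and the newline-counting task are only placeholders for "some sequential pass over the file".

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE  /* exposes madvise() on glibc */
#endif
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: count '\n' bytes in `path` through a read-only
 * mapping, after advising the kernel that access will be sequential.
 * Returns the count, or -1 on error. */
long count_newlines_mmap(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {      /* mmap rejects a zero-length mapping */
        close(fd);
        return 0;
    }

    size_t len = (size_t)st.st_size;
    unsigned char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                  /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return -1;

    /* Advisory only: the kernel may read ahead more aggressively. */
    (void)madvise(p, len, MADV_SEQUENTIAL);

    long count = 0;
    for (size_t i = 0; i < len; i++)
        if (p[i] == '\n')
            count++;

    munmap(p, len);
    return count;
}
```

Whether this beats plain fread depends on the workload, so measure before committing to it.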
Often, you can simply read a file in advance (perhaps just with cat yourfile > /dev/null) and your program will then run faster on Linux.
Try, for instance, running the wc word-counting utility twice on some big file. The second run usually goes much faster than the first.
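The same experiment can be done from C. This is only a sketch: time_full_read is a made-up name, it assumes a POSIX clock_gettime with CLOCK_MONOTONIC, and the numbers you get depend entirely on your machine and on what is already cached.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* exposes clock_gettime() */
#endif
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: read `path` from start to end with fread and return
 * the elapsed wall-clock time in seconds, or -1.0 if the file cannot be
 * opened. The data itself is discarded. */
double time_full_read(const char *path)
{
    static char buf[1 << 16];   /* 64 KiB scratch buffer */
    struct timespec t0, t1;

    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1.0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    while (fread(buf, 1, sizeof buf, fp) > 0) {
        /* nothing to do: we only care how long the reads take */
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    fclose(fp);

    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Call it twice on the same large file and print both results. If the file was not already in the OS cache, the second call is typically the faster one.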
No, it's not. The blocks of a single file may be scattered all over the hard disk if the filesystem is fragmented.
You can assume whatever you want, but reality is much more complicated.
fread/fwrite will usually read from and write to an internal buffer in the memory of your process. When that buffer is empty (for reads) or full (for writes), they forward the request to the operating system, which has its own cache. If you are reading and the OS can't find that portion of the file in its cache, your program waits until the data is actually fetched from the hard drive, which is an expensive operation. If you are writing, the data is just copied into the OS cache and stays there until it is flushed to disk, which may happen long after your program has closed the file. On top of that, today's hard drives have their own caches, which the OS may not even be aware of.
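To make those layers concrete on the write side, here is a minimal sketch assuming a POSIX system (for fileno and fsync). The name write_through_layers is invented for this example.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* exposes fileno() and fsync() */
#endif
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: write `len` bytes to `path` and push them through
 * each buffering layer in turn. Returns 0 on success, -1 on any failure. */
int write_through_layers(const char *path, const void *data, size_t len)
{
    FILE *fp = fopen(path, "wb");
    if (fp == NULL)
        return -1;

    int ok = 1;

    /* Layer 1: fwrite normally just copies into the stdio buffer
     * that lives inside this process. */
    if (fwrite(data, 1, len, fp) != len)
        ok = 0;

    /* Layer 2: fflush hands the buffered bytes to the OS,
     * where they sit in the OS cache. */
    if (ok && fflush(fp) != 0)
        ok = 0;

    /* Layer 3: fsync asks the OS to push its cached data for this
     * file out to the storage device. */
    if (ok && fsync(fileno(fp)) != 0)
        ok = 0;

    if (fclose(fp) != 0)
        ok = 0;

    return ok ? 0 : -1;
}
```

Without the fflush and fsync calls the data would still reach the disk eventually, just at a time the OS chooses. What the drive's own cache does after fsync returns is device dependent.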