Let's assume we opened a file using fopen() and, from the file pointer received, fetched the file descriptor using fileno(). Then we do lots (>10^8) of read() calls on that descriptor.
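For concreteness, the setup looks roughly like this (a sketch with a hypothetical path; real code would do proper error handling):

```c
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* Hypothetical path on the NFS mount backed by OCFS2. */
    FILE *fp = fopen("/nfs/ocfs2/datafile", "rb");
    if (fp == NULL)
        return 1;

    int fd = fileno(fp);    /* file descriptor behind the stdio stream */

    char buf[4096];
    /* ...followed by a very large number of plain read() calls... */
    ssize_t n = read(fd, buf, sizeof buf);
    (void)n;

    fclose(fp);
    return 0;
}
```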
We solved the problem described: read() returning fewer bytes than requested when reading from a file located on an NFS mount pointing to an OCFS2 file system (case 4 in my question).
It is a fact that, with the setup mentioned above, such read()s on file descriptors sometimes return fewer bytes than requested, without errno being set.
To get all the data, it is as simple as read()ing again and again until the requested amount of data has been read.
Moreover, the same setup sometimes makes read() fail with EIO, and even then a simple re-read() succeeds and the data arrives.
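In code, the workaround amounts to a loop along these lines (a sketch, not the exact production code; the name read_fully and the unconditional retry on EIO are illustrative, and a real implementation would want a retry limit):

```c
#include <errno.h>
#include <unistd.h>

/* Keep calling read() until `count` bytes have arrived, treating short
 * reads and spurious EIO as retryable conditions. */
ssize_t read_fully(int fd, void *buf, size_t count)
{
    size_t done = 0;

    while (done < count) {
        ssize_t n = read(fd, (char *)buf + done, count - done);

        if (n > 0) {
            done += n;          /* short read: just read() again */
        } else if (n == 0) {
            break;              /* genuine end of file */
        } else if (errno == EIO || errno == EINTR) {
            continue;           /* spurious EIO over NFS/OCFS2: re-read() */
        } else {
            return -1;          /* real error, errno is set */
        }
    }
    return (ssize_t)done;
}
```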
My conclusion: reading via OCFS2 via NFS makes read()ing from files behave like read()ing from sockets, which is inconsistent with the specification of read() (http://pubs.opengroup.org/onlinepubs/9699919799/functions/read.html):
When attempting to read a file (other than a pipe or FIFO) that supports non-blocking reads and has no data currently available:
If O_NONBLOCK is set, read() shall return -1 and set errno to [EAGAIN].
If O_NONBLOCK is clear, read() shall block the calling thread until some data becomes available.
Needless to say, we never tried, nor even thought about, setting O_NONBLOCK on the file descriptors in question.
You should not assume that read() will never return fewer bytes than requested, on any filesystem. This is particularly true for large reads, as POSIX.1 indicates that read() behavior for sizes larger than SSIZE_MAX is implementation-defined. On the mainstream Unix box I'm using right now, SSIZE_MAX is 32767 bytes. The fact that read() always returns the full amount today does not mean that it will in the future.
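A defensive way to stay inside the fully specified part of the standard is to cap each individual request at SSIZE_MAX (a sketch; bounded_read is just an illustrative name):

```c
#include <limits.h>   /* SSIZE_MAX */
#include <unistd.h>

/* Never ask read() for more than SSIZE_MAX bytes in a single call, so the
 * request stays within the range whose behavior POSIX fully specifies. */
ssize_t bounded_read(int fd, void *buf, size_t count)
{
    if (count > (size_t)SSIZE_MAX)
        count = (size_t)SSIZE_MAX;
    return read(fd, buf, count);
}
```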
One possible reason is that I/O priorities may become more fully fleshed out in the kernel in the future. For example, you're trying to read from the same device as another, higher-priority process, and that process would get better throughput if your process weren't causing head movement away from the sectors it wants. The kernel might choose to give your read() a short count to get you out of the way for a while, instead of continuing to do inefficient interleaved block reads. Stranger things have been done for the sake of I/O efficiency. What is not prohibited often becomes compulsory.