I have been trying to understand the tradeoff between read and seek. For small "jumps" reading unneeded data is faster than skipping it with se
I have seen similar situations while working with Arduinos interfacing with EEPROM. Basically, in order to read from or write to a chip or data structure, you have to send a read/write enable command, send a starting location, and then grab the first byte. If you grab multiple bytes, however, most chips will auto-increment their target address registers. Thus, there is a fixed overhead for starting each read/write operation. It's the difference between:
and
Just in terms of machine instructions, reading multiple bits/bytes at a time eliminates a lot of overhead. It's even worse when a chip requires you to idle for a few clock cycles after the read/write enable is sent, so that it can get ready for the read or write.
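To put rough numbers on that overhead, here is a minimal sketch that just counts bytes on the bus. The framing (one command byte plus a two-byte start address per transaction) is an assumption for illustration, not the protocol of any particular chip, and the function names are made up:

```python
# Assumed framing for illustration only; real chips differ.
COMMAND_BYTES = 1  # one opcode byte to start a read
ADDRESS_BYTES = 2  # 16-bit starting address


def bus_bytes_single(n):
    """Bytes transferred when every data byte gets its own transaction."""
    return n * (COMMAND_BYTES + ADDRESS_BYTES + 1)


def bus_bytes_sequential(n):
    """Bytes transferred when one transaction streams n bytes,
    relying on the chip auto-incrementing its address register."""
    return COMMAND_BYTES + ADDRESS_BYTES + n


for n in (1, 8, 64):
    print(n, bus_bytes_single(n), bus_bytes_sequential(n))
```

Under these assumptions the per-byte approach moves roughly four times as much traffic for anything but the smallest reads, before counting any idle cycles.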
Reading from a file handle byte by byte will generally be slower than reading in chunks.
In general, every read() call in Python on an unbuffered file corresponds to a C read() call, which is a system call requesting the next bytes. For a file of 2 kB read one byte at a time, this means about 2000 calls to the kernel, each requiring a function call, a request to the kernel, waiting for the response, and passing the result back through the return.
Most notable here is awaiting the response: the system call will block until your request has been picked up and serviced, so you have to wait.
The fewer calls the better, so reading more bytes per call is faster, which is why buffered I/O is in fairly common use.
In Python, buffering can be provided by io.BufferedReader or through the buffering keyword argument on open for files.
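As a rough illustration of why buffering helps, the sketch below counts how often the raw layer is actually asked for data when a 2 kB payload is read one byte at a time. CountingRaw and read_bytewise are hypothetical helpers written for this example; the in-memory stream is only a stand-in for an unbuffered file, with each readinto() playing the role of one system call:

```python
import io


class CountingRaw(io.RawIOBase):
    """In-memory raw stream that counts calls reaching the raw layer.

    Hypothetical stand-in for an unbuffered file: each readinto() here
    plays the role of one system call.
    """

    def __init__(self, data):
        super().__init__()
        self._data = data
        self._pos = 0
        self.calls = 0

    def readable(self):
        return True

    def readinto(self, buffer):
        self.calls += 1
        chunk = self._data[self._pos:self._pos + len(buffer)]
        buffer[:len(chunk)] = chunk
        self._pos += len(chunk)
        return len(chunk)


def read_bytewise(stream):
    """Read the stream one byte at a time until EOF; return the byte count."""
    total = 0
    while stream.read(1):
        total += 1
    return total


payload = bytes(2048)

# Unbuffered: every read(1) goes straight to the raw layer.
unbuffered = CountingRaw(payload)
unbuffered_total = read_bytewise(unbuffered)

# Buffered: BufferedReader fetches a large block and serves read(1) from memory.
raw = CountingRaw(payload)
buffered_total = read_bytewise(io.BufferedReader(raw, buffer_size=8192))

print("unbuffered raw calls:", unbuffered.calls)
print("buffered raw calls:  ", raw.calls)
```

With a real file, the equivalent comparison would be open(path, "rb", buffering=0) against a plain open(path, "rb"); the counts here only show the shape of the effect, not actual timings.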
I was able to reproduce the issue with your code. However, I noticed the following: can you verify that the issue disappears if you replace
file.seek(randint(0, file.raw._blksize), 1)
with
file.seek(randint(0, file.raw._blksize), 0)
in setup? I think you might simply run off the end of the file at some point during the 1-byte reads. The later runs reading 2 bytes, 3 bytes and so on then have no data left to read, which is why they are much faster.
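Here is a small sketch of that suspicion, using an in-memory io.BytesIO in place of your file. The helper name and the sizes are made up for illustration. With whence 1 (io.SEEK_CUR) each seek is relative to the current position, so the offsets add up until the position is past the end and read() returns nothing; with whence 0 (io.SEEK_SET) the position is reset every time:

```python
import io


def count_empty_reads(whence, iterations=50, size=100, step=10):
    """Seek by `step` using `whence`, read one byte, and count empty reads."""
    stream = io.BytesIO(bytes(size))
    empty = 0
    for _ in range(iterations):
        stream.seek(step, whence)
        if stream.read(1) == b"":
            empty += 1  # nothing left: the position is at or past the end
    return empty


relative_empty = count_empty_reads(io.SEEK_CUR)  # same as whence=1
absolute_empty = count_empty_reads(io.SEEK_SET)  # same as whence=0

print("empty reads with relative seeks:", relative_empty)
print("empty reads with absolute seeks:", absolute_empty)
```

If the relative version reports empty reads for your benchmark as well, the fast timings are measuring reads that return no data rather than a real speedup.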