I have been trying to understand the tradeoff between `read` and `seek`. For small "jumps", reading unneeded data is faster than skipping it with `seek`.
Reading from a file handle byte-for-byte will generally be slower than reading in chunks.
In general, every `read()` call on an unbuffered file handle corresponds to a C `read()` call, i.e. a system call requesting the next bytes. For a 2 kB file read one byte at a time, this means roughly 2000 calls into the kernel, each requiring a function call, a request to the kernel, waiting for the response, and passing the result back through the return.

The most notable cost here is *awaiting the response*: the system call blocks until your request is acknowledged in the kernel's queue and serviced, so you have to wait.
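To make that concrete, here is a minimal sketch of the worst case. The file name `example.bin` is hypothetical (the data is generated on the spot); with `buffering=0`, every `read(1)` maps to one system call:

```python
import os

# Create ~2 kB of throwaway test data ("example.bin" is a hypothetical name).
with open("example.bin", "wb") as f:
    f.write(os.urandom(2000))

# buffering=0 returns a raw, unbuffered handle (io.FileIO), so every
# read(1) below is a separate kernel round trip: ~2000 for this file.
with open("example.bin", "rb", buffering=0) as f:
    while True:
        b = f.read(1)   # one system call per byte
        if not b:       # b'' signals end of file
            break
```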
The fewer calls the better, so reading more bytes per call is faster; this is why buffered I/O is in fairly common use.
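As a rough illustration (a quick sketch, not a rigorous benchmark), timing both strategies on the same hypothetical file shows the gap:

```python
import os
import time

with open("example.bin", "wb") as f:
    f.write(os.urandom(2000))   # ~2 kB of test data

# Unbuffered, one byte per call: ~2000 system calls.
start = time.perf_counter()
with open("example.bin", "rb", buffering=0) as f:
    while f.read(1):
        pass
print("byte-for-byte:", time.perf_counter() - start)

# Unbuffered, one big call: the kernel is only asked a few times.
start = time.perf_counter()
with open("example.bin", "rb", buffering=0) as f:
    f.read()
print("single chunk: ", time.perf_counter() - start)
```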
In Python, buffering can be provided by `io.BufferedReader`, or through the `buffering` keyword argument on `open()` for files.
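A short sketch of both routes, again with a hypothetical file name and an arbitrary 1 kB buffer size:

```python
import io
import os

with open("example.bin", "wb") as f:        # hypothetical test file
    f.write(os.urandom(2000))

# Route 1: wrap an unbuffered raw handle in io.BufferedReader yourself.
raw = open("example.bin", "rb", buffering=0)         # raw io.FileIO
with io.BufferedReader(raw, buffer_size=1024) as f:  # closing f closes raw too
    f.read(1)   # one system call fills the 1 kB buffer...
    f.read(1)   # ...so this byte is served from memory, not the kernel

# Route 2: let open() build the BufferedReader via the buffering argument.
with open("example.bin", "rb", buffering=1024) as f:
    f.read(1)   # same effect: chunked reads under the hood
```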