Why is reading one byte 20x slower than reading 2, 3, 4, … bytes from a file?

死守一世寂寞 2021-02-05 01:54

I have been trying to understand the tradeoff between read and seek. For small "jumps", reading unneeded data is faster than skipping it with seek().

3 Answers
  • 2021-02-05 02:22

    I have seen similar situations while dealing with Arduinos interfacing with EEPROMs. Basically, in order to write to or read from a chip or data structure, you have to send a write/read-enable command, send a starting location, and then grab the first byte. If you grab multiple bytes, however, most chips will auto-increment their target address registers. Thus, there is some fixed overhead for starting a read/write operation. It's the difference between:

    • Start communications
    • Send read enable
    • Send read command
    • Send address 1
    • Grab data from target 1
    • End communications
    • Start communications
    • Send read enable
    • Send read command
    • Send address 2
    • Grab data from target 2
    • End communications

    and

    • Start communications
    • Send read enable
    • Send read command
    • Send address 1
    • Grab data from target 1
    • Grab data from target 2
    • End communications

    Just in terms of bus operations, reading multiple bytes at a time clears a lot of overhead. It's even worse when some chips require you to idle for a few clock cycles after the read/write enable is sent, to let an internal process physically prepare the chip for reading or writing.
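    To make the step counts above concrete, here is a rough sketch modeling the two sequences. The step costs are hypothetical, not taken from any real chip's datasheet; the point is only that the fixed per-transaction overhead is paid once per transaction, not once per byte:

```python
# Hypothetical cost model for the two sequences above: a fixed
# per-transaction overhead (start, read-enable, read-command, address,
# end) plus one step per byte grabbed.
OVERHEAD_STEPS = 5   # start, read enable, read command, address, end
BYTE_STEPS = 1       # one step per byte (chip auto-increments address)

def steps_one_byte_per_transaction(n_bytes):
    # Full overhead repeated for every single byte read.
    return n_bytes * (OVERHEAD_STEPS + BYTE_STEPS)

def steps_single_transaction(n_bytes):
    # Overhead paid once; subsequent bytes ride the auto-increment.
    return OVERHEAD_STEPS + n_bytes * BYTE_STEPS

print(steps_one_byte_per_transaction(2))  # 12
print(steps_single_transaction(2))        # 7
```

    The gap grows linearly: for 100 bytes the per-byte approach costs 600 steps versus 105 for a single transaction.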

  • 2021-02-05 02:37

    Reading from a file handle byte-by-byte will generally be slower than reading in chunks.

    In general, every read() on an unbuffered Python file object corresponds to a C read() call, i.e. a system call requesting the next bytes from the kernel. For a 2 KB file read one byte at a time, this means roughly 2000 calls into the kernel, each requiring a function call, a request to the kernel, waiting for the response, and passing the result back through the return.

    Most notable here is the wait: the system call blocks until the kernel services your request, so every call carries the same fixed latency regardless of how few bytes it asks for.

    The fewer calls, the better, so reading more bytes per call is faster; this is why buffered I/O is in such common use.

    In Python, buffering is provided by io.BufferedReader, or through the buffering keyword argument of open() for files.

  • 2021-02-05 02:40

    I was able to reproduce the issue with your code. However, I noticed the following: can you verify that the issue disappears if you replace

    file.seek(randint(0, file.raw._blksize), 1)  # whence=1: relative to current position
    

    with

    file.seek(randint(0, file.raw._blksize), 0)  # whence=0: relative to start of file
    

    in setup? I think you might just run out of data at some point during the 1-byte benchmark: with whence=1 every seek advances from the current position, so the file position keeps growing. The later benchmarks for 2 bytes, 3 bytes and so on then have no data left to read, and reading nothing is of course much faster.
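    A small sketch of the suspected effect (file name and sizes are made up for illustration): with whence=1 (os.SEEK_CUR), repeated relative seeks keep advancing the position, and once it is past the end of the file, read() returns b'' immediately, while whence=0 (os.SEEK_SET) always measures from the start:

```python
import os
import tempfile

# A 100-byte scratch file.
path = os.path.join(tempfile.mkdtemp(), "small.bin")
with open(path, "wb") as f:
    f.write(b"x" * 100)

with open(path, "rb") as f:
    f.seek(60, os.SEEK_CUR)   # relative: position is now 60
    f.seek(60, os.SEEK_CUR)   # relative: position is now 120, past EOF
    print(f.read(2))          # b'' -- nothing left to read

with open(path, "rb") as f:
    f.seek(60, os.SEEK_SET)   # absolute: position is 60
    f.seek(60, os.SEEK_SET)   # absolute: still 60, inside the file
    print(f.read(2))          # b'xx'
```

    So if the benchmark's setup uses relative seeks, the 1-byte case can exhaust the file, and the remaining cases time reads that return nothing.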
