Why is StringIO object slower than real file object?

无人共我 2021-01-12 11:36

I'm looking through the source of StringIO, where the notes say:

  1. Using a real file is often faster (but less convenient)
3 answers
  •  花落未央
    2021-01-12 12:08

    This is not actually about Python's interpreted nature: BytesIO is implemented in Python*, same as StringIO, but still beats file I/O.

    In fact, StringIO is faster than file I/O under StringIO's ideal use case (a single write to the beginning of an empty buffer). If the write is big enough, it'll even beat cStringIO. See my question here.

    So why is StringIO considered "slow"? StringIO's real problem is being backed by immutable sequences, whether str or unicode. This is fine if you only write once, obviously. But, as pointed out by tdelaney's answer to my question, it slows down a ton (like, 10-100x) when writing to random locations, since every time it gets a write in the middle it has to copy the entire backing sequence.

    BytesIO doesn't have this problem since it's backed by a (mutable) bytearray instead. Likewise, whatever cStringIO does, it seems to handle random writes much more easily. I'd guess that it breaks the immutability rule internally, since C strings are mutable.
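
To make the difference concrete, here is a minimal sketch of the two backing strategies. The class names `ImmutableBackedBuffer` and `MutableBackedBuffer` and the `write_at` method are made up for illustration; this is not the actual StringIO or BytesIO source, just a toy model of "rebuild the whole sequence on every write" versus "overwrite in place".

```python
class ImmutableBackedBuffer:
    """Toy buffer backed by an immutable bytes object (hypothetical)."""

    def __init__(self, initial=b""):
        self.buf = initial

    def write_at(self, pos, data):
        # Pad with zero bytes if the write starts past the current end.
        if pos > len(self.buf):
            self.buf += b"\0" * (pos - len(self.buf))
        # bytes cannot be modified, so a new object is built from three
        # pieces. Each write copies roughly the whole buffer.
        self.buf = self.buf[:pos] + data + self.buf[pos + len(data):]


class MutableBackedBuffer:
    """Toy buffer backed by a mutable bytearray (hypothetical)."""

    def __init__(self, initial=b""):
        self.buf = bytearray(initial)

    def write_at(self, pos, data):
        # Same zero padding rule as above.
        if pos > len(self.buf):
            self.buf.extend(b"\0" * (pos - len(self.buf)))
        # Slice assignment overwrites the existing bytes in place; only a
        # write that runs past the end has to grow the buffer.
        self.buf[pos:pos + len(data)] = data


slow = ImmutableBackedBuffer(b"hello world")
fast = MutableBackedBuffer(b"hello world")
for buf in (slow, fast):
    buf.write_at(6, b"WORLD")  # overwrite in the middle
    buf.write_at(13, b"!")     # write past the end, zero-padded
```

Both buffers end up with the same contents. The difference is cost: a middle write on the immutable-backed version is proportional to the size of the whole buffer, while on the bytearray-backed version it is proportional to the size of the data written. That is consistent with the slowdown on random writes described above, though the real modules may differ in detail.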

    * Well, the version in _pyio is, anyway. The standard library version in io is written in C.
