I'm looking through the source of StringIO, which includes some notes:
Python's file handling is implemented entirely in C. This means that it's quite fast (at least in the same order of magnitude as native C code).
The StringIO library, however, is written in Python. The module itself is thus interpreted, with the associated performance penalties.
As you know, there is another module, cStringIO, with a similar interface, which you can use in performance-sensitive code. The reason cStringIO isn't subclassable is that it's written in C.
It is not necessarily obvious from the source, but Python file objects are built straight on the C library functions, with what is likely a thin layer of Python, or even a C wrapper, to present a Python class. The native C library is going to be highly optimised to read bytes and blocks from disk. The Python StringIO library is all native Python code, which is slower than native C code.
This is not actually about Python's interpreted nature: BytesIO is implemented in Python*, same as StringIO, but still beats file I/O.
In fact, StringIO is faster than file I/O under StringIO's ideal use case (a single write to the beginning of an empty buffer). Actually, if the write is big enough, it'll even beat cStringIO. See my question here.
So why is StringIO considered "slow"? StringIO's real problem is being backed by immutable sequences, whether str or unicode. This is fine if you only write once, obviously. But, as pointed out by tdelaney's answer to my question, it slows down dramatically (on the order of 10-100x) when writing to random locations, since every write in the middle forces it to copy the entire backing sequence.
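To make the copying cost concrete, here is a minimal sketch of the two strategies. This is not StringIO's actual code; the helper names overwrite_immutable and overwrite_mutable are made up for illustration. It only shows why a write into the middle of an immutable sequence has to build a whole new object, while a mutable buffer can be patched in place.

```python
def overwrite_immutable(buf, pos, data):
    """Return a NEW bytes object with data written at pos.

    Because bytes is immutable, every byte of buf outside the written
    region is copied into the result, so the cost grows with len(buf).
    """
    return buf[:pos] + data + buf[pos + len(data):]


def overwrite_mutable(buf, pos, data):
    """Write data at pos inside a bytearray, modifying it in place.

    Only the bytes in the written region are touched.
    """
    buf[pos:pos + len(data)] = data


immutable = b"a" * 16
mutable = bytearray(immutable)

updated = overwrite_immutable(immutable, 4, b"XYZ")  # builds a second 16-byte object
overwrite_mutable(mutable, 4, b"XYZ")                # patches the same object

print(updated)         # the new copy
print(bytes(mutable))  # same contents, no full copy needed
print(immutable)       # the original is untouched
```

With a large buffer and many scattered writes, the first approach repeats that full copy on every write, which is the kind of slowdown described above.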
BytesIO doesn't have this problem since it's backed by a (mutable) bytearray instead. Likewise, whatever cStringIO does, it seems to handle random writes much more easily. I'd guess that it breaks the immutability rule internally, since C strings are mutable.
* Well, the version in _pyio is, anyway. The standard library version in io is written in C.