Question
I'm looking through the source of StringIO, where the notes say:
- Using a real file is often faster (but less convenient).
- There's also a much faster implementation in C, called cStringIO, but it's not subclassable.
StringIO is just like an in-memory file object, so why is it slower than a real file object?
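To make "just like an in-memory file object" concrete, here is a minimal sketch in which the same function writes to an in-memory buffer and to a real file. It assumes Python 3, where the in-memory text buffer is io.StringIO rather than the Python 2 StringIO module discussed in this thread; write_lines is a hypothetical helper, not part of any library.

```python
import io
import os
import tempfile

def write_lines(fp, lines):
    """Write each line to any file-like object that has a write() method."""
    for line in lines:
        fp.write(line + "\n")

lines = ["alpha", "beta", "gamma"]

# In-memory buffer: no file system involved.
buf = io.StringIO()
write_lines(buf, lines)
memory_text = buf.getvalue()

# Real file: the same calls, but the data goes through the OS.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
try:
    # newline="" disables newline translation so both results are comparable.
    with open(path, "w", newline="") as fp:
        write_lines(fp, lines)
    with open(path, "r", newline="") as fp:
        file_text = fp.read()
finally:
    os.remove(path)
```

Both memory_text and file_text end up holding the same three lines; only the destination differs.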
Answer 1:
Python's file handling is implemented entirely in C. This means that it's quite fast (at least in the same order of magnitude as native C code).
The StringIO library, however, is written in Python. The module itself is thus interpreted, with the associated performance penalties.
As you know, there is another module, cStringIO, with a similar interface, which you can use in performance-sensitive code. It isn't subclassable because it's written in C.
Answer 2:
This is not actually about Python's interpreted nature: BytesIO is implemented in Python*, same as StringIO, but still beats file I/O.
In fact, StringIO is faster than file I/O under StringIO's ideal use case (a single write to the beginning of an empty buffer). Actually, if the write is big enough it'll even beat cStringIO. See my question here.
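Claims like this are easy to check on your own machine. The following is a rough timing harness, assuming Python 3 and therefore io.BytesIO instead of the Python 2 StringIO and cStringIO modules, so its numbers say nothing directly about those modules. The function names, payload size, and repeat count are arbitrary choices for illustration, and results will vary with the OS, disk, and file system cache.

```python
import io
import os
import tempfile
import timeit

def write_to_memory(payload):
    """One write to the start of a fresh, empty in-memory buffer."""
    buf = io.BytesIO()
    buf.write(payload)
    return buf.getvalue()

def write_to_file(payload, path):
    """One write to the start of a freshly truncated file on disk."""
    with open(path, "wb") as fp:
        fp.write(payload)

def compare(size=1_000_000, repeats=20):
    """Return (memory_seconds, file_seconds) for `repeats` single writes."""
    payload = b"x" * size
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        mem = timeit.timeit(lambda: write_to_memory(payload), number=repeats)
        disk = timeit.timeit(lambda: write_to_file(payload, path), number=repeats)
    finally:
        os.remove(path)
    return mem, disk

if __name__ == "__main__":
    mem, disk = compare()
    print("in-memory: %.4fs, file: %.4fs" % (mem, disk))
```

To test a different buffer class, swap it in for io.BytesIO inside write_to_memory.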
So why is StringIO considered "slow"? StringIO's real problem is being backed by immutable sequences, whether str or unicode. This is fine if you only write once, obviously. But, as pointed out by tdelaney's answer to my question, it slows down a ton (like, 10-100x) when writing to random locations, since every time it gets a write in the middle it has to copy the entire backing sequence.
BytesIO doesn't have this problem since it's backed by a (mutable) bytearray instead. Likewise, whatever cStringIO does, it seems to handle random writes much more easily. I'd guess that it breaks the immutability rule internally, since C strings are mutable.
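The difference between the two kinds of backing store can be sketched as follows. This only illustrates the cost model described above, not the actual code in StringIO or BytesIO; overwrite_immutable and overwrite_mutable are hypothetical helpers written for this example.

```python
def overwrite_immutable(buf, pos, data):
    """Overwrite part of an immutable str.

    The original cannot be changed, so a brand-new string is built,
    which copies every character of buf that is kept.
    """
    return buf[:pos] + data + buf[pos + len(data):]

def overwrite_mutable(buf, pos, data):
    """Overwrite part of a bytearray in place; only the slice is touched."""
    buf[pos:pos + len(data)] = data

text = "hello world"
new_text = overwrite_immutable(text, 6, "there")  # text itself is unchanged

raw = bytearray(b"hello world")
same_object = raw
overwrite_mutable(raw, 6, b"there")  # raw is modified, no new buffer created
```

With a large buffer and many small writes in the middle, the first approach repeats a full copy on every write, while the second only touches the bytes being replaced.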
* Well, the version in _pyio is, anyway. The standard library version in io is written in C.
Answer 3:
It is not necessarily obvious from the source, but Python file objects are built directly on the C library functions, with probably a thin layer of Python, or even a C wrapper, to present a Python class. The native C library is highly optimised for reading bytes and blocks from disk. The Python StringIO library is pure Python code, which is slower than native C code.
Source: https://stackoverflow.com/questions/25580925/why-is-stringio-object-slower-than-real-file-object