I've written a little benchmark where I compare different string concatenating methods for ZOCache.
So it looks here like tempfile.TemporaryFile is faster than anything.
Your biggest problem: Per tdelaney, you never actually ran the TemporaryFile test; you omitted the parens in the timeit snippet (and only for that test; the others actually ran). So you were timing how long it takes to look up the name bench_temporaryfile, not how long it takes to call it. Change:
print(str(timeit.timeit('bench_temporaryfile', setup="from __main__ import bench_temporaryfile", number=1000)) + " TemporaryFile")
to:
print(str(timeit.timeit('bench_temporaryfile()', setup="from __main__ import bench_temporaryfile", number=1000)) + " TemporaryFile")
(adding parens to make it a call) to fix.
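You can reproduce the effect in isolation. The sketch below is written for Python 3 and uses a hypothetical stand-in function, bench_noop (not from your benchmark), that counts its own calls. It also passes the function through timeit's globals argument rather than a setup import, purely to keep the example self-contained:

```python
import timeit

# Mutable counter so the stand-in function can record how often it ran.
counter = {"calls": 0}

def bench_noop():
    """Hypothetical stand-in for a benchmark function; counts its calls."""
    counter["calls"] += 1

namespace = {"bench_noop": bench_noop}

# Without parens the timed statement is only a name lookup: nothing is called.
timeit.timeit("bench_noop", globals=namespace, number=1000)
calls_without_parens = counter["calls"]

# With parens the function really runs once per iteration.
timeit.timeit("bench_noop()", globals=namespace, number=1000)
calls_with_parens = counter["calls"] - calls_without_parens

print(calls_without_parens, calls_with_parens)
```

A timing that looks implausibly fast is usually worth a sanity check like this before drawing conclusions from it.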
Some other issues:
io.StringIO is fundamentally different from your other test cases. Specifically, all the other types you're testing operate in binary mode, reading and writing str, and avoiding line ending conversions. io.StringIO uses Python 3 style strings (unicode in Python 2), which your tests acknowledge by using different literals and converting to unicode instead of bytes. This adds a lot of encoding and decoding overhead, as well as using a lot more memory (unicode uses 2-4x the memory of str for the same data, which means more allocator overhead, more copy overhead, etc.).
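To make the text/bytes split concrete, here is a small sketch written for Python 3 (an assumption on my part; your code targets Python 2). It shows that the two io classes refuse each other's data type, so getting bytes out of an io.StringIO always costs an extra encode step. The helper rejects is a name I made up for the illustration:

```python
import io

text_buf = io.StringIO()
byte_buf = io.BytesIO()

text_buf.write(u"abc")   # io.StringIO accepts text only
byte_buf.write(b"abc")   # io.BytesIO accepts bytes only

def rejects(buf, value):
    """Hypothetical helper: True if buf.write(value) raises TypeError."""
    try:
        buf.write(value)
    except TypeError:
        return True
    return False

stringio_rejects_bytes = rejects(text_buf, b"abc")
bytesio_rejects_text = rejects(byte_buf, u"abc")

# Turning the text buffer's contents into bytes needs an explicit encode.
as_bytes = text_buf.getvalue().encode("utf-8")
```

That encode (and the matching decode on the way in) is work the binary-mode candidates never have to do, which is why mixing the two kinds of buffer in one benchmark skews the comparison.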
The other major difference is that you're setting a truly huge bufsize for TemporaryFile; few system calls would need to occur, and most writes are just appending to contiguous memory in the buffer. By contrast, io.StringIO is storing the individual values written, and only joining them together when you ask for them with getvalue().
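If you want to see the buffer absorbing writes, the following sketch (Python 3, where the argument is spelled buffering rather than bufsize) checks the on-disk size of the temporary file before and after a flush. The 1 MiB buffer and 100-byte payload are arbitrary values chosen for illustration, not the ones from your benchmark:

```python
import os
import tempfile

payload = b"x" * 100  # assumed payload, much smaller than the buffer

# 'buffering' is the Python 3 name for the buffer-size argument;
# 1 MiB is an arbitrary illustrative value.
with tempfile.TemporaryFile(buffering=1024 * 1024) as f:
    f.write(payload)
    # The write fits in the buffer, so nothing has reached the file yet.
    size_before_flush = os.fstat(f.fileno()).st_size
    f.flush()
    # After the flush the data has been handed to the OS.
    size_after_flush = os.fstat(f.fileno()).st_size

print(size_before_flush, size_after_flush)
```

As long as everything you write fits in the buffer, the benchmark is mostly measuring in-memory appends rather than file I/O.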
Also, lastly, you think you're being forward compatible by using the bytes constructor, but you're not; in Python 2 bytes is an alias for str, so bytes(10) returns '10', but in Python 3, bytes is a totally different thing, and passing an integer to it returns a zero-initialized bytes object of that size, so bytes(10) returns b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'.
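A quick way to check this on Python 3, along with two spellings that do produce the decimal digits as bytes. These are only suggestions; which one suits your code is for you to judge. The %-formatting form needs Python 3.5 or later (it also works on Python 2):

```python
n = 10

# Python 3: an integer argument means "this many zero bytes".
zero_filled = bytes(n)

# The decimal digits as bytes; this spelling works on Python 2 and 3.
digits = str(n).encode("ascii")

# bytes %-formatting, available on Python 2 and on Python 3.5+.
formatted = b"%d" % n

print(repr(zero_filled), repr(digits), repr(formatted))
```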
If you want a fair test case, at the very least switch to cStringIO.StringIO or io.BytesIO instead of io.StringIO and write bytes uniformly. Typically, you wouldn't explicitly set the buffer size for TemporaryFile and the like yourself, so you might consider dropping that.
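As a rough sketch of what a more even-handed comparison could look like (Python 3 here, so cStringIO is left out), both candidates below write the same bytes payload and TemporaryFile keeps its default buffering. CHUNK, WRITES and the function bodies are my own placeholders, not your original benchmark code:

```python
import io
import tempfile
import timeit

CHUNK = b"0123456789"  # assumed payload; substitute your real data
WRITES = 1000          # assumed number of writes per run

def bench_bytesio():
    buf = io.BytesIO()
    for _ in range(WRITES):
        buf.write(CHUNK)
    return buf.getvalue()

def bench_temporaryfile():
    # Default buffering: no explicit buffer size is passed.
    with tempfile.TemporaryFile() as f:
        for _ in range(WRITES):
            f.write(CHUNK)
        f.seek(0)
        return f.read()

# Passing the callables directly avoids the missing-parens trap entirely.
for name, func in [("io.BytesIO", bench_bytesio),
                   ("TemporaryFile", bench_temporaryfile)]:
    seconds = timeit.timeit(func, number=100)
    print("%-14s %.6f s" % (name, seconds))
```

Both functions return the accumulated data, so you can also assert that they produce identical results before trusting the timings.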
In my own tests on Linux x64 with Python 2.7.10, using ipython's %timeit magic, the ranking is:
io.BytesIO: ~48 μs per loop
io.StringIO: ~54 μs per loop (so unicode overhead didn't add much)
cStringIO.StringIO: ~83 μs per loop
TemporaryFile: ~2.8 ms per loop (note units; ms is 1000x longer than μs)
And that's without going back to default buffer sizes (I kept the explicit bufsize from your tests). I suspect the behavior of TemporaryFile will vary a lot more (depending on the OS and how temporary files are handled; some systems might just store in memory, others might store in /tmp, but of course, /tmp might just be a RAMdisk anyway).
Something tells me you may have a setup where the TemporaryFile is basically a plain memory buffer that never goes to the file system, where mine may be ultimately ending up on persistent storage (if only for short periods); stuff happening in memory is predictable, but when you involve the file system (which TemporaryFile can, depending on OS, kernel settings, etc.), the behavior will differ a great deal between systems.