Confused about StringIO, cStringIO and BytesIO

-上瘾入骨i 2021-01-01 10:55

I have googled and also searched on SO for the differences between these buffer modules. However, I still don't understand them very well, and I think some of the posts I read are o…

1 answer (2021-01-01 11:18)

    In both Python 2 and 3, you should use io.StringIO for unicode (text) objects and io.BytesIO for bytes objects, for forward compatibility: the StringIO and cStringIO modules no longer exist in Python 3, so io.StringIO and io.BytesIO are all it offers.
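A minimal Python 3 sketch of that split: io.StringIO holds text (str), io.BytesIO holds raw bytes, and you move between the two by encoding or decoding explicitly. UTF-8 is an assumption here; use whatever encoding your data actually has.

```python
import io

# io.StringIO: an in-memory text stream; it accepts and returns str.
text_buf = io.StringIO()
text_buf.write(u'snowman: \u2603\n')
text_value = text_buf.getvalue()

# io.BytesIO: an in-memory binary stream; it accepts and returns bytes.
byte_buf = io.BytesIO()
byte_buf.write(text_value.encode('utf-8'))  # encoding (UTF-8 assumed) turns str into bytes
byte_value = byte_buf.getvalue()

# Reading back: decode the bytes to recover the original text.
round_trip = io.BytesIO(byte_value).read().decode('utf-8')
print(type(text_value).__name__, type(byte_value).__name__, round_trip == text_value)
```

The text never goes into BytesIO directly; the explicit encode/decode step is where the choice of encoding lives.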


    Here's a better test (for Python 2 and 3) that doesn't include the cost of converting from numpy to str/bytes:

    import numpy as np
    import string
    b_data = np.random.choice(list(string.printable), size=1000000).tobytes()
    u_data = b_data.decode('ascii')
    u_data = u'\u2603' + u_data[1:]  # add a non-ascii character
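If numpy isn't available, similar test data can be built with the standard library alone. This is a sketch assuming Python 3.6+ (for Random.choices); the names b_data and u_data mirror the snippet above, while N and the seed value are illustrative choices. It also guarantees exactly one byte per character: under Python 3, np.random.choice over a list of str produces a '<U1' array, whose tobytes() output uses four bytes per character.

```python
import random
import string

N = 1000000                # same nominal size as the numpy version
rng = random.Random(0)     # fixed, arbitrary seed so runs are repeatable

# One random printable ASCII character per position, joined into a str.
ascii_text = ''.join(rng.choices(string.printable, k=N))

b_data = ascii_text.encode('ascii')   # bytes: exactly one byte per character
u_data = u'\u2603' + ascii_text[1:]   # str with one non-ASCII character up front
```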
    

    And then:

    import io
    %timeit io.StringIO(u_data)
    %timeit io.StringIO(b_data)
    %timeit io.BytesIO(u_data)
    %timeit io.BytesIO(b_data)
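%timeit is an IPython magic, so the lines above only work inside IPython or Jupyter. In a plain Python 3 script, a rough equivalent uses the standard timeit module. This sketch uses short stand-in data so it runs on its own (swap in the real b_data and u_data), times only the two type-matched combinations, and reports a best-of-repeats time per call similar to the "best of 3" figures below; best_per_call is a hypothetical helper name.

```python
import io
import timeit

# Stand-in data so the sketch runs on its own; replace with the real b_data/u_data.
b_data = b'hello world ' * 1000
u_data = u'\u2603' + b_data.decode('ascii')[1:]

def best_per_call(func, number=1000, repeat=5):
    """Fastest of `repeat` runs, divided by `number` calls: seconds per call."""
    timer = timeit.Timer(func)  # timeit.Timer also accepts a zero-argument callable
    return min(timer.repeat(repeat=repeat, number=number)) / number

# Only the matched combinations; the mismatched ones raise TypeError in Python 3.
t_text = best_per_call(lambda: io.StringIO(u_data))
t_bytes = best_per_call(lambda: io.BytesIO(b_data))
print('io.StringIO(u_data): %.2f us per call' % (t_text * 1e6))
print('io.BytesIO(b_data):  %.2f us per call' % (t_bytes * 1e6))
```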
    

    In Python 2, you can also test:

    import StringIO, cStringIO
    %timeit cStringIO.StringIO(u_data)
    %timeit cStringIO.StringIO(b_data)
    %timeit StringIO.StringIO(u_data)
    %timeit StringIO.StringIO(b_data)
    

    Some of these will raise exceptions: a TypeError when the input type doesn't match the class, or a UnicodeEncodeError when non-ASCII characters can't be encoded.
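To see which combinations fail without timing anything, a Python 3 sketch can try each constructor and record the exception type. The last check roughly reproduces the Python 2 cStringIO failure: cStringIO can only accept unicode that encodes as plain ASCII, which is why its error comes from the 'ascii' codec. The sample strings and the outcome helper are illustrative stand-ins.

```python
import io

u_sample = u'\u2603 snowman'   # str containing a non-ASCII character
b_sample = b'plain ascii'      # bytes

def outcome(func, arg):
    """Return 'ok' if func(arg) succeeds, otherwise the exception's class name."""
    try:
        func(arg)
    except Exception as exc:
        return type(exc).__name__
    return 'ok'

checks = {
    'io.StringIO(str)': outcome(io.StringIO, u_sample),
    'io.StringIO(bytes)': outcome(io.StringIO, b_sample),
    'io.BytesIO(str)': outcome(io.BytesIO, u_sample),
    'io.BytesIO(bytes)': outcome(io.BytesIO, b_sample),
    # Roughly what cStringIO.StringIO(u_data) had to do in Python 2:
    'ascii encode of str': outcome(lambda s: s.encode('ascii'), u_sample),
}
for label, result in checks.items():
    print(label, '->', result)
```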


    Python 3.5 results:

    >>> %timeit io.StringIO(u_data)
    100 loops, best of 3: 8.61 ms per loop
    >>> %timeit io.StringIO(b_data)
    TypeError: initial_value must be str or None, not bytes
    >>> %timeit io.BytesIO(u_data)
    TypeError: a bytes-like object is required, not 'str'
    >>> %timeit io.BytesIO(b_data)
    The slowest run took 6.79 times longer than the fastest. This could mean that an intermediate result is being cached
    1000000 loops, best of 3: 344 ns per loop
    

    Python 2.7 results (run on a different machine):

    >>> %timeit io.StringIO(u_data)
    1000 loops, best of 3: 304 µs per loop
    >>> %timeit io.StringIO(b_data)
    TypeError: initial_value must be unicode or None, not str
    >>> %timeit io.BytesIO(u_data)
    TypeError: 'unicode' does not have the buffer interface
    >>> %timeit io.BytesIO(b_data)
    10000 loops, best of 3: 77.5 µs per loop
    
    >>> %timeit cStringIO.StringIO(u_data)
    UnicodeEncodeError: 'ascii' codec can't encode character u'\u2603' in position 0: ordinal not in range(128)
    >>> %timeit cStringIO.StringIO(b_data)
    1000000 loops, best of 3: 448 ns per loop
    >>> %timeit StringIO.StringIO(u_data)
    1000000 loops, best of 3: 1.15 µs per loop
    >>> %timeit StringIO.StringIO(b_data)
    1000000 loops, best of 3: 1.19 µs per loop
    