fast way to read from StringIO until some byte is encountered

Submitted by 允我心安 on 2019-12-23 08:54:10

Question


Suppose I have a StringIO (from cStringIO). I want to read from it until some character/byte is encountered, say 'Z', so:

stringio = StringIO('ABCZ123')
buf = read_until(stringio, 'Z')  # buf is now 'ABCZ'
# stringio.tell() is now 4, pointing just after 'Z'

What is the fastest way to do this in Python? Thank you.


Answer 1:


I'm very disappointed that this question got only one answer on Stack Overflow, because it is an interesting and relevant question. Anyway, since only ovgolovin gave a solution and I thought it might be slow, I came up with a faster one:

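def foo(stringio):
    datalist = []
    while True:
        chunk = stringio.read(256)      # read a block at a time, not byte by byte
        i = chunk.find('Z')
        if i == -1:
            datalist.append(chunk)
        else:
            datalist.append(chunk[:i+1])
            # rewind so the stream position ends up just after 'Z'
            stringio.seek(stringio.tell() - (len(chunk) - (i + 1)))
            break
        if len(chunk) < 256:            # short read: end of stream, no 'Z' found
            break
    return ''.join(datalist)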

This reads the stream in chunks (the end character may not be in the first chunk). It is very fast because no Python function is called for each character; instead it makes maximal use of Python functions implemented in C.

This runs about 60x faster than ovgolovin's solution. I ran timeit to check it.
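As a rough sketch only, the same chunked approach can be written as a reusable function for Python 3's io module. The name read_until comes from the question; the delimiter and chunk_size parameters are assumptions added here. It assumes a single-character (or single-byte) delimiter and an in-memory io.StringIO or io.BytesIO, where tell() and seek() work with plain offsets; other text streams may not allow this kind of repositioning.

```python
import io


def read_until(stream, delimiter, chunk_size=256):
    """Read from `stream` up to and including the first `delimiter`.

    Leaves the stream positioned just after the delimiter, or at the end
    of the stream if the delimiter never occurs.
    """
    empty = delimiter[:0]          # '' for str, b'' for bytes
    start = stream.tell()
    parts = []
    consumed = 0                   # number of characters/bytes we keep
    while True:
        chunk = stream.read(chunk_size)
        i = chunk.find(delimiter)
        if i != -1:
            parts.append(chunk[:i + 1])
            consumed += i + 1
            break
        parts.append(chunk)
        consumed += len(chunk)
        if len(chunk) < chunk_size:    # short read: end of stream
            break
    # The last read may have gone past the delimiter, so reposition.
    stream.seek(start + consumed)
    return empty.join(parts)


if __name__ == '__main__':
    s = io.StringIO('ABCZ123')
    print(read_until(s, 'Z'), s.tell())
```

The seek at the end matters because a block read almost always overshoots the delimiter; without it, the rest of the block would be lost to later reads.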




Answer 2:


i = iter(lambda: stringio.read(1),'Z')
buf = ''.join(i) + 'Z'

Here iter is used in this mode: iter(callable, sentinel) -> iterator.

''.join(...) is quite efficient. The final concatenation, ''.join(i) + 'Z', is less so, because it builds one more copy of the string. This can be avoided by appending 'Z' to the iterator itself:

import StringIO
from itertools import chain, repeat

stringio = StringIO.StringIO('ABCZ123')
i = iter(lambda: stringio.read(1),'Z')
i = chain(i,repeat('Z',1))
buf = ''.join(i)
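To make the behaviour of the two-argument form of iter concrete, here is a small sketch written for Python 3 with io.StringIO (an assumption; the snippets above target Python 2). It also illustrates one caveat: if the sentinel never occurs, read(1) keeps returning an empty string at the end of the stream, which never equals 'Z', so the plain iter(callable, sentinel) pattern would not terminate. The helper chars_until is a hypothetical name introduced here to show one way to guard against that.

```python
import io

stream = io.StringIO('ABCZ123')

# iter(callable, sentinel) calls `callable` repeatedly and stops, without
# yielding, as soon as a call returns a value equal to `sentinel`.
chars = iter(lambda: stream.read(1), 'Z')
buf = ''.join(chars) + 'Z'   # the sentinel is consumed from the stream but not yielded


def chars_until(stream, sentinel):
    """Hypothetical helper: yield characters up to (not including) `sentinel`,
    and also stop when the stream is exhausted."""
    # Here '' is the sentinel, so iteration ends at end of stream.
    for ch in iter(lambda: stream.read(1), ''):
        if ch == sentinel:
            return
        yield ch
```

With chars_until, a stream that contains no 'Z' simply yields everything it has and then stops.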

Another way is to use a generator:

def take_until_included(stringio):
    while True:
        s = stringio.read(1)
        yield s
        if s == 'Z' or not s:   # stop after 'Z', or at end of stream
            return

i = take_until_included(stringio)
buf = ''.join(i)

I ran some efficiency tests. The described techniques perform about the same:

http://ideone.com/dQGe5
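In case the link is unavailable, a comparison along these lines can be reproduced with timeit. This is only a sketch for Python 3: the wrapper names join_plus, join_chain and join_generator, the test data and the run count are all assumptions made here, not the contents of the linked benchmark, and the timings will vary by machine.

```python
import io
import timeit
from itertools import chain, repeat

# Assumed test input: a long run of characters, then the delimiter.
DATA = 'A' * 10000 + 'Z' + 'tail'


def join_plus(stream):
    # iter(callable, sentinel), then add the delimiter by concatenation
    return ''.join(iter(lambda: stream.read(1), 'Z')) + 'Z'


def join_chain(stream):
    # same iterator, with the delimiter appended via itertools.chain
    chars = iter(lambda: stream.read(1), 'Z')
    return ''.join(chain(chars, repeat('Z', 1)))


def take_until_included(stream):
    while True:
        s = stream.read(1)
        yield s
        if s == 'Z' or not s:   # stop after 'Z', or at end of stream
            return


def join_generator(stream):
    return ''.join(take_until_included(stream))


def compare(number=20):
    """Print the wall-clock time of each variant over `number` runs."""
    for fn in (join_plus, join_chain, join_generator):
        seconds = timeit.timeit(lambda: fn(io.StringIO(DATA)), number=number)
        print('%-15s %.4f s for %d runs' % (fn.__name__, seconds, number))


if __name__ == '__main__':
    compare()
```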




Answer 3:


#!/usr/bin/env python3
import io


def iterate_stream(stream, delimiter, max_read_size=1024):
    """ Yields `delimiter`-terminated strings or bytes read from `stream` (single-character delimiter). """
    empty = '' if isinstance(delimiter, str) else b''
    chunks = []
    while 1:
        d = stream.read(max_read_size)
        if not d:
            break
        while d:
            i = d.find(delimiter)
            if i < 0:
                chunks.append(d)
                break
            chunks.append(d[:i+1])
            d = d[i+1:]
            yield empty.join(chunks)
            chunks = []
    s = empty.join(chunks)
    if s:
        yield s


if __name__ == '__main__':
    print(next(iterate_stream(io.StringIO('ABCZ123'), 'Z')))
    print(next(iterate_stream(io.BytesIO(b'ABCZ123'), b'Z')))


Source: https://stackoverflow.com/questions/8279817/fast-way-to-read-from-stringio-until-some-byte-is-encountered
