Efficient FIFO queue for arbitrarily sized chunks of bytes in Python

既然无缘 2021-02-05 09:19

How do I implement a FIFO buffer to which I can efficiently add arbitrarily sized chunks of bytes to the head and from which I can efficiently pop arbitrarily sized chunks of by

4 answers
  •  孤街浪徒
    2021-02-05 09:33

    ... but removing bytes from the beginning is very slow, because a new StringIO object, that holds a copy of the entire previous buffer minus the first chunk of bytes, must be created.

    This kind of slowness can be avoided by using bytearray in Python >= 3.4. See the discussion in this issue; the patch is here.

    The key point: removing the leading byte(s) from a bytearray with

    a[:1] = b''   # O(1) (amortized)
    

    is much faster than

    a = a[1:]     # O(len(a))
    

    when len(a) is huge (say 10**6).
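
    The difference comes from where the work happens: slice assignment shrinks the existing object, while `a = a[1:]` builds a new bytearray and rebinds the name. A minimal sketch of that behaviour (the names `alias` and `alias_b` are only for illustration; it shows object identity, not timing):

```python
# In-place head deletion: the same bytearray object shrinks.
a = bytearray(b"abcdef")
alias = a                      # second reference to the same object
a[:2] = b""                    # delete the first two bytes in place
in_place_same_object = a is alias
after_in_place = bytes(a)

# Rebinding to a slice: a new bytearray is created and the tail is copied.
b = bytearray(b"abcdef")
alias_b = b
b = b[2:]                      # copies the remaining bytes into a new object
copy_is_new_object = b is not alias_b
```

    After the second form, `alias_b` still holds all six original bytes, so the copy cost grows with the buffer length.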

    The bytearray also gives you a convenient way to view the whole buffer as a single array (the bytearray itself), in contrast to a deque, which has to join its stored chunks into one object first.
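
    To make the contrast concrete, here is a rough sketch of a chunk-based queue next to a plain bytearray. `DequeFIFO` is a hypothetical class written only for this comparison, not something from the question or the other answers:

```python
from collections import deque


class DequeFIFO:
    """Hypothetical chunk-based FIFO, shown only for comparison."""

    def __init__(self):
        self._chunks = deque()

    def put(self, data):
        self._chunks.append(bytes(data))

    def getvalue(self):
        # The chunks are separate objects, so a contiguous view
        # requires joining (copying) all of them.
        return b"".join(self._chunks)


q = DequeFIFO()
q.put(b"abc")
q.put(b"de")
whole = q.getvalue()           # built by copying every chunk

buf = bytearray()
buf.extend(b"abc")
buf.extend(b"de")
view = buf                     # already contiguous, no join needed
```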

    Now an efficient FIFO can be implemented as follows:

    class byteFIFO:
        """ byte FIFO buffer """
        def __init__(self):
            self._buf = bytearray()
    
        def put(self, data):
            self._buf.extend(data)
    
        def get(self, size):
            data = self._buf[:size]
            # The fast delete syntax
            self._buf[:size] = b''
            return data
    
        def peek(self, size):
            return self._buf[:size]
    
        def getvalue(self):
            # Return the internal buffer itself (no copy); callers should not mutate it
            return self._buf
    
        def __len__(self):
            return len(self._buf)
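
    A short usage sketch of the same put/get pattern, where the chunk sizes written and read do not line up. It uses a bare bytearray and two helper functions (`put` and `get` here are stand-ins for the methods above, so the snippet runs on its own), and `del buf[:size]` as an equivalent spelling of `buf[:size] = b''`:

```python
buf = bytearray()


def put(data):
    # Append a chunk of any size to the tail.
    buf.extend(data)


def get(size):
    # Copy out up to `size` bytes from the head, then delete them in place.
    data = bytes(buf[:size])
    del buf[:size]
    return data


put(b"hello ")
put(b"world")
first = get(4)      # spans only the first chunk
second = get(4)     # spans the boundary between the two chunks
rest = get(100)     # asking for more than is buffered returns what is left
remaining = len(buf)
```

    Note that a short read is silent in this sketch; if the caller needs exactly `size` bytes, it should check the buffer length before calling `get`.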
    

    Benchmark

    import time
    bfifo = byteFIFO()
    bfifo.put(b'a'*1000000)        # a very long array
    t0 = time.perf_counter()       # monotonic, high-resolution timer
    for k in range(1000000):
        d = bfifo.get(4)           # "pop" from head
        bfifo.put(d)               # "push" in tail
    print('t =', time.perf_counter()-t0)  # t = 0.897 on my machine
    

    For comparison, the circular/ring buffer implementation in Cameron's answer takes 2.378 s, and their original implementation takes 1.108 s.
