Efficient FIFO queue for arbitrarily sized chunks of bytes in Python

既然无缘 2021-02-05 09:19

How do I implement a FIFO buffer to which I can efficiently add arbitrarily sized chunks of bytes to the head and from which I can efficiently pop arbitrarily sized chunks of bytes from the tail?

4 Answers
  • 2021-02-05 09:30

    Can you assume anything about the expected read/write amounts?

    Chunking the data into, for example, 1024-byte fragments and using a deque[1] might then work better; you could just read N full chunks, then one last chunk to split, and put the remainder back on the front of the queue (a rough sketch follows the quoted documentation).

    1) collections.deque

    class collections.deque([iterable[, maxlen]])

    Returns a new deque object initialized left-to-right (using append()) with data from iterable. If iterable is not specified, the new deque is empty.

    Deques are a generalization of stacks and queues (the name is pronounced “deck” and is short for “double-ended queue”). Deques support thread-safe, memory efficient appends and pops from either side of the deque with approximately the same O(1) performance in either direction. ...
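    A minimal sketch of that idea (the ChunkedFIFO name and the fixed 1024-byte fragment size are illustrative assumptions, not from the answer):

    from collections import deque

    CHUNK = 1024  # assumed fragment size

    class ChunkedFIFO:
        """FIFO of bytes stored as fixed-size fragments in a deque."""
        def __init__(self):
            self._chunks = deque()

        def put(self, data):
            # Split incoming data into CHUNK-sized fragments.
            for i in range(0, len(data), CHUNK):
                self._chunks.append(data[i:i + CHUNK])

        def get(self, size):
            parts = []
            need = size
            while need and self._chunks:
                chunk = self._chunks.popleft()
                if len(chunk) > need:
                    # Split the last fragment and put the remainder
                    # back on the front of the queue.
                    self._chunks.appendleft(chunk[need:])
                    chunk = chunk[:need]
                parts.append(chunk)
                need -= len(chunk)
            return b''.join(parts)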

  • 2021-02-05 09:31

    I have currently implemented this with a StringIO object. Writing new bytes to the end of the StringIO object is fast, but removing bytes from the beginning is very slow, because a new StringIO object, that holds a copy of the entire previous buffer minus the first chunk of bytes, must be created.

    Actually, the most typical way of implementing a FIFO is to use a wrap-around (circular) buffer with two pointers, like this:

    [diagram: a circular (wrap-around) buffer with separate read and write pointers; image source]

    Now, you can implement that with StringIO(), using .seek() to read/write from the appropriate location.
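    A minimal Python 3 sketch of that two-pointer idea, using io.BytesIO in place of StringIO and assuming a fixed capacity with no automatic growth (the class and attribute names are illustrative); a fuller, self-growing version appears in the last answer below:

    import io

    class RingBuffer:
        """Fixed-capacity wrap-around byte buffer with read/write pointers."""
        def __init__(self, capacity):
            self.buf = io.BytesIO(b'\x00' * capacity)
            self.capacity = capacity
            self.read_fp = 0     # position of the next read
            self.write_fp = 0    # position of the next write
            self.available = 0   # bytes currently stored

        def write(self, data):
            if len(data) > self.capacity - self.available:
                raise BufferError('ring buffer is full')
            self.buf.seek(self.write_fp)
            first = min(len(data), self.capacity - self.write_fp)
            self.buf.write(data[:first])
            if first < len(data):          # wrap around to the start
                self.buf.seek(0)
                self.buf.write(data[first:])
            self.write_fp = (self.write_fp + len(data)) % self.capacity
            self.available += len(data)

        def read(self, size):
            size = min(size, self.available)
            self.buf.seek(self.read_fp)
            result = self.buf.read(min(size, self.capacity - self.read_fp))
            if len(result) < size:         # wrap around to the start
                self.buf.seek(0)
                result += self.buf.read(size - len(result))
            self.read_fp = (self.read_fp + size) % self.capacity
            self.available -= size
            return result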

  • 2021-02-05 09:33

    ... but removing bytes from the beginning is very slow, because a new StringIO object, that holds a copy of the entire previous buffer minus the first chunk of bytes, must be created.

    This type of slowness can be overcome by using bytearray in Python >= 3.4. See the discussion in this issue and the patch here.

    The key is: removing the head byte(s) from a bytearray by

    a[:1] = b''   # O(1) (amortized)
    

    is much faster than

    a = a[1:]     # O(len(a))
    

    when len(a) is huge (say 10**6).
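    A quick way to see the difference yourself (this snippet is illustrative; timings are machine- and version-dependent):

    import timeit

    setup = "a = bytearray(b'x' * 10**6)"

    # Delete one byte from the front 1000 times via slice assignment.
    print(timeit.timeit("a[:1] = b''", setup=setup, number=1000))

    # Rebind to a copy without the first byte 1000 times (copies ~1 MB each time).
    print(timeit.timeit("a = a[1:]", setup=setup, number=1000))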

    A bytearray also gives you a convenient way to preview the whole buffered data as a single array (the buffer itself), whereas a deque of chunks has to be joined before it can be viewed as one contiguous block.

    Now an efficient FIFO can be implemented as follows:

    class byteFIFO:
        """ byte FIFO buffer """
        def __init__(self):
            self._buf = bytearray()
    
        def put(self, data):
            self._buf.extend(data)
    
        def get(self, size):
            data = self._buf[:size]
            # The fast delete syntax
            self._buf[:size] = b''
            return data
    
        def peek(self, size):
            return self._buf[:size]
    
        def getvalue(self):
            # peek with no copy
            return self._buf
    
        def __len__(self):
            return len(self._buf)
    

    Benchmark

    import time
    bfifo = byteFIFO()
    bfifo.put(b'a'*1000000)        # a very long array
    t0 = time.time()
    for k in range(1000000):
        d = bfifo.get(4)           # "pop" from head
        bfifo.put(d)               # "push" in tail
    print('t = ', time.time()-t0)  # t = 0.897 on my machine
    

    The circular/ring buffer implementation in Cameron's answer needs 2.378 s on the same benchmark, and the original (compacting) implementation needs 1.108 s.

  • 2021-02-05 09:38

    Update: Here's an implementation of the circular buffer technique from vartec's answer (building on my original answer, preserved below for those curious):

    from cStringIO import StringIO
    
    class FifoFileBuffer(object):
        def __init__(self):
            self.buf = StringIO()
            self.available = 0    # Bytes available for reading
            self.size = 0
            self.write_fp = 0
    
        def read(self, size = None):
            """Reads size bytes from buffer"""
            if size is None or size > self.available:
                size = self.available
            size = max(size, 0)
    
            result = self.buf.read(size)
            self.available -= size
    
            if len(result) < size:
                self.buf.seek(0)
                result += self.buf.read(size - len(result))
    
            return result
    
    
        def write(self, data):
            """Appends data to buffer"""
            if self.size < self.available + len(data):
                # Expand buffer
                new_buf = StringIO()
                new_buf.write(self.read())
                self.write_fp = self.available = new_buf.tell()
                read_fp = 0
                while self.size <= self.available + len(data):
                    self.size = max(self.size, 1024) * 2
                new_buf.write('0' * (self.size - self.write_fp))
                self.buf = new_buf
            else:
                read_fp = self.buf.tell()
    
            self.buf.seek(self.write_fp)
            written = self.size - self.write_fp
            self.buf.write(data[:written])
            self.write_fp += len(data)
            self.available += len(data)
            if written < len(data):
                self.write_fp -= self.size
                self.buf.seek(0)
                self.buf.write(data[written:])
            self.buf.seek(read_fp)
    

    Original answer (superseded by the one above):

    You can use a buffer and track the start index (read file pointer), occasionally compacting it when it gets too large (this should yield pretty good amortized performance).

    For example, wrap a StringIO object like so:

    from cStringIO import StringIO
    class FifoBuffer(object):
        def __init__(self):
            self.buf = StringIO()
    
        def read(self, *args, **kwargs):
            """Reads data from buffer"""
            return self.buf.read(*args, **kwargs)
    
        def write(self, *args, **kwargs):
            """Appends data to buffer"""
            current_read_fp = self.buf.tell()
            if current_read_fp > 10 * 1024 * 1024:
                # Buffer is holding 10MB of used data, time to compact
                new_buf = StringIO()
                new_buf.write(self.buf.read())
                self.buf = new_buf
                current_read_fp = 0
    
            self.buf.seek(0, 2)    # Seek to end
            self.buf.write(*args, **kwargs)
    
            self.buf.seek(current_read_fp)
    