How do I implement a FIFO buffer to which I can efficiently add arbitrarily sized chunks of bytes to the head, and from which I can efficiently pop arbitrarily sized chunks of bytes?
Can you assume anything about the expected read/write amounts?
Chunking the data into, for example, 1024-byte fragments and using a deque
[1] might then work better; you could read N full chunks, then one last chunk to split, and put the remainder back on the start of the queue (a sketch follows the footnote below).
[1] collections.deque
class collections.deque([iterable[, maxlen]])
Returns a new deque object initialized left-to-right (using append()) with data from iterable. If iterable is not specified, the new deque is empty.
Deques are a generalization of stacks and queues (the name is pronounced “deck” and is short for “double-ended queue”). Deques support thread-safe, memory efficient appends and pops from either side of the deque with approximately the same O(1) performance in either direction. ...
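As a rough sketch of this chunking idea (the ChunkedFIFO name and the put/get API are illustrative, not from the answer):

from collections import deque

class ChunkedFIFO(object):
    """Sketch of a FIFO built on a deque of byte fragments."""
    def __init__(self):
        self._chunks = deque()

    def put(self, data):
        # Store the fragment as-is; a real version might split
        # incoming data into fixed-size (e.g. 1024-byte) chunks.
        self._chunks.append(bytes(data))

    def get(self, size):
        parts = []
        remaining = size
        while remaining and self._chunks:
            chunk = self._chunks.popleft()
            if len(chunk) > remaining:
                # Split the final chunk and put the remainder
                # back on the start of the queue.
                chunk, rest = chunk[:remaining], chunk[remaining:]
                self._chunks.appendleft(rest)
            parts.append(chunk)
            remaining -= len(chunk)
        return b''.join(parts)

Both popleft() and appendleft() are O(1) on a deque, so the only per-get cost beyond copying the requested bytes is the single split.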
I have currently implemented this with a StringIO object. Writing new bytes to the end of the StringIO object is fast, but removing bytes from the beginning is very slow, because a new StringIO object holding a copy of the entire previous buffer (minus the first chunk of bytes) must be created.
Actually, the most typical way of implementing a FIFO is to use a wrap-around (circular) buffer with two pointers, one for reading and one for writing:
[diagram: circular buffer with read and write pointers chasing each other around a fixed-size array]
Now, you can implement that with StringIO(), using .seek() to read/write from the appropriate location.
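To make the two-pointer idea concrete, here is a minimal sketch, assuming a fixed capacity and using a bytearray rather than StringIO; the RingBuffer name and API are illustrative (Cameron's answer further down gives a growable StringIO version):

class RingBuffer(object):
    """Sketch of a fixed-capacity circular byte buffer."""
    def __init__(self, capacity):
        self._buf = bytearray(capacity)
        self._capacity = capacity
        self._read = 0   # next position to read from
        self._count = 0  # bytes currently stored

    def write(self, data):
        if len(data) > self._capacity - self._count:
            raise BufferError('ring buffer full')
        w = (self._read + self._count) % self._capacity
        first = min(len(data), self._capacity - w)
        self._buf[w:w + first] = data[:first]
        self._buf[0:len(data) - first] = data[first:]  # wrapped part, if any
        self._count += len(data)

    def read(self, size):
        size = min(size, self._count)
        first = min(size, self._capacity - self._read)
        result = bytes(self._buf[self._read:self._read + first])
        result += bytes(self._buf[:size - first])  # wrapped part, if any
        self._read = (self._read + size) % self._capacity
        self._count -= size
        return result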
... but removing bytes from the beginning is very slow, because a new StringIO object holding a copy of the entire previous buffer (minus the first chunk of bytes) must be created.
This type of slowness can be overcome by using bytearray in Python >= 3.4.
See the discussion in this issue and the patch here.
The key is that removing head byte(s) from a bytearray via

a[:1] = b''  # O(1) (amortized)

is much faster than

a = a[1:]  # O(len(a)), copies the whole array

when len(a) is huge (say 10**6).
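A quick way to check this on your own machine (assuming Python >= 3.4; the array size and iteration counts here are arbitrary):

import timeit

setup = "a = bytearray(b'x' * 10**6)"
# Amortized O(1): shrinks the array in place from the front
print(timeit.timeit("a[:1] = b''", setup, number=10000))
# O(len(a)): rebuilds the whole array on every iteration
print(timeit.timeit("a = a[1:]", setup, number=10000))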
A bytearray also gives you a convenient way to preview the whole data set as a single array (i.e. itself), in contrast to a deque container, whose fragments need to be joined into one chunk first.
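A trivial illustration of that contrast:

from collections import deque

ba = bytearray(b'abcdef')
head = ba[:3]             # bytearray: slice the buffer directly

dq = deque([b'ab', b'cd', b'ef'])
head2 = b''.join(dq)[:3]  # deque: fragments must be joined first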
Now an efficient FIFO can be implemented as follows:
class byteFIFO:
    """ byte FIFO buffer """
    def __init__(self):
        self._buf = bytearray()

    def put(self, data):
        self._buf.extend(data)

    def get(self, size):
        data = self._buf[:size]
        # The fast delete syntax
        self._buf[:size] = b''
        return data

    def peek(self, size):
        return self._buf[:size]

    def getvalue(self):
        # peek with no copy
        return self._buf

    def __len__(self):
        return len(self._buf)
Benchmark:

import time

bfifo = byteFIFO()
bfifo.put(b'a' * 1000000)  # a very long array

t0 = time.time()
for k in range(1000000):
    d = bfifo.get(4)  # "pop" from head
    bfifo.put(d)      # "push" in tail
print('t = ', time.time() - t0)  # t = 0.897 on my machine
On the same benchmark, the circular/ring-buffer implementation in Cameron's answer needs 2.378 s, and Cameron's original compacting implementation needs 1.108 s.
Update: Here's an implementation of the circular buffer technique from vartec's answer (building on my original answer, preserved below for those curious):
from cStringIO import StringIO  # Python 2; on Python 3 use io.BytesIO with bytes

class FifoFileBuffer(object):
    def __init__(self):
        self.buf = StringIO()
        self.available = 0  # Bytes available for reading
        self.size = 0       # Total capacity of the underlying buffer
        self.write_fp = 0   # Write file pointer

    def read(self, size=None):
        """Reads size bytes from buffer"""
        if size is None or size > self.available:
            size = self.available
        size = max(size, 0)

        result = self.buf.read(size)
        self.available -= size

        if len(result) < size:
            # The read wrapped past the end; continue from the start
            self.buf.seek(0)
            result += self.buf.read(size - len(result))

        return result

    def write(self, data):
        """Appends data to buffer"""
        if self.size < self.available + len(data):
            # Expand buffer: compact unread data to the front, then
            # double the capacity until everything fits
            new_buf = StringIO()
            new_buf.write(self.read())
            self.write_fp = self.available = new_buf.tell()
            read_fp = 0
            while self.size <= self.available + len(data):
                self.size = max(self.size, 1024) * 2
            new_buf.write('0' * (self.size - self.write_fp))
            self.buf = new_buf
        else:
            read_fp = self.buf.tell()

        self.buf.seek(self.write_fp)
        written = self.size - self.write_fp
        self.buf.write(data[:written])
        self.write_fp += len(data)
        self.available += len(data)

        if written < len(data):
            # The write wrapped past the end; continue from the start
            self.write_fp -= self.size
            self.buf.seek(0)
            self.buf.write(data[written:])

        self.buf.seek(read_fp)
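A hypothetical round-trip showing the intended usage (under Python 2, matching the cStringIO import above):

f = FifoFileBuffer()
f.write('hello ')
f.write('world')
print f.read(5)   # 'hello'
f.write('!')
print f.read()    # ' world!'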
Original answer (superseded by the one above):
You can use a buffer and track the start index (read file pointer), occasionally compacting it when it gets too large (this should yield pretty good amortized performance).
For example, wrap a StringIO object like so:
from cStringIO import StringIO  # Python 2; on Python 3 use io.BytesIO with bytes

class FifoBuffer(object):
    def __init__(self):
        self.buf = StringIO()

    def read(self, *args, **kwargs):
        """Reads data from buffer"""
        return self.buf.read(*args, **kwargs)

    def write(self, *args, **kwargs):
        """Appends data to buffer"""
        current_read_fp = self.buf.tell()
        if current_read_fp > 10 * 1024 * 1024:
            # Buffer is holding 10MB of already-read data, time to compact
            new_buf = StringIO()
            new_buf.write(self.buf.read())
            self.buf = new_buf
            current_read_fp = 0

        self.buf.seek(0, 2)  # Seek to end
        self.buf.write(*args, **kwargs)
        self.buf.seek(current_read_fp)