How to pipe binary data into numpy arrays without tmp storage?

前端未结

关注

 2  1576

一向

There are several similar questions but none of them answers this simple question directly:

How can i catch a commands output and stream that content into numpy arra

相关标签:

2条回答

日久生厌

2021-01-05 11:30
You can use Popen with stdout=subprocess.PIPE. Read in the header, then load the rest into a bytearray to use with np.frombuffer.

Additional comments based on your edit:

If you're going to call proc.stdout.read(), it's equivalent to using check_output(). Both create a temporary string. If you preallocate data, you could use proc.stdout.readinto(data). Then if the number of bytes read into data is less than len(data), free the excess memory, else extend data by whatever is left to be read.
```
data = bytearray(2**32) # 4 GiB
n = proc.stdout.readinto(data)
if n < len(data):
    data[n:] = ''        
else:
    data += proc.stdout.read()
```
You could also come at this starting with a pre-allocated ndarray ndata and use buf = np.getbuffer(ndata). Then readinto(buf) as above.

Here's an example to show that the memory is shared between the bytearray and the np.ndarray:
```
>>> data = bytearray('\x01')
>>> ndata = np.frombuffer(data, np.int8)
>>> ndata
array([1], dtype=int8)
>>> ndata[0] = 2
>>> data
bytearray(b'\x02')
```
0 讨论(0)
发布评论:

提交评论
- 加载中...

慢半拍i

2021-01-05 11:31

Since your data can easily fit in RAM, I think the easiest way to load the data into a numpy array is to use a ramfs.

On Linux,

sudo mkdir /mnt/ramfs
sudo mount -t ramfs -o size=5G ramfs /mnt/ramfs
sudo chmod 777 /mnt/ramfs

Then, for example, if this is the producer of the binary data:

writer.py:

from __future__ import print_function
import random
import struct
N = random.randrange(100)
print('a b')
for i in range(2*N):
    print(struct.pack('<d',random.random()), end = '')

Then you could load it into a numpy array like this:

reader.py:

import subprocess
import numpy

def parse_header(f):
    # this function moves the filepointer and returns a dictionary
    header = f.readline()
    d = dict.fromkeys(header.split())
    return d

filename = '/mnt/ramfs/data.out'
with open(filename, 'w') as f:  
    cmd = 'writer.py'
    proc = subprocess.Popen([cmd], stdout = f)
    proc.communicate()
with open(filename, 'r') as f:      
    header = parse_header(f)
    dt = numpy.dtype([(key, 'f8') for key in header.keys()])
    data = numpy.fromfile(f, dt)

0 讨论(0)