There are several similar questions, but none of them answers this simple question directly:
How can I capture a command's output and stream that content into a numpy array
You can use Popen with stdout=subprocess.PIPE. Read in the header, then load the rest into a bytearray to use with np.frombuffer.
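The approach above can be sketched roughly as follows. This is a minimal illustration for Python 3, not a definitive implementation: the inline producer, its one-line "a b" header, and the four float64 values are made-up stand-ins for the real command.

```python
import subprocess
import sys

import numpy as np

# Hypothetical producer: one text header line, then 4 little-endian float64 values.
producer = (
    "import struct, sys; out = sys.stdout.buffer; "
    "out.write(b'a b\\n'); "
    "out.write(struct.pack('<4d', 1.0, 2.0, 3.0, 4.0))"
)

proc = subprocess.Popen([sys.executable, "-c", producer], stdout=subprocess.PIPE)
header = proc.stdout.readline().decode("ascii").split()  # consume the header line
payload = bytearray(proc.stdout.read())  # everything after the header
proc.stdout.close()
proc.wait()

# frombuffer wraps the bytearray's memory instead of copying it.
values = np.frombuffer(payload, dtype="<f8")
print(header, values)
```

Because the array is backed by the bytearray, writing to values also changes payload.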
Additional comments based on your edit:
If you're going to call proc.stdout.read(), it's equivalent to using check_output(). Both create a temporary string. If you preallocate data, you could use proc.stdout.readinto(data). Then if the number of bytes read into data is less than len(data), free the excess memory, else extend data by whatever is left to be read.
data = bytearray(2**32) # 4 GiB
n = proc.stdout.readinto(data)
if n < len(data):
    del data[n:]  # drop the unused tail (works on Python 2 and 3)
else:
    data += proc.stdout.read()
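You could also come at this starting with a pre-allocated ndarray ndata and use buf = np.getbuffer(ndata). Then readinto(buf) as above. Note that np.getbuffer exists only on Python 2; on Python 3, memoryview(ndata) serves the same purpose.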
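Here is a rough Python 3 sketch of the pre-allocated ndarray variant. The helper name read_into_array and the inline producer are hypothetical, and the array is assumed to be one-dimensional and contiguous. The loop is there because a single readinto call on a pipe is not guaranteed to fill the buffer.

```python
import subprocess
import sys

import numpy as np


def read_into_array(stream, arr):
    """Fill a 1-D contiguous array from a binary stream; return bytes read."""
    view = memoryview(arr.view(np.uint8))  # byte-level view of the same memory
    total = 0
    while total < len(view):
        n = stream.readinto(view[total:])
        if not n:  # 0 (or None) means no more data is available
            break
        total += n
    return total


# Hypothetical producer that writes three raw float64 values.
producer = (
    "import struct, sys; "
    "sys.stdout.buffer.write(struct.pack('<3d', 1.5, 2.5, 3.5))"
)

ndata = np.empty(8, dtype="<f8")  # room for 8 values, more than will arrive
proc = subprocess.Popen([sys.executable, "-c", producer], stdout=subprocess.PIPE)
nbytes = read_into_array(proc.stdout, ndata)
proc.stdout.close()
proc.wait()

filled = ndata[: nbytes // ndata.itemsize]  # only the part that was written
print(filled)
```

Slicing ndata afterwards returns a view, so trimming the unused tail does not copy the data.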
Here's an example to show that the memory is shared between the bytearray and the np.ndarray:
>>> import numpy as np
>>> data = bytearray(b'\x01')
>>> ndata = np.frombuffer(data, np.int8)
>>> ndata
array([1], dtype=int8)
>>> ndata[0] = 2
>>> data
bytearray(b'\x02')
Since your data can easily fit in RAM, I think the easiest way to load the data into a numpy array is to use a ramfs.
On Linux,
sudo mkdir /mnt/ramfs
sudo mount -t ramfs -o size=5G ramfs /mnt/ramfs
sudo chmod 777 /mnt/ramfs
Then, for example, if this is the producer of the binary data:
writer.py:
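import random
import struct
import sys

N = random.randrange(100)
# Write raw bytes: sys.stdout.buffer on Python 3, sys.stdout itself on Python 2.
out = getattr(sys.stdout, 'buffer', sys.stdout)
out.write(b'a b\n')  # header line naming the two columns
for i in range(2 * N):
    out.write(struct.pack('<d', random.random()))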
Then you could load it into a numpy array like this:
reader.py:
import subprocess
import sys
import numpy

def parse_header(f):
    # Moves the file pointer past the header line and returns the
    # column names in the order they appear.
    header = f.readline()
    return header.decode('ascii').split()

filename = '/mnt/ramfs/data.out'
with open(filename, 'wb') as f:
    # Run the producer with the current interpreter; stdout goes to the ramfs file.
    cmd = [sys.executable, 'writer.py']
    proc = subprocess.Popen(cmd, stdout = f)
    proc.communicate()
with open(filename, 'rb') as f:
    names = parse_header(f)
    dt = numpy.dtype([(name, '<f8') for name in names])
    data = numpy.fromfile(f, dt)
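To illustrate what the structured dtype gives you, here is a small self-contained sketch that parses the same layout (a header line followed by packed float64 pairs) from an in-memory byte string. The sample values are made up, and numpy.frombuffer stands in for numpy.fromfile only so the example runs without a ramfs mount.

```python
import struct

import numpy as np

# Made-up sample in the layout writer.py produces: header line, then float64 pairs.
raw = b"a b\n" + struct.pack("<4d", 0.1, 0.2, 0.3, 0.4)

# Split at the first newline only; the binary payload may itself contain b"\n".
header, _, payload = raw.partition(b"\n")
names = header.decode("ascii").split()

# One little-endian float64 field per header name, in header order.
dt = np.dtype([(name, "<f8") for name in names])
records = np.frombuffer(payload, dtype=dt)

print(records["a"])  # values of column a
print(records["b"])  # values of column b
```

Each record holds one value per header name, so columns can be selected by name.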