How to pipe binary data into numpy arrays without tmp storage?

一向 2021-01-05 10:59

There are several similar questions but none of them answers this simple question directly:

How can I catch a command's output and stream that content into numpy arrays...

2 Answers
  • 2021-01-05 11:30

    You can use Popen with stdout=subprocess.PIPE. Read in the header, then load the rest into a bytearray to use with np.frombuffer.
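
    A minimal sketch of that idea, assuming a hypothetical producer that writes one text header line followed by little-endian float64 values (the inline `producer` script and the column names are made up for illustration). The payload is collected in bounded chunks so that no full-size temporary string is created:

```python
import subprocess
import sys

import numpy as np

# Hypothetical producer: a header line, then four little-endian float64 values.
producer = (
    "import struct, sys\n"
    "out = sys.stdout.buffer\n"
    "out.write(b'a b\\n')\n"
    "out.write(struct.pack('<4d', 1.0, 2.0, 3.0, 4.0))\n"
)

proc = subprocess.Popen([sys.executable, "-c", producer], stdout=subprocess.PIPE)

# Read the header first; this leaves the stream positioned at the binary payload.
header = proc.stdout.readline().decode("ascii").split()

# Append the rest to a bytearray in 1 MiB chunks (only one chunk is temporary at a time).
payload = bytearray()
for chunk in iter(lambda: proc.stdout.read(1 << 20), b""):
    payload += chunk
proc.stdout.close()
proc.wait()

# Interpret the bytearray in place: one float64 field per header name.
dt = np.dtype([(name, "<f8") for name in header])
records = np.frombuffer(payload, dtype=dt)
```

    Here `records["a"]` and `records["b"]` are views onto `payload`, so the data is not copied again.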

    Additional comments based on your edit:

    If you're going to call proc.stdout.read(), it's equivalent to using check_output(). Both create a temporary string. If you preallocate data, you could use proc.stdout.readinto(data). Then if the number of bytes read into data is less than len(data), free the excess memory, else extend data by whatever is left to be read.

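    data = bytearray(2**32)  # preallocate 4 GiB
    n = proc.stdout.readinto(data)  # number of bytes actually read
    if n < len(data):
        del data[n:]  # drop the unused tail
    else:
        data += proc.stdout.read()  # buffer was too small; append the remainder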
    

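    You could also start from a pre-allocated ndarray ndata and read straight into its memory. On Python 3, buf = memoryview(ndata) gives a writable buffer (the Python 2 spelling was buf = np.getbuffer(ndata)). Then readinto(buf) as above.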
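
    As a sketch of the pre-allocated variant, assuming the element count is known before reading (for example from a header) and using a made-up inline `producer` that emits 1000 float64 values:

```python
import subprocess
import sys

import numpy as np

count = 1000  # assumed to be known in advance, e.g. parsed from a header

# Hypothetical producer: 1000 little-endian float64 values, 0.0 .. 999.0.
producer = (
    "import struct, sys\n"
    "sys.stdout.buffer.write(struct.pack('<1000d', *range(1000)))\n"
)

ndata = np.empty(count, dtype="<f8")   # destination array, allocated once
buf = memoryview(ndata).cast("B")      # writable byte view sharing ndata's memory

proc = subprocess.Popen([sys.executable, "-c", producer], stdout=subprocess.PIPE)
filled = 0
while filled < len(buf):
    n = proc.stdout.readinto(buf[filled:])  # writes directly into the array
    if not n:                               # EOF: the producer sent less than expected
        break
    filled += n
proc.stdout.close()
proc.wait()
```

    The loop guards against short reads; comparing `filled` with `len(buf)` afterwards tells you whether the array was completely filled.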

    Here's an example to show that the memory is shared between the bytearray and the np.ndarray:

    >>> import numpy as np
    >>> data = bytearray(b'\x01')
    >>> ndata = np.frombuffer(data, np.int8)
    >>> ndata
    array([1], dtype=int8)
    >>> ndata[0] = 2
    >>> data
    bytearray(b'\x02')
    
  • 2021-01-05 11:31

    Since your data can easily fit in RAM, I think the easiest way to load the data into a numpy array is to use a ramfs.

    On Linux (note that ramfs itself does not enforce the size option; tmpfs does):

    sudo mkdir /mnt/ramfs
    sudo mount -t ramfs -o size=5G ramfs /mnt/ramfs
    sudo chmod 777 /mnt/ramfs
    

    Then, for example, if this is the producer of the binary data:

    writer.py:

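    import random
    import struct
    import sys

    N = random.randrange(100)
    out = sys.stdout.buffer  # binary stdout, so the packed bytes are written unmodified
    out.write(b'a b\n')  # header line naming the two columns
    for i in range(2*N):
        out.write(struct.pack('<d', random.random()))  # one little-endian float64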
    

    Then you could load it into a numpy array like this:

    reader.py:

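    import subprocess
    import sys
    import numpy
    
    def parse_header(f):
        # Reads the header line (advancing the file position) and returns
        # a dictionary keyed by column name, in header order.
        header = f.readline().decode('ascii')
        d = dict.fromkeys(header.split())
        return d
    
    filename = '/mnt/ramfs/data.out'
    with open(filename, 'wb') as f:
        # Run the producer with the current interpreter; its stdout goes to the ramfs file.
        cmd = [sys.executable, 'writer.py']
        proc = subprocess.Popen(cmd, stdout = f)
        proc.communicate()
    with open(filename, 'rb') as f:
        header = parse_header(f)
        # One little-endian float64 field per header column, matching '<d' in writer.py.
        dt = numpy.dtype([(key, '<f8') for key in header.keys()])
        data = numpy.fromfile(f, dt)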
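
    To try the same round trip without mounting anything, here is a sketch that uses a temporary directory as a stand-in for /mnt/ramfs and a made-up inline `producer` in place of writer.py (both are assumptions for illustration, not part of the setup above):

```python
import os
import subprocess
import sys
import tempfile

import numpy

# Hypothetical producer: header "a b", then six little-endian float64 values.
producer = (
    "import struct, sys\n"
    "out = sys.stdout.buffer\n"
    "out.write(b'a b\\n')\n"
    "out.write(struct.pack('<6d', 0.0, 1.0, 2.0, 3.0, 4.0, 5.0))\n"
)

with tempfile.TemporaryDirectory() as tmpdir:  # stand-in for /mnt/ramfs
    filename = os.path.join(tmpdir, "data.out")
    with open(filename, "wb") as f:
        # The child process writes straight to the file; nothing passes through Python.
        subprocess.check_call([sys.executable, "-c", producer], stdout=f)
    with open(filename, "rb") as f:
        names = f.readline().decode("ascii").split()
        dt = numpy.dtype([(name, "<f8") for name in names])
        data = numpy.fromfile(f, dt)  # reads from the current position to the end
```

    The values alternate between the two columns, so `data["a"]` holds every even-indexed value and `data["b"]` every odd-indexed one.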
    