Flattening a list of NumPy arrays?

别那么骄傲 2020-12-01 06:13

It appears that I have data in the format of a list of NumPy arrays (type() = np.ndarray):

[array([[ 0.00353654]]), array([[ 0.00353654]]), arra         


        
5 Answers
  • 2020-12-01 06:31

    You could use numpy.concatenate, which, as the name suggests, joins all the arrays in such an input list into a single NumPy array; ravel() then flattens the result to 1-D, like so -

    import numpy as np
    out = np.concatenate(input_list).ravel()
    

    If you wish the final output to be a list, you can extend the solution, like so -

    out = np.concatenate(input_list).ravel().tolist()
    

    Sample run -

    In [24]: input_list
    Out[24]: 
    [array([[ 0.00353654]]),
     array([[ 0.00353654]]),
     array([[ 0.00353654]]),
     array([[ 0.00353654]]),
     array([[ 0.00353654]]),
     array([[ 0.00353654]]),
     array([[ 0.00353654]]),
     array([[ 0.00353654]]),
     array([[ 0.00353654]]),
     array([[ 0.00353654]]),
     array([[ 0.00353654]]),
     array([[ 0.00353654]]),
     array([[ 0.00353654]])]
    
    In [25]: np.concatenate(input_list).ravel()
    Out[25]: 
    array([ 0.00353654,  0.00353654,  0.00353654,  0.00353654,  0.00353654,
            0.00353654,  0.00353654,  0.00353654,  0.00353654,  0.00353654,
            0.00353654,  0.00353654,  0.00353654])
    

    Convert to list -

    In [26]: np.concatenate(input_list).ravel().tolist()
    Out[26]: 
    [0.00353654,
     0.00353654,
     0.00353654,
     0.00353654,
     0.00353654,
     0.00353654,
     0.00353654,
     0.00353654,
     0.00353654,
     0.00353654,
     0.00353654,
     0.00353654,
     0.00353654]
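
    np.concatenate joins along the first axis by default, so the remaining dimensions of the arrays must match. That holds for the (1, 1) arrays in the question. If the shapes differ, one option is to ravel each array before concatenating. A minimal sketch, where ragged_list is a hypothetical input and not data from the question:

```python
import numpy as np

# Hypothetical input: 2-D arrays whose shapes differ in both dimensions.
ragged_list = [
    np.array([[1.0, 2.0]]),    # shape (1, 2)
    np.array([[3.0]]),         # shape (1, 1)
    np.array([[4.0], [5.0]]),  # shape (2, 1)
]

# Ravel each array to 1-D first, then join the pieces end to end.
flat = np.concatenate([arr.ravel() for arr in ragged_list])
print(flat)  # [1. 2. 3. 4. 5.]
```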
    
  • 2020-12-01 06:34

    I came across this same issue and found a solution that combines arrays of variable length, provided each one is 2-D with shape (1, n), as in the question:

    np.column_stack(input_list).ravel()
    

    See numpy.column_stack for more info.

    Example with variable-length arrays built from your example data:

    In [135]: input_list
    Out[135]: 
    [array([[ 0.00353654,  0.00353654]]),
     array([[ 0.00353654]]),
     array([[ 0.00353654]]),
     array([[ 0.00353654,  0.00353654,  0.00353654]])]
    
    In [136]: [i.size for i in input_list]    # variable size arrays
    Out[136]: [2, 1, 1, 3]
    
    In [137]: np.column_stack(input_list).ravel()
    Out[137]: 
    array([ 0.00353654,  0.00353654,  0.00353654,  0.00353654,  0.00353654,
            0.00353654,  0.00353654])
    

    Note: Only tested on Python 2.7.12
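
    The example above works because every array has shape (1, n). For truly 1-D arrays, np.column_stack turns each one into a column, so arrays of different lengths cannot be stacked. A small sketch of that case, using a hypothetical list named one_d and falling back to np.hstack:

```python
import numpy as np

# Hypothetical 1-D arrays of different lengths.
one_d = [np.array([1.0, 2.0]), np.array([3.0]), np.array([4.0, 5.0, 6.0])]

# column_stack converts each 1-D array into a column of shape (n, 1),
# so the differing lengths make the stacking fail.
try:
    np.column_stack(one_d)
    column_stack_ok = True
except ValueError:
    column_stack_ok = False
print(column_stack_ok)  # False

# For 1-D inputs, hstack simply joins the arrays end to end.
flat = np.hstack(one_d)
print(flat)  # [1. 2. 3. 4. 5. 6.]
```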

  • 2020-12-01 06:40

    Another simple approach would be to use numpy.hstack() followed by removing the singleton dimension using squeeze() as in:

    In [61]: np.hstack(list_of_arrs).squeeze()
    Out[61]: 
    array([0.00353654, 0.00353654, 0.00353654, 0.00353654, 0.00353654,
           0.00353654, 0.00353654, 0.00353654, 0.00353654, 0.00353654,
           0.00353654, 0.00353654, 0.00353654])
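
    One caveat worth checking: squeeze() removes every axis of length one, so a list holding a single (1, 1) array collapses to a 0-d array, whereas ravel() always returns a 1-D result. A minimal sketch with a hypothetical one-element list:

```python
import numpy as np

# Hypothetical edge case: the list holds only one (1, 1) array.
single = [np.array([[0.00353654]])]

squeezed = np.hstack(single).squeeze()  # all length-1 axes are dropped
raveled = np.hstack(single).ravel()     # always 1-D

print(squeezed.shape)  # ()
print(raveled.shape)   # (1,)
```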
    
  • 2020-12-01 06:42

    This can also be done as follows, provided all arrays in the list have the same shape:

    np.array(list_of_arrays).flatten().tolist()
    

    resulting in

    [0.00353654, 0.00353654, 0.00353654, 0.00353654, 0.00353654, 0.00353654, 0.00353654, 0.00353654, 0.00353654, 0.00353654, 0.00353654, 0.00353654, 0.00353654]
    

    Update

    As @aydow points out in the comments, numpy.ndarray.ravel can be faster if it does not matter whether the result is a copy or a view:

    np.array(list_of_arrays).ravel()
    

    However, according to the docs:

    When a view is desired in as many cases as possible, arr.reshape(-1) may be preferable.

    In other words

    np.array(list_of_arrays).reshape(-1)
    

    My initial suggestion was to use numpy.ndarray.flatten, which returns a copy every time and is therefore slower.
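
    The view-versus-copy difference can be checked with np.shares_memory. A minimal sketch on a small example array (not the OP's data); note that ravel and reshape also have to copy when the array is not contiguous, as with the transpose below:

```python
import numpy as np

a = np.arange(8.0).reshape(4, 2)  # small C-contiguous example array

# ravel and reshape return views of a contiguous array; flatten copies.
ravel_is_view = np.shares_memory(a, a.ravel())
reshape_is_view = np.shares_memory(a, a.reshape(-1))
flatten_is_view = np.shares_memory(a, a.flatten())
print(ravel_is_view, reshape_is_view, flatten_is_view)  # True True False

# The transpose is not C-contiguous, so ravel must copy here as well.
t = a.T
transposed_ravel_is_view = np.shares_memory(t, t.ravel())
print(transposed_ravel_is_view)  # False
```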

    Let's now compare the running time of the solutions listed above with the perfplot package, using a setup similar to the OP's:

    import numpy as np
    import perfplot
    
    perfplot.show(
        setup=lambda n: np.random.rand(n, 2),
        kernels=[lambda a: a.ravel(),
                 lambda a: a.flatten(),
                 lambda a: a.reshape(-1)],
        labels=['ravel', 'flatten', 'reshape'],
        n_range=[2**k for k in range(16)],
        xlabel='N')
    

    Here flatten shows piecewise linear growth, which is reasonably explained by it copying the initial array, compared to the constant time of ravel and reshape, which return a view.

    It's also worth noting that, quite predictably, converting the outputs with .tolist() evens out the performance of all three, making each of them linear.
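
    If perfplot is not installed, a rough comparison at a single size can be made with the standard timeit module. This is only a sketch: the array size and repeat counts are arbitrary choices, and absolute numbers will vary by machine.

```python
import timeit

import numpy as np

a = np.random.rand(2**15, 2)  # arbitrary size chosen for illustration

kernels = {
    "ravel": lambda: a.ravel(),
    "flatten": lambda: a.flatten(),
    "reshape": lambda: a.reshape(-1),
}

calls = 1000
# Take the best of three runs to reduce noise from other processes.
timings = {
    name: min(timeit.repeat(fn, number=calls, repeat=3))
    for name, fn in kernels.items()
}

for name, seconds in timings.items():
    print(f"{name}: {seconds / calls * 1e6:.2f} microseconds per call")
```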

  • 2020-12-01 06:44

    Another way using itertools for flattening the array:

    import itertools
    import numpy as np

    # Recreating array from question
    a = [np.array([[0.00353654]])] * 13

    # Ravel each array so the chain yields scalars rather than length-1 rows,
    # then create a list from that iterator
    flattened = list(itertools.chain.from_iterable(arr.ravel() for arr in a))
    

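    This approach is fast for plain Python lists, see https://stackoverflow.com/a/408281/5993892 for more explanation. With NumPy arrays it iterates element by element in Python, so np.concatenate is likely to be faster on large inputs.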

    If the resulting data structure should be a numpy array instead, use numpy.fromiter() to exhaust the iterator into an array:

    # Ravel each array, chain the resulting scalars and create a numpy array from that iterator
    flattened_array = np.fromiter(itertools.chain.from_iterable(arr.ravel() for arr in a), float)
    

    Docs for itertools.chain.from_iterable(): https://docs.python.org/3/library/itertools.html#itertools.chain.from_iterable

    Docs for numpy.fromiter(): https://docs.scipy.org/doc/numpy/reference/generated/numpy.fromiter.html
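
    Keep in mind that chain.from_iterable removes only one level of nesting, so chaining the (1, 1) arrays from the question directly yields length-1 rows rather than scalars. A minimal sketch that shows this and compares the raveled variant against np.concatenate:

```python
import itertools

import numpy as np

# Recreating the list from the question.
a = [np.array([[0.00353654]])] * 13

# Chaining the 2-D arrays directly yields their rows, not their scalars.
one_level = list(itertools.chain.from_iterable(a))
print(one_level[0].shape)  # (1,)

# Raveling each array first makes the chain yield scalars.
fully_flat = np.fromiter(
    itertools.chain.from_iterable(arr.ravel() for arr in a), dtype=float
)
print(fully_flat.shape)  # (13,)

# The result matches the np.concatenate approach.
matches = np.array_equal(fully_flat, np.concatenate(a).ravel())
print(matches)  # True
```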
