Growing matrices columnwise in NumPy

终归单人心 2021-02-12 13:58

In pure Python you can grow matrices column by column pretty easily:

data = []
for i in something:
    newColumn = getColumnDataAsList(i)
    data.append(newColumn)

4 Answers
  • 2021-02-12 14:32

    Usually you don't keep resizing a NumPy array while you build it. What don't you like about your third solution? If it's a very large matrix/array, it might be worth allocating the array before you start assigning its values:

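    import numpy

    n_cols = len(something)
    # placeholder from the original answer: the length of one column
    n_rows = getColumnDataAsNumpyArray.someLengthProperty

    # one row per data point, one column per item in `something`
    data = numpy.zeros((n_rows, n_cols))
    for j, item in enumerate(something):
        data[:, j] = getColumnDataAsNumpyArray(item)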
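
    As a runnable illustration of the preallocation idea, here is a minimal sketch. The helper get_column and the contents of something are made up for the example; they only stand in for whatever produces your column data.

```python
import numpy as np

def get_column(i, n_rows=4):
    # Hypothetical stand-in for getColumnDataAsNumpyArray:
    # returns a column of length n_rows filled with the value i.
    return np.full(n_rows, float(i))

something = [3, 1, 4]   # made-up identifiers, one per column
n_rows = 4

# Allocate once: one row per data point, one column per item.
data = np.zeros((n_rows, len(something)))
for j, item in enumerate(something):
    data[:, j] = get_column(item, n_rows)   # write into column j in place
```

    No array is reallocated inside the loop; each iteration only writes into memory that already exists.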
    
  • 2021-02-12 14:34

    NumPy actually does have an append function, which it seems might do what you want, e.g.,

    import numpy as NP
    # random integers in 0..9; randint excludes the upper bound
    # (random_integers is deprecated)
    my_data = NP.random.randint(0, 10, 9).reshape(3, 3)
    new_col = NP.array((5, 5, 5)).reshape(3, 1)
    res = NP.append(my_data, new_col, axis=1)
    

    Your second snippet (hstack) will work if you add another line, e.g.,

    my_data = NP.random.randint(0, 10, 16).reshape(4, 4)
    # the line to add--does not depend on array dimensions
    new_col = NP.zeros_like(my_data[:,-1]).reshape(-1, 1)
    res = NP.hstack((my_data, new_col))
    

    hstack gives the same result as concatenate((my_data, new_col), axis=1). For 2D inputs hstack is a thin wrapper around concatenate, so the two should perform about the same.
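
    The equivalence of the three calls is easy to check. This is a small sketch with made-up input values, not a benchmark:

```python
import numpy as np

my_data = np.arange(9).reshape(3, 3)
new_col = np.full((3, 1), 5)

via_append = np.append(my_data, new_col, axis=1)
via_hstack = np.hstack((my_data, new_col))
via_concat = np.concatenate((my_data, new_col), axis=1)

# All three produce the same (3, 4) array.
same = (np.array_equal(via_append, via_hstack)
        and np.array_equal(via_hstack, via_concat))
print(via_hstack.shape, same)
```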


    While that's the most direct answer to your question, I should mention that looping through a data source to populate a target via append, while just fine in Python, is not idiomatic NumPy. Here's why:

    Initializing a NumPy array is relatively expensive, and with this conventional Python pattern you incur that cost, more or less, at every loop iteration: each append to a NumPy array allocates a new array of a different size and copies the existing data into it.
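
    A short sketch of what that means in practice, using small made-up arrays: append never modifies its input, it returns a separate, larger array.

```python
import numpy as np

a = np.zeros((3, 2))
col = np.ones((3, 1))

b = np.append(a, col, axis=1)

print(a.shape)                 # the original array keeps its shape
print(b.shape)                 # the result has one more column
print(np.shares_memory(a, b))  # the result lives in its own buffer
```

    Inside a loop, that allocation and copy is repeated on every iteration, and the amount of data copied grows each time.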

    For that reason, the common pattern in NumPy for iteratively adding columns to a 2D array is to initialize an empty target array once (that is, pre-allocate a single 2D array with all of its columns still unfilled) and then successively populate those columns by setting the desired column-wise offset (index). This is much easier to show than to explain:

    >>> # initialize your skeleton array with 'empty', which allocates without filling in values
    >>> M = NP.empty(shape=(10, 5), dtype=float)
    
    >>> # create a small function to mimic step-wise populating this empty 2D array:
    >>> fnx = lambda v : NP.random.randint(0, 10, v)
    

    Populate the array as in the original question, except that each iteration just sets the values of M at successive column offsets:

    >>> for index in range(M.shape[1]):
    ...     M[:, index] = fnx(10)
    
    >>> M
      array([[ 1.,  7.,  0.,  8.,  7.],
             [ 9.,  0.,  6.,  9.,  4.],
             [ 2.,  3.,  6.,  3.,  4.],
             [ 3.,  4.,  1.,  0.,  5.],
             [ 2.,  3.,  5.,  3.,  0.],
             [ 4.,  6.,  5.,  6.,  2.],
             [ 0.,  6.,  1.,  6.,  8.],
             [ 3.,  8.,  0.,  8.,  0.],
             [ 5.,  2.,  5.,  0.,  1.],
             [ 0.,  6.,  5.,  9.,  1.]])
    

    Of course, if you don't know in advance what size your array should be, just create one much bigger than you need and trim the unused portion when you finish populating it:

    >>> M[:3,:3]
      array([[ 1.,  7.,  0.],
             [ 9.,  0.,  6.],
             [ 2.,  3.,  6.]])
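
    When even a rough upper bound is unknown, a related variation (not part of the answer above) is to start small and double the capacity whenever the buffer fills up, then trim at the end. A minimal sketch, with grow_columns and the data source as hypothetical names:

```python
import numpy as np

def grow_columns(columns, n_rows, initial_capacity=4):
    """Collect an unknown number of columns, doubling capacity when full."""
    buf = np.empty((n_rows, initial_capacity))
    used = 0
    for col in columns:
        if used == buf.shape[1]:
            # Buffer is full: allocate twice the width, copy the filled part.
            bigger = np.empty((n_rows, 2 * buf.shape[1]))
            bigger[:, :used] = buf[:, :used]
            buf = bigger
        buf[:, used] = col
        used += 1
    # Trim the unused columns; copy so the oversized buffer can be freed.
    return buf[:, :used].copy()

cols = (np.full(3, float(k)) for k in range(10))   # made-up data source
result = grow_columns(cols, n_rows=3)
```

    Doubling keeps the number of reallocations logarithmic in the final column count, instead of one reallocation per column.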
    
  • 2021-02-12 14:38

    Generally it is expensive to keep reallocating a NumPy array, so your third solution is really the best performance-wise.

    However, I think hstack will do what you want. The clue is in the error message:

    ValueError: arrays must have same number of dimensions

    I'm guessing that newColumn has two dimensions (rather than being a 1D vector), so data also needs two dimensions, for example data = np.array([[]]). Alternatively, make newColumn a 1D vector with np.squeeze(newColumn), in which case hstack or vstack should work with your original definition of data. (Generally, if things are 1D it is better to keep them 1D in NumPy, so that broadcasting etc. works better.)
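
    The dimension mismatch can be reproduced with small made-up arrays. This sketch shows the failing call and two ways to make the dimensions agree; it is an illustration of the error, not your actual data:

```python
import numpy as np

data = np.zeros((3, 2))
new_col = np.array([1.0, 2.0, 3.0])      # 1D, shape (3,)

try:
    np.hstack((data, new_col))           # 2D next to 1D: dimensions differ
    failed = False
except ValueError:
    failed = True

# Give the column a second axis so both inputs are 2D ...
fixed = np.hstack((data, new_col[:, np.newaxis]))
# ... or use column_stack, which turns 1D inputs into columns itself.
stacked = np.column_stack((data, new_col))
```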

  • 2021-02-12 14:43

    hstack also works on zero-sized arrays:

    import numpy as np
    
    N = 5
    M = 15
    
    a = np.empty((N, 0))  # N rows, no columns yet
    for i in range(M):
        b = np.random.rand(N, 1)
        a = np.hstack((a, b))
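
    If the copy on every hstack call becomes a bottleneck, one alternative is to collect the columns in a plain Python list, as in the question, and build the array once at the end. A minimal sketch, assuming each column arrives as a 1D array of length N (the random data is only a placeholder):

```python
import numpy as np

N, M = 5, 15
rng = np.random.default_rng(0)

columns = []
for i in range(M):
    columns.append(rng.random(N))   # each column is a 1D array of length N

# One allocation and one copy: each list entry becomes a column.
a = np.column_stack(columns)
```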
    