Growing matrices columnwise in NumPy

后端未结

关注

 4  736

In pure Python you can grow matrices column by column pretty easily:

data = []
for i in something:
    newColumn = getColumnDataAsList(i)
    data.append(newColu


                      
              相关标签:


      
      
        
          4条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  感情败类        
                
              
                            
                2021-02-12 14:32
              
            
            
                                                                       
Usually you don't keep resizing a NumPy array when you create it.  What don't you like about your third solution?  If it's a very large matrix/array, then it might be worth allocating the array before you start assigning its values:

x = len(something)
y = getColumnDataAsNumpyArray.someLengthProperty

data = numpy.zeros( (x,y) )
for i in something:
   data[i] = getColumnDataAsNumpyArray(i)

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  耶瑟儿～        
                
              
                            
                2021-02-12 14:34
              
            
            
                                                                       
NumPy actually does have an append function, which it seems might do what you want, e.g.,

import numpy as NP
my_data = NP.random.random_integers(0, 9, 9).reshape(3, 3)
new_col = NP.array((5, 5, 5)).reshape(3, 1)
res = NP.append(my_data, new_col, axis=1)


your second snippet (hstack) will work if you add another line, e.g.,

my_data = NP.random.random_integers(0, 9, 16).reshape(4, 4)
# the line to add--does not depend on array dimensions
new_col = NP.zeros_like(my_data[:,-1]).reshape(-1, 1)
res = NP.hstack((my_data, new_col))


hstack gives the same result as concatenate((my_data, new_col), axis=1), i'm not sure how they compare performance-wise.



While that's the most direct answer to your question, i should mention that looping through a data source to populate a target via append, while just fine in python, is not idiomatic NumPy. Here's why:

initializing a NumPy array is relatively expensive, and with this conventional python pattern, you incur that cost, more or less, at each loop iteration (i.e., each append to a NumPy array is roughly like initializing a new array with a different size).

For that reason, the common pattern in NumPy for iterative addition of columns to a 2D array is to initialize an empty target array once(or pre-allocate a single 2D NumPy array having all of the empty columns) the successively populate those empty columns by setting the desired column-wise offset (index)--much easier to show than to explain:

>>> # initialize your skeleton array using 'empty' for lowest-memory footprint 
>>> M = NP.empty(shape=(10, 5), dtype=float)

>>> # create a small function to mimic step-wise populating this empty 2D array:
>>> fnx = lambda v : NP.random.randint(0, 10, v)


populate NumPy array as in the OP, except each iteration just re-sets the values of M at successive column-wise offsets

>>> for index, itm in enumerate(range(5)):    
        M[:,index] = fnx(10)

>>> M
  array([[ 1.,  7.,  0.,  8.,  7.],
         [ 9.,  0.,  6.,  9.,  4.],
         [ 2.,  3.,  6.,  3.,  4.],
         [ 3.,  4.,  1.,  0.,  5.],
         [ 2.,  3.,  5.,  3.,  0.],
         [ 4.,  6.,  5.,  6.,  2.],
         [ 0.,  6.,  1.,  6.,  8.],
         [ 3.,  8.,  0.,  8.,  0.],
         [ 5.,  2.,  5.,  0.,  1.],
         [ 0.,  6.,  5.,  9.,  1.]])


of course if you don't known in advance what size your array should be
just create one much bigger than you need and trim the 'unused' portions
when you finish populating it

>>> M[:3,:3]
  array([[ 9.,  3.,  1.],
         [ 9.,  6.,  8.],
         [ 9.,  7.,  5.]])

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  Happy的楠姐        
                
              
                            
                2021-02-12 14:38
              
            
            
                                                                       
Generally it is expensive to keep reallocating the NumPy array - so your third solution is really the best performance wise. 

However I think hstack will do what you want - the cue is in the error message, 


  ValueError: arrays must have same number of dimensions`


I'm guessing that newColumn has two dimensions (rather than a 1D vector), so you need data to also have two dimensions..., for example, data = np.array([[]]) - or alternatively make newColumn a 1D vector (generally if things are 1D it is better to keep them 1D in NumPy, so broadcasting, etc. work better). in which case use np.squeeze(newColumn) and hstack or vstack should work with your original definition of the data.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  旧巷少年郎        
                
              
                            
                2021-02-12 14:43
              
            
            
                                                                       
The hstack can work on zero sized arrays:

import numpy as np

N = 5
M = 15

a = np.ndarray(shape = (N, 0))
for i in range(M):
    b = np.random.rand(N, 1)
    a = np.hstack((a, b))

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复