What's the time complexity of indexing a numpy array directly

情话喂你 2021-01-02 07:06

I assume that when having a numpy array, let's say

>>> nArray
array([[  23425.     ,  521331.40625],
       [  23465.     ,  521246.03125],
       ...

indexing must be pretty efficient. It must be using a hash-table which will give a time complexity close to O(1). Is that right?

3 Answers
  • 2021-01-02 07:34

    There is no hash table involved. Numpy arrays are arrays, just like the name implies, and the address is computed like this:

    address of nArray[x, y] = base address + A * x + B * y

    where A and B are constants determined by the array's element size and shape (the strides).
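    This address arithmetic is exactly what NumPy exposes through an array's `strides`. A small sketch (mine, not part of the original answer) that checks the byte-offset formula against a real array:

```python
import numpy as np

# For a C-contiguous 2-D array, the element a[x, y] lives at
#   base address + x * strides[0] + y * strides[1]
a = np.arange(12, dtype=np.int64).reshape(3, 4)
x, y = 2, 1

byte_offset = x * a.strides[0] + y * a.strides[1]

# Recover the element straight from the raw buffer using that offset
flat = np.frombuffer(a.tobytes(), dtype=np.int64)
assert flat[byte_offset // a.itemsize] == a[x, y]  # both are 9
```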
    
  • 2021-01-02 07:42

    On the one hand

    must be using a hash-table which will give a time complexity close to O(1). Is that right?

    is not quite true. Numpy arrays are basically contiguous blocks of homogeneous memory, with some metadata on the side describing the dimensions and strides. Access is therefore O(1), and just involves some trivial arithmetic to determine the position within the memory block.

    On the other hand

    indexing must be pretty efficient.

    is unfortunately not true at all. Aside from the bounds checking (which numpy arrays do perform), everything that goes through pure Python is extremely inefficient, and each individual element access involves pure-Python calls. Numpy array access is no exception. You should use vectorized operations whenever possible.
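    To make the "use vectorized operations" advice concrete, here is a rough comparison sketch (names and numbers are mine, not from the answer): summing the elements through a Python-level loop over the array versus a single vectorized call.

```python
import numpy as np
import timeit

a = np.arange(100_000, dtype=np.float64)

def loop_sum():
    # Every element access here goes through the Python interpreter
    total = 0.0
    for x in a:
        total += x
    return total

def vector_sum():
    # One call; the loop runs in compiled C inside NumPy
    return a.sum()

assert loop_sum() == vector_sum()

t_loop = timeit.timeit(loop_sum, number=10)
t_vec = timeit.timeit(vector_sum, number=10)
# t_vec is typically orders of magnitude smaller than t_loop
```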

  • 2021-01-02 07:46

    To add some experimental validation to Ami's answer, I made a simple circular buffer from a numpy array that uses only direct indexing for insertions. Each insertion simply overwrites the values of the oldest element in the queue.

    The code is not completely bug free, but it can serve as a basis for some simple performance benchmarking.

    import copy
    import math
    
    import numpy as np
    
    
    class CircFifo:
        """
        Helper class: uses a numpy array to provide a circular,
        fixed-size fifo interface.
    
        put(row): overwrites the oldest element with a new one
    
        get(): returns and removes the oldest entry, or None if empty
    
        empty(): returns True if the fifo is empty
    
        full(): returns True if the fifo is full
        """
        def __init__(self, size):
            self.array = np.empty(shape=(size, 2))
            self.size = size
            self.array[:] = np.nan  # NaN marks unused slots
            self.top = 0
            self.bottom = 0
    
        def put(self, row):
            self.array[self.top, :] = row
            self.top += 1
            if self.top == self.size:
                self.top = 0
    
        def get(self):
            row = None
            if not math.isnan(self.array[self.bottom, 0]):
                row = copy.deepcopy(self.array[self.bottom, :])
                self.array[self.bottom, :] = float('nan')
                self.bottom += 1
                if self.bottom == self.size:
                    self.bottom = 0
                if math.isnan(self.array[self.bottom, 0]):
                    # fifo drained completely: reset both pointers
                    self.bottom = 0
                    self.top = 0
            return row
    
        def empty(self):
            return math.isnan(self.array[self.bottom, 0])
    
        def full(self):
            # full when no slot is still marked NaN
            return not np.isnan(self.array[:, 0]).any()
    

    The correctness of the answers in this post seems to be confirmed by a simple test I ran. I compared insertion performance against a deque object. Even for 1000 insertions, deque, which is also a dynamic rather than fixed-size data structure (as opposed to my static circular fifo), clearly outperforms the circular fifo.

    Here is the simple test I ran:

    In [5]: import time
    
    In [6]: circFifo = CircFifo(300)
    
    In [7]: elapsedTime = 0
    
    In [8]: for i in range(1, 1000):
       ...:         start = time.time()
       ...:         circFifo.put(np.array([[52, 12]]))
       ...:         elapsedTime += time.time() - start
       ...:     
    
    In [9]: elapsedTime
    Out[9]: 0.010616540908813477
    
    
    In [21]: queue = deque()
    
    In [22]: elapsedTime = 0
    
    In [23]: for i in range(1, 1000):
       ....:         start = time.time()
       ....:         queue.append(np.array([[52, 12]]))
       ....:         elapsedTime += time.time() - start
       ....:     
    
    In [24]: elapsedTime
    Out[24]: 0.00482630729675293
    

    I know this benchmark is not very informative, but it is quite apparent that deque is much faster, at least for this number of insertions.

    I would expect that if that circular fifo were implemented in C with a static array, it could not easily be outperformed, since a static C array is about the simplest, lowest-overhead data structure available.
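    One caveat about the measurement itself: summing per-call `time.time()` deltas largely measures the timer's own resolution and overhead. A fairer sketch (my own, not from the answer) times each whole loop once with `time.perf_counter`, using a minimal direct-indexing insert in place of `CircFifo.put`:

```python
import time
from collections import deque

import numpy as np

N = 1000

def bench(fn):
    # Time the entire loop once instead of accumulating per-call deltas
    start = time.perf_counter()
    for _ in range(N):
        fn()
    return time.perf_counter() - start

buf = np.empty((300, 2))
state = {"top": 0}

def put_direct():
    # Direct-indexing insertion, mimicking CircFifo.put
    buf[state["top"], :] = (52, 12)
    state["top"] = (state["top"] + 1) % buf.shape[0]

q = deque(maxlen=300)

def put_deque():
    q.append((52, 12))

t_direct = bench(put_direct)
t_deque = bench(put_deque)
print(f"direct indexing: {t_direct:.6f}s, deque.append: {t_deque:.6f}s")
```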
