What's the time complexity of indexing a numpy array directly

情话喂你 2021-01-02 07:06

I assume that when having a numpy array, let's say

>>> nArray
array([[  23425.     ,  521331.40625],
       [  23465.     ,  521246.03125],
       ...

indexing must be pretty efficient. It must be using a hash-table which will give a time complexity close to O(1). Is that right?

3 Answers
  • 2021-01-02 07:34

    There is no hash table involved. Numpy arrays are arrays, just like the name implies, and the address is computed like this:

    address of nArray[x, y] = base address + A * x + B * y

    where A and B are constants determined by the array's element size and shape (the strides).
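    This address arithmetic is exactly what NumPy exposes through an array's `strides`. A small sketch (mine, not part of the original answer) that checks the byte-offset formula against a real array:

```python
import numpy as np

# For a C-contiguous 2-D array, the element a[x, y] lives at
#   base address + x * strides[0] + y * strides[1]
a = np.arange(12, dtype=np.int64).reshape(3, 4)
x, y = 2, 1

byte_offset = x * a.strides[0] + y * a.strides[1]

# Recover the element straight from the raw buffer using that offset
flat = np.frombuffer(a.tobytes(), dtype=np.int64)
assert flat[byte_offset // a.itemsize] == a[x, y]  # both are 9
```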
    
  • 2021-01-02 07:42

    On the one hand

    must be using a hash-table which will give a time complexity close to O(1). Is that right?

    is not quite true. Numpy arrays are basically contiguous blocks of homogeneous memory, with some metadata on the side describing the dimensions and strides. Access is therefore O(1), and just involves some trivial arithmetic to determine the position within the memory block.

    On the other hand

    indexing must be pretty efficient.

    is unfortunately not true at all. Aside from the bounds checking (which numpy arrays do perform), everything that goes through pure Python is extremely inefficient, and each individual element access involves pure-Python calls. Numpy array access is no exception. You should use vectorized operations whenever possible.
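    To make the "use vectorized operations" advice concrete, here is a rough comparison sketch (names and numbers are mine, not from the answer): summing the elements through a Python-level loop over the array versus a single vectorized call.

```python
import numpy as np
import timeit

a = np.arange(100_000, dtype=np.float64)

def loop_sum():
    # Every element access here goes through the Python interpreter
    total = 0.0
    for x in a:
        total += x
    return total

def vector_sum():
    # One call; the loop runs in compiled C inside NumPy
    return a.sum()

assert loop_sum() == vector_sum()

t_loop = timeit.timeit(loop_sum, number=10)
t_vec = timeit.timeit(vector_sum, number=10)
# t_vec is typically orders of magnitude smaller than t_loop
```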

  • 2021-01-02 07:46

    To add some experimental validation to Ami's answer, I made a simple circular buffer from a numpy array that uses only direct indexing for insertions. Each insertion simply overwrites the values of the oldest element in the queue.

    The code is not completely bug free, but it can serve as a basis for some simple performance benchmarking.

    import copy
    import math
    
    import numpy as np
    
    
    class CircFifo:
        """
        Helper class: uses a numpy array to provide a circular,
        fixed-size fifo interface.
    
        put(row): overwrites the oldest element with a new one
    
        get(): returns and removes the oldest entry, or None if empty
    
        empty(): returns True if the fifo is empty
    
        full(): returns True if the fifo is full
        """
        def __init__(self, size):
            self.array = np.empty(shape=(size, 2))
            self.size = size
            self.array[:] = np.nan  # NaN marks unused slots
            self.top = 0
            self.bottom = 0
    
        def put(self, row):
            self.array[self.top, :] = row
            self.top += 1
            if self.top == self.size:
                self.top = 0
    
        def get(self):
            row = None
            if not math.isnan(self.array[self.bottom, 0]):
                row = copy.deepcopy(self.array[self.bottom, :])
                self.array[self.bottom, :] = float('nan')
                self.bottom += 1
                if self.bottom == self.size:
                    self.bottom = 0
                if math.isnan(self.array[self.bottom, 0]):
                    # fifo drained completely: reset both pointers
                    self.bottom = 0
                    self.top = 0
            return row
    
        def empty(self):
            return math.isnan(self.array[self.bottom, 0])
    
        def full(self):
            # full when no slot is still marked NaN
            return not np.isnan(self.array[:, 0]).any()
    

    The correctness of the answers in this post seems to be confirmed by a simple test I ran. I compared insertion performance against a deque object. Even for 1000 insertions, deque, which is also a dynamic rather than fixed-size data structure (as opposed to my static circular fifo), clearly outperforms the circular fifo.

    Here is the simple test I ran:

    In [5]: import time
    
    In [6]: circFifo = CircFifo(300)
    
    In [7]: elapsedTime = 0
    
    In [8]: for i in range(1, 1000):
       ...:         start = time.time()
       ...:         circFifo.put(np.array([[52, 12]]))
       ...:         elapsedTime += time.time() - start
       ...:     
    
    In [9]: elapsedTime
    Out[9]: 0.010616540908813477
    
    
    In [21]: queue = deque()
    
    In [22]: elapsedTime = 0
    
    In [23]: for i in range(1, 1000):
       ....:         start = time.time()
       ....:         queue.append(np.array([[52, 12]]))
       ....:         elapsedTime += time.time() - start
       ....:     
    
    In [24]: elapsedTime
    Out[24]: 0.00482630729675293
    

    I know this benchmark is not very informative, but it is quite apparent that deque is much faster, at least for this number of insertions.

    I would expect that if that circular fifo were implemented in C with a static array, it could not easily be outperformed, since a static C array is about the simplest, lowest-overhead data structure available.
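    One caveat about the measurement itself: summing per-call `time.time()` deltas largely measures the timer's own resolution and overhead. A fairer sketch (my own, not from the answer) times each whole loop once with `time.perf_counter`, using a minimal direct-indexing insert in place of `CircFifo.put`:

```python
import time
from collections import deque

import numpy as np

N = 1000

def bench(fn):
    # Time the entire loop once instead of accumulating per-call deltas
    start = time.perf_counter()
    for _ in range(N):
        fn()
    return time.perf_counter() - start

buf = np.empty((300, 2))
state = {"top": 0}

def put_direct():
    # Direct-indexing insertion, mimicking CircFifo.put
    buf[state["top"], :] = (52, 12)
    state["top"] = (state["top"] + 1) % buf.shape[0]

q = deque(maxlen=300)

def put_deque():
    q.append((52, 12))

t_direct = bench(put_direct)
t_deque = bench(put_deque)
print(f"direct indexing: {t_direct:.6f}s, deque.append: {t_deque:.6f}s")
```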
