I assume that when we have a numpy array, let's say
>>> nArray
array([[ 23425.     , 521331.40625],
       [ 23465.     , 521246.03125],
       [
There is no hash table involved. Numpy arrays are arrays, just like the name implies, and the address is computed like this:
address of nArray[x, y] = base address + A * x + B * y
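To make the constants concrete: for a C-contiguous array, A and B are simply the array's strides, i.e. the step sizes in bytes per index along each axis. A minimal sketch, using the float64 array from the question:

import numpy as np

# Two rows of the example array; float64, so each element is 8 bytes.
nArray = np.array([[23425.0, 521331.40625],
                   [23465.0, 521246.03125]])

# strides gives the byte step per index along each axis: here (16, 8),
# i.e. A = 16 and B = 8 in the formula above.
print(nArray.strides)

# Byte offset of nArray[1, 1] from the start of the data buffer.
x, y = 1, 1
print(nArray.strides[0] * x + nArray.strides[1] * y)   # 24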
On the one hand,
"must be using a hash-table which will give a time complexity close to O(1). Is that right?"
is not quite true. Numpy arrays are basically contiguous blocks of homogeneous memory, with some extra info on the side about dimensions and such. Therefore, access is O(1) and just involves some trivial math to determine the position within the memory.
On the other hand,
"indexing must be pretty efficient."
is unfortunately not true at all. Aside from bounds checking (which arrays do), everything involving pure Python is extremely inefficient, and accesses involve pure-Python calls. Numpy array access is no exception. You should try to use vector operations whenever possible.
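As a rough illustration (the function names and array size here are made up for the example, and absolute timings depend on the machine), summing an array element by element through indexing is dramatically slower than a single vectorized call:

import timeit
import numpy as np

a = np.random.rand(100_000)

def loop_sum(arr):
    # Every arr[i] goes through Python-level calls and creates a new
    # Python object, so the loop overhead dominates the cost.
    total = 0.0
    for i in range(arr.shape[0]):
        total += arr[i]
    return total

def vector_sum(arr):
    # One call that stays inside NumPy's compiled code.
    return arr.sum()

print(timeit.timeit(lambda: loop_sum(a), number=10))
print(timeit.timeit(lambda: vector_sum(a), number=10))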
To add some extra validation through testing to Ami's answer, I made a simple circular buffer from a numpy array that only uses direct indexing for insertions. Basically, each insertion just overwrites the values of the oldest element in the queue.
The code is not completely bug-free, but it can serve as a basis for some simple performance benchmarking.
import copy
import math

import numpy as np


class CircFifo():
    """
    Helper class that uses a numpy array to provide a circular, fixed-size
    fifo interface.
    put(row): overwrites the oldest element with a new one
    get(): returns the oldest entry and clears its slot
    empty(): returns True if the fifo is empty
    full(): returns True if the fifo is full
    """
    def __init__(self, size):
        # Pre-allocate the backing storage; NaN marks an unused slot.
        self.array = np.empty(shape=(size, 2))
        self.size = size
        self.array[:] = np.nan
        self.top = 0      # index of the next slot to write
        self.bottom = 0   # index of the oldest entry

    def put(self, row):
        # Overwrite the next slot and wrap the write index around.
        self.array[self.top, :] = row
        self.top += 1
        if self.top == self.size:
            self.top = 0

    def get(self):
        # Returns None if the fifo is empty.
        row = None
        if not math.isnan(self.array[self.bottom, 0]):
            # Copy out the oldest row and mark its slot as free.
            row = copy.deepcopy(self.array[self.bottom, :])
            self.array[self.bottom, :] = float('NaN')
            self.bottom += 1
        if self.bottom == self.size:
            self.bottom = 0
        if math.isnan(self.array[self.bottom, 0]):
            # The fifo is empty again; reset both indices.
            self.bottom = 0
            self.top = 0
        return row

    def empty(self):
        if math.isnan(self.array[self.bottom, 0]):
            return True
        else:
            return False

    def full(self):
        # Full when no slot still holds the NaN "unused" marker.
        if self.size - np.count_nonzero(
                np.isnan(self.array[:, 0])) == self.size:
            return True
        else:
            return False
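For reference, a short usage sketch of the class above (the rows are arbitrary example values):

fifo = CircFifo(3)
fifo.put([1.0, 2.0])
fifo.put([3.0, 4.0])

print(fifo.full())    # False: only two of the three slots are used
print(fifo.get())     # [1. 2.]  -- the oldest row comes out first
print(fifo.empty())   # False: one row is still stored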
The correctness of the answers in this post seems to be confirmed by a simple test that I ran. I tested the insertion performance against a deque object. Even for 1000 insertions, deque, which is also a dynamic rather than static data structure (as opposed to my static circular fifo), clearly outperforms the circular fifo.
Here is the simple test I ran:
In [5]: import time
In [6]: circFifo = CircFifo(300)
In [7]: elapsedTime = 0
In [8]: for i in range(1, 1000):
   ...:     start = time.time()
   ...:     circFifo.put(np.array([[52, 12]]))
   ...:     elapsedTime += time.time() - start
   ...:
In [9]: elapsedTime
Out[9]: 0.010616540908813477
In [21]: queue = deque()
In [22]: elapsedTime = 0
In [23]: for i in range(1, 1000):
    ....:     start = time.time()
    ....:     queue.append(np.array([[52, 12]]))
    ....:     elapsedTime += time.time() - start
    ....:
In [24]: elapsedTime
Out[24]: 0.00482630729675293
I know that this benchmark is not that informative, but it makes it quite apparent that deque is much faster, at least for that number of insertions.
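As a side note, the same comparison could be made a bit more robust with timeit, which handles the looping and timer overhead itself; a minimal sketch, assuming CircFifo from above is already defined in the session:

import timeit
from collections import deque

import numpy as np

fifo = CircFifo(300)
queue = deque()
row = np.array([[52, 12]])

# Time 1000 insertions into each structure.
print(timeit.timeit(lambda: fifo.put(row), number=1000))
print(timeit.timeit(lambda: queue.append(row), number=1000))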
I would expect that if this circular fifo were implemented in C with a static array, it could not easily be outperformed, since C's static array is about the simplest and lowest-overhead data structure available.