With reference to "Why NumPy instead of Python lists?", tom10 said:

"Speed: Here's a test on doing a sum over a list and a NumPy array, showing ..."
To answer your question, yes. Appending to an array is an expensive operation, while lists make it relatively cheap (see Internals of Python list, access and resizing runtimes for why). However, that's no reason to abandon numpy. There are other ways to easily add data to a numpy array.
There are a surprising (to me, anyway) number of ways to do this. Skip to the bottom to see benchmarks for each of them.
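For reference, the two approaches the benchmarks below label "Using numpy" and "Using list" are the append-based versions. A rough sketch of what those baselines look like (the element count `N` is an assumption; the original doesn't state it, so the timings won't match exactly):

```python
import time
import numpy as np

N = 10_000  # assumed element count; the original answer's value isn't stated

# "Using numpy": np.append copies the entire array on every call,
# which is why this version is dramatically slower than the rest.
start = time.time()
arr = np.array([])
for i in range(N):
    arr = np.append(arr, i)
print("Using numpy: ", round(time.time() - start, 5))

# "Using list": list.append is amortized O(1), so this stays fast,
# with one conversion to an array at the end.
start = time.time()
res = []
for i in range(N):
    res.append(i)
res = np.array(res)
print("Using list : ", round(time.time() - start, 5))
```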
Probably the most common is to simply pre-allocate the array and index into it:

#using preallocated numpy
import time
import numpy as np

N = 100_000  # element count; pick the size you need

start = time.time()
array = np.zeros(N)
for i in range(N):
    array[i] = i
end = time.time()
print("Using preallocated numpy: ", round(end - start, 5))
Of course, you can preallocate the memory for a list too, so let's include that in the benchmark comparison.
#using preallocated list
start = time.time()
res = [None] * N
for i in range(N):
    res[i] = i
res = np.array(res)
end = time.time()
print("Using preallocated list : ", round(end - start, 5))
Depending on your code, it may also be helpful to use numpy's fromiter
function, which uses the results of an iterator to initialize the array.
#using numpy fromiter shortcut
start = time.time()
res = np.fromiter(range(N), dtype='float64') # Use same dtype as other tests
end = time.time()
print("Using fromiter : ", round(end - start, 5))
Of course, using a built-in iterator isn't terribly flexible, so let's try a custom iterator as well:
#using custom iterator
start = time.time()

def it(N):
    i = 0
    while i < N:
        yield i
        i += 1

res = np.fromiter(it(N), dtype='float64')  # Use same dtype as other tests
end = time.time()
print("Using custom iterator : ", round(end - start, 5))
Those are several flexible ways of filling a numpy array without append; preallocation is the most general. Let's see how they compare:
Using numpy: 2.40017
Using list : 0.0164
Using preallocated numpy: 0.01604
Using preallocated list : 0.01322
Using fromiter : 0.00577
Using custom iterator : 0.01458
Right off, you can see that preallocating makes numpy much faster than appending to it, while preallocating the list brings both to about the same speed. Using a built-in iterator with fromiter is the fastest of all, although performance drops back into the range of the preallocated array and list when a custom iterator is used.
Converting code directly to numpy often performs poorly, as with append. Finding an approach that uses numpy's own methods almost always gives a substantial improvement: in this case, preallocating the array or expressing the calculation of each element as an iterator gets performance similar to python lists. Or just use a vanilla python list, since the performance is about the same.
EDIT: The original answer also included np.fromfunction. It was removed since it doesn't fit the use case of adding one element at a time: fromfunction initializes the whole array at once and uses numpy's broadcasting to make a single function call. It is about a hundred times faster, so if you can solve your problem using broadcasting, don't bother with these other methods.
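For completeness, a minimal sketch of that broadcasting approach (note that np.fromfunction passes index *arrays* to the function, so its body must be vectorized; the element count `N` is an assumption for illustration):

```python
import numpy as np

N = 10_000  # assumed element count for illustration

# fromfunction builds index arrays of the given shape and calls the
# function exactly once; i is an array of indices, not a scalar.
res = np.fromfunction(lambda i: i, (N,), dtype='float64')

# For this particular fill pattern, arange is equivalent and even simpler.
res2 = np.arange(N, dtype='float64')
```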