I'm working on a large codebase and need to speed up a specific part of it. I've created an MWE, shown below:
import numpy as np
You want to gradually convert this over from using lists and loops to using arrays and broadcasting, grabbing the easiest and/or most time-critical parts first until it's fast enough.
The first step is to not do that zip(*list2) over and over (especially if this is Python 2.x). While we're at it, we might as well store it in an array, and do the same with list1; you can still iterate over them for now. So:
array1 = np.array(list1)
# list() is needed on Python 3, where zip returns an iterator.
array2 = np.array(list(zip(*list2)))
# …
for elem in array1:
    # …
    for elem2 in array2:
This won't speed things up much—on my machine, it takes us from 14.1 seconds to 12.9—but it gives us somewhere to start working.
You should also remove the double calculation of sum(list3):
sum_list3 = sum(list3)
sum_list3 = sum_list3 if sum_list3>0. else 1e-06
Meanwhile, it's a bit odd that you want value <= 0 to go to 1e-6, but 0 < value < 1e-6 to be left alone. Is that really intentional? If not, you can fix that, and simplify the code at the same time, by just doing this:
sum_list3 = max(array3.sum(), 1e-06)
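To make the difference between the two rules concrete, here is a small sketch. The helper names clamp_conditional and clamp_max are made up for illustration and aren't in your code; the only input where they disagree is a total strictly between 0 and 1e-6.

```python
def clamp_conditional(total, floor=1e-06):
    # Original rule: only non-positive totals are replaced by the floor.
    return total if total > 0. else floor

def clamp_max(total, floor=1e-06):
    # Simplified rule: anything below the floor is raised to it.
    return max(total, floor)

tiny = 1e-09
conditional_result = clamp_conditional(tiny)  # left alone
max_result = clamp_max(tiny)                  # raised to the floor
```

Since the sum is over products of exponentials, it can never be negative, so in practice the two rules only differ for those tiny positive totals.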
Now, let's broadcast the A and B calculations:
# Broadcast over elements in list2.
A = np.exp(-0.5*((elem[0]-array2[:, 0])/elem[3])**2)
B = np.exp(-0.5*((elem[1]-array2[:, 1])/elem[3])**2)
array3 = A*B
# Sum elements in list3 and append result to list4.
sum_list3 = max(array3.sum(), 1e-06)
list4.append(sum_list3)
And this gets us down from 12.9 seconds to 0.12. You could go a step further by also broadcasting over array1, and replacing list4 with a pre-allocated array, and so forth, but this is probably already fast enough.