I am trying to define a function that contains an inner loop for simulating an integral.
The problem is speed. Evaluating the function once can take up to 30 seconds on
A profiler will help you figure out which part is slow. I like to run programs using the standard library profiler:
python -O -m cProfile -o profile.out MYAPP.py
and then view the output from that in the 'RunSnakeRun' GUI:
runsnake profile.out
RunSnakeRun can be installed from here: http://www.vrplumber.com/programming/runsnakerun/
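If you prefer to stay in the terminal, the standard library can also print the hot spots directly; a minimal sketch using cProfile and pstats (`slow_part` here is a made-up stand-in for your own expensive call):

```python
import cProfile
import io
import pstats

def slow_part():
    # stand-in for the expensive function; replace with your own call
    total = 0.0
    for i in range(100000):
        total += i ** 0.5
    return total

profiler = cProfile.Profile()
profiler.enable()
slow_part()
profiler.disable()

# print the ten most expensive calls, sorted by cumulative time
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(10)
print(stream.getvalue())
```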
You could definitely speed up your code by using more of Numpy's capabilities.
For instance:
cdef np.ndarray[double, ndim=1] S = np.zeros(dtype="d", shape=J)
cdef int j
for j in xrange(ns):
    S += P_i[:, j]
would be much faster and more legible as
S = P_i.sum(axis=1)
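To illustrate that the two forms agree, here is the loop and the one-liner side by side on a toy array (shapes J = 3, ns = 2 are made up for the example):

```python
import numpy as np

# toy stand-in for P_i with J = 3 rows and ns = 2 columns
P_i = np.array([[1.0, 2.0],
                [3.0, 4.0],
                [5.0, 6.0]])

# the explicit loop from the question...
S_loop = np.zeros(3)
for j in range(P_i.shape[1]):
    S_loop += P_i[:, j]

# ...and the vectorized one-liner
S_vec = P_i.sum(axis=1)

print(S_vec)  # [ 3.  7. 11.]
assert np.allclose(S_loop, S_vec)
```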
You also repeat some calculations, which therefore take twice as long as necessary. For instance,
np.where(data[:,1]==(yr + 72))
could be calculated only once and stored in a variable that you could reuse.
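A minimal sketch of caching the index (the `yr + 72` test comes from the code above; the toy `data` array is made up):

```python
import numpy as np

# toy stand-in: column 1 holds year codes
data = np.array([[0.0, 72.0],
                 [0.0, 73.0],
                 [0.0, 72.0]])
yr = 0

# computed once, reused everywhere it is needed
idx = np.where(data[:, 1] == (yr + 72))

print(idx[0])  # [0 2]
```

A plain boolean mask `data[:, 1] == (yr + 72)` often suffices as well, and can be reused for both the sum and the slice.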
You also perform a lot of reshaping and slicing: it could help to have your variables be in a simpler format from the beginning on. If possible, your code would be much clearer, and optimizations could be much more obvious.
Taking the advice given here, I have spent more time profiling the code and now have a better idea of which pieces are the slowest. To hopefully clean things up a bit, I additionally defined
X = data[:, 2:7]
m_y = data[:, 21].reshape(J, 1)
sigma_y = 1.0
price = data[:, 7].reshape(J, 1)
shares_data = data[:, 8]
It is then the following lines that eat up most of the total time.
mu_ij = np.dot((X*np.array([s1, s2, s3, s4, s5])), nu[1:K+1,:])
mu_y = a * np.log(np.exp(m_y + sigma_y*nu[0,:].reshape(1,ns)) - price)
V = delta.reshape(J,1) + mu_ij + mu_y
exp_vi = np.exp(V)
P_i = (1.0 / np.sum(exp_vi[np.where(data[:,1]==71)], 0)) * exp_vi[np.where(data[:,1]==71)]
for yr in xrange(19):
    P_yr = (1.0 / np.sum(exp_vi[np.where(data[:,1]==(yr + 72))], 0)) * exp_vi[np.where(data[:,1]==(yr + 72))]
    P_i = np.concatenate((P_i, P_yr))
I get the impression this is an overly cumbersome way to achieve my goal. I was hoping somebody might be able to provide some advice on how to speed these lines up. Maybe there are Numpy capabilities I am missing? If this problem is not sufficiently well specified for you to be helpful, I would be happy to provide more details on the context of my problem. Thanks!
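(For reference, the growing `concatenate` above can be replaced by a single grouped normalization. This is only a sketch under the assumption that `data[:, 1]` holds integer year codes; the `years` and `exp_vi` values below are toy stand-ins:)

```python
import numpy as np

# toy stand-in: 4 rows in two year groups (71 and 72), 2 simulation draws
years = np.array([71, 71, 72, 72])
exp_vi = np.array([[1.0, 2.0],
                   [3.0, 4.0],
                   [5.0, 5.0],
                   [5.0, 15.0]])

# map each row to its group, sum exp_vi within each group, then divide
_, inverse = np.unique(years, return_inverse=True)
group_sums = np.zeros((inverse.max() + 1, exp_vi.shape[1]))
np.add.at(group_sums, inverse, exp_vi)
P_i = exp_vi / group_sums[inverse]

print(P_i)  # each column sums to 1 within each year group
```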
Cython can produce an html file to help with this:
cython -a MODULE.py
This shows each line of source code colored white through various shades of yellow. The darker the yellow color, the more dynamic Python behaviour is still being performed on that line. For each line that contains some yellow, you need to add more static typing declarations.
When I'm doing this, I like to split parts of my source code that I'm having trouble with onto many separate lines, one for each expression or operator, to get the most granular view.
Without this, it's easy to overlook some static type declarations of variables, function calls or operators. (e.g. the indexing operator x[y] is still a fully-dynamic Python operation unless you declare otherwise)
The "fundamental mistake" is expecting good performance from long loops in Python. It's an interpreted language, and switching between implementations or sprinkling in C typing does little to change that. There are a few numeric Python libraries for fast computing, mostly written in C. For example, if you already use numpy for arrays, why not go further and use scipy for your advanced math? It will improve both readability and speed.
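For example, if the inner loop approximates a one-dimensional integral, scipy can do the quadrature in compiled code. A sketch with a toy integrand (your actual integrand will differ):

```python
import numpy as np
from scipy.integrate import quad

# toy integrand; the real one comes from your model
def integrand(x):
    return np.exp(-x ** 2)

# adaptive quadrature in compiled code, with an error estimate
value, abs_err = quad(integrand, 0.0, 1.0)

print(value)  # ~0.7468, the integral of exp(-x^2) on [0, 1]
```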
Cython doesn't offer automatic performance gains; you have to know its internals and check the generated C code.
In particular, if you want to improve loop performance, you have to avoid calling Python functions inside the loops, which this code does a lot (all the np. calls are Python calls, as is slicing, and probably other things).
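To make the point concrete, here is a sketch of hoisting Python-level calls out of a hot loop (the toy array is made up; the pattern is what matters):

```python
import numpy as np

data = np.arange(12.0).reshape(4, 3)

# slow: a Python-level np call and a slice inside every iteration
total_slow = 0.0
for i in range(data.shape[0]):
    total_slow += np.sum(data[i, :])

# faster: one vectorized call, no Python-level loop at all
total_fast = data.sum()

assert total_slow == total_fast
```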
See this page for general guidelines about performance optimization with Cython (the -a switch really is handy when optimizing) and this one for the specifics of optimizing numpy code.