At the heart of an application (written in Python and using NumPy) I need to rotate a 4th-order tensor. Actually, I need to rotate a lot of tensors many times, and this is my bottleneck.
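For reference, here is the rotation in question. This assumes the usual convention that the rotation matrix $g$ acts on every index of $T$; the question's code may differ in details such as transposition:

$$T'_{abcd} = \sum_{i,j,k,l} g_{ai}\, g_{bj}\, g_{ck}\, g_{dl}\, T_{ijkl}$$

For a 3×3×3×3 tensor, evaluating this directly sums 3^4 = 81 terms for each of the 81 output entries. That is 6561 five-factor products per tensor, so Python-level loops over this sum get expensive quickly when many tensors are rotated.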
I thought I'd contribute a relatively new data point to these benchmarks, using Parakeet, one of the NumPy-aware JIT compilers that have sprung up in the past few months. (The other one I'm aware of is Numba, but I didn't test it here.)
Once you make it through the somewhat labyrinthine installation process for LLVM, you can decorate many pure-NumPy functions to (often) speed them up:
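```python
import numpy as np
import parakeet

@parakeet.jit
def rotT(T, g):
    # Body: Andrew's original loop-based code from the question, unchanged.
    # The only addition is the @parakeet.jit decorator above.
    ...
```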
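Because the LLVM install can be the sticking point, one option is to fall back to the undecorated NumPy function when `parakeet` can't be imported, so the same code runs either way. This is a minimal sketch and my own assumption, not something Parakeet provides. `scale` is a hypothetical toy function used only to show the decorator in use:

```python
import numpy as np

try:
    import parakeet
    jit = parakeet.jit  # JIT-compile when Parakeet (and LLVM) are available
except ImportError:
    def jit(fn):
        # Fallback: return the plain NumPy function unchanged
        return fn


@jit
def scale(x, a):
    # Hypothetical toy function, only here to exercise the decorator
    return x * a


result = scale(np.arange(3.0), 2.0)
```

This only catches a missing package. A Parakeet install that imports but fails at compile time would still raise errors when the function is first called.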
I only tried applying the JIT to Andrew's code in the original question, but it does pretty well (roughly a 15x speedup, from 206 ms to 13.3 ms per loop) for not having to write any new code whatsoever:
| Function | Loops per repeat | Best of 3 (time per loop) |
|---|---|---|
| andrew | 10 | 206 ms |
| andrew_jit | 10 | 13.3 ms |
| sven | 100 | 2.39 ms |
| philipp | 1000 | 0.879 ms |
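The relative speedups follow directly from these numbers. This snippet just redoes the arithmetic on the reported best-of-3 times and introduces no new measurements:

```python
# Best-of-3 per-loop times in ms, copied from the timeit results above
timings_ms = {"andrew": 206.0, "andrew_jit": 13.3, "sven": 2.39, "philipp": 0.879}

# Speedup of each version over the original loop code and over the JIT-compiled loop code
vs_andrew = {name: timings_ms["andrew"] / t for name, t in timings_ms.items()}
vs_jit = {name: timings_ms["andrew_jit"] / t for name, t in timings_ms.items()}

for name in timings_ms:
    print(f"{name:>10}: {vs_andrew[name]:6.1f}x vs andrew, {vs_jit[name]:5.1f}x vs andrew_jit")
```

The JIT-compiled loop is about 15.5x faster than the original. Sven's and Philipp's versions are about 5.6x and 15.1x faster, respectively, than the JIT-compiled one.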
For these timings (on my laptop) I ran each function ten times, to give the JIT a chance to identify and optimize the hot code paths.
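A benchmark in this style can be reproduced with the standard `timeit` module. The sketch below does the ten warm-up calls first and then reports the best of three repeats as time per call, mirroring the "N loops, best of 3" output above. `best_of_ms` is a hypothetical helper, and `np.sum` on a small array merely stands in for the functions being compared:

```python
import timeit

import numpy as np


def best_of_ms(fn, *args, number=10, repeat=3, warmup=10):
    """Best per-call time in ms over `repeat` runs of `number` calls each."""
    # Warm-up calls give a JIT (if any) the chance to compile before timing
    for _ in range(warmup):
        fn(*args)
    # timeit.repeat accepts a callable; each entry is the total for `number` calls
    totals = timeit.repeat(lambda: fn(*args), number=number, repeat=repeat)
    return 1e3 * min(totals) / number


x = np.arange(1000.0)
t_ms = best_of_ms(np.sum, x, number=100)
print(f"np.sum: {t_ms:.4f} ms per loop")
```

Taking the minimum rather than the mean follows `timeit`'s own advice: slower runs mostly reflect interference from other processes, not the code being measured.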
Interestingly, Sven's and Philipp's suggestions are still much faster than the JIT-compiled version: about 5.6x and 15x faster, respectively!
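The gap comes from replacing Python-level loops with a few NumPy calls that run the whole contraction in compiled code. The sketch below hands the rotation rule to `np.einsum` in a single call and checks it against a hypothetical explicit-loop version. It only illustrates the vectorized approach and is not necessarily what Sven or Philipp posted:

```python
import itertools

import numpy as np


def rot_loops(T, g):
    # Hypothetical explicit-loop reference: T'_abcd = sum g_ai g_bj g_ck g_dl T_ijkl
    n = g.shape[0]
    out = np.zeros((n, n, n, n))
    for a, b, c, d in itertools.product(range(n), repeat=4):
        s = 0.0
        for i, j, k, l in itertools.product(range(n), repeat=4):
            s += g[a, i] * g[b, j] * g[c, k] * g[d, l] * T[i, j, k, l]
        out[a, b, c, d] = s
    return out


def rot_einsum(T, g):
    # Same contraction, evaluated inside NumPy; optimize=True lets einsum pick a
    # sequence of pairwise contractions instead of one 8-index nested loop
    return np.einsum("ai,bj,ck,dl,ijkl->abcd", g, g, g, g, T, optimize=True)


rng = np.random.default_rng(0)
T = rng.standard_normal((3, 3, 3, 3))
g, _ = np.linalg.qr(rng.standard_normal((3, 3)))  # random orthogonal 3x3 matrix

assert np.allclose(rot_loops(T, g), rot_einsum(T, g))
```

Because `g` is orthogonal, rotating by `g` and then by `g.T` should return the original tensor. This makes a quick sanity check for any faster variant.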