I have lengthy computations which I repeat many times. Therefore, I would like to use memoization (packages such as jug and joblib), in concert with Pandas. The problem is wheth
I use this basic memoization decorator, memoized
. http://wiki.python.org/moin/PythonDecoratorLibrary#Memoize
DataFrames are hashable, so it should work fine. Here's an example.
In [2]: func = lambda df: df.apply(np.fft.fft)
In [3]: memoized_func = memoized(func)
In [4]: df = DataFrame(np.random.randn(1000, 1000))
In [5]: %timeit func(df)
10 loops, best of 3: 124 ms per loop
In [9]: %timeit memoized_func(df)
1000000 loops, best of 3: 1.46 us per loop
Looks good to me.