Let's say that I have data from weather stations at 3 (known) altitudes on a mountain. Specifically, each station records a temperature measurement at its location every minute.
For a fixed point in time, you can use the following interpolation function:

```
g(a) = cc[0]*abs(a-aa[0]) + cc[1]*abs(a-aa[1]) + cc[2]*abs(a-aa[2])
```
where `a` is the hiker's altitude, `aa` is the vector with the 3 measurement altitudes, and `cc` is a vector with the coefficients. There are three things to note:

1. For given temperatures (`alltemps`) corresponding to `aa`, determining `cc` can be done by solving a linear matrix equation using `np.linalg.solve()`.
2. `g(a)` is easy to vectorize for an (N,)-dimensional `a` and an (N, 3)-dimensional `cc` (including `np.linalg.solve()`, respectively); a sketch of the stacked-solve case is given at the end of this answer.
3. `g(a)` is called a first-order univariate spline kernel (for three points). Using `abs(a-aa[i])**(2*d-1)` would change the spline order to `d`. This approach can be interpreted as a simplified version of a Gaussian process in machine learning; a sketch for higher orders follows the timing below.

So the code would be:
```python
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns  # imported for its plot styling

# generate temperatures:
np.random.seed(0)
N, sigma = 1000, 5
trend = np.sin(4 / N * np.arange(N)) * 30
alltemps = np.array([tmp0 + trend + sigma*np.random.randn(N)
                     for tmp0 in [70, 50, 40]])

# generate altitudes:
altitudes = np.array([500, 1500, 4000]).astype(float)
location = np.linspace(altitudes[0], altitudes[-1], N)

def doit():
    """ Do the interpolation, improved version for speed. """
    AA = np.vstack([np.abs(altitudes - a_i) for a_i in altitudes])
    # This is slightly faster than np.linalg.solve(), because AA is small:
    cc = np.dot(np.linalg.inv(AA), alltemps)

    return (cc[0]*np.abs(location - altitudes[0]) +
            cc[1]*np.abs(location - altitudes[1]) +
            cc[2]*np.abs(location - altitudes[2]))

t_loc = doit()  # call interpolator

# do the plotting:
fg, ax = plt.subplots(num=1)
for alt, t in zip(altitudes, alltemps):
    ax.plot(t, label="%d feet" % alt, alpha=.5)
ax.plot(t_loc, label="Interpolation")
ax.legend(loc="best", title="Altitude:")
ax.set_xlabel("Time")
ax.set_ylabel("Temperature")
fg.canvas.draw()
```
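Because `cc` solves the linear system from the first note, the interpolant passes exactly through the station data at the station altitudes; a quick sanity check along those lines:

```python
# Sanity check: at the station altitudes, g must reproduce the measured
# temperatures exactly, since AA @ cc == alltemps by construction.
AA = np.vstack([np.abs(altitudes - a_i) for a_i in altitudes])
cc = np.dot(np.linalg.inv(AA), alltemps)
for i in range(3):
    g_i = sum(cc[j]*np.abs(altitudes[i] - altitudes[j]) for j in range(3))
    assert np.allclose(g_i, alltemps[i])
```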
Measuring the time gives:

```
In [2]: %timeit doit()
10000 loops, best of 3: 107 µs per loop
```
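To illustrate the spline-order remark from the third note: only the kernel exponent changes. A minimal sketch, assuming the 3×3 kernel matrix stays invertible for the chosen altitudes (`doit_order` is a hypothetical variant, not part of the benchmarks):

```python
def doit_order(d=2):
    """ Interpolation with spline order d; d=1 reproduces doit(). """
    p = 2*d - 1  # kernel exponent, e.g. 3 (cubic) for d=2
    AA = np.vstack([np.abs(altitudes - a_i)**p for a_i in altitudes])
    cc = np.dot(np.linalg.inv(AA), alltemps)
    return sum(cc[i]*np.abs(location - altitudes[i])**p for i in range(3))
```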
Update: I replaced the original list comprehensions in `doit()` to improve speed by 30% (for `N=1000`).
Furthermore, as requested for comparison, @moarningsun's benchmark code block on my machine:

```
10 loops, best of 3: 110 ms per loop

interp_checked
10000 loops, best of 3: 83.9 µs per loop

scipy_interpn
1000 loops, best of 3: 678 µs per loop

Output allclose:
[True, True, True]
```
Note that `N=1000` is a relatively small number. Using `N=100000` produces the results:

```
interp_checked
100 loops, best of 3: 8.37 ms per loop

%timeit doit()
100 loops, best of 3: 5.31 ms per loop
```

This shows that this approach scales better for large `N` than the `interp_checked` approach.
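Finally, regarding the vectorization remark in the second note: if the measurement altitudes themselves varied over time, there would be one 3×3 system per time step, and `np.linalg.solve()` can handle all of them in a single stacked call. A minimal sketch under that assumption, with `aa_t` a hypothetical (N, 3) array of per-time-step station altitudes:

```python
# Hypothetical time-varying station altitudes (assumed for this sketch):
aa_t = altitudes + 10.0*np.random.randn(N, 3)

AA_t = np.abs(aa_t[:, :, None] - aa_t[:, None, :])  # (N, 3, 3) kernel matrices
bb_t = alltemps.T[:, :, None]                       # (N, 3, 1) right-hand sides
cc_t = np.linalg.solve(AA_t, bb_t)[:, :, 0]         # one call solves all N systems
t_loc_var = np.sum(cc_t*np.abs(location[:, None] - aa_t), axis=1)
```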