Fast linear interpolation in Numpy / Scipy “along a path”

闹比i 2021-02-01 18:05

Let's say that I have data from weather stations at 3 (known) altitudes on a mountain. Specifically, each station records a temperature measurement at its location every minut

3 answers

情歌与酒 (original poster)

2021-02-01 18:48

For a fixed point in time, you can utilize the following interpolation function:

g(a) = cc[0]*abs(a-aa[0]) + cc[1]*abs(a-aa[1]) + cc[2]*abs(a-aa[2])

where a is the hiker's altitude, aa the vector with the 3 measurement altitudes and cc is a vector with the coefficients. There are three things to note:

1. For given temperatures (alltemps) corresponding to aa, cc can be determined by solving a linear matrix equation with np.linalg.solve().
2. g(a) is easy to vectorize for an (N,)-dimensional a and an (N, 3)-dimensional cc; the np.linalg.solve() step vectorizes in the same way.
3. g(a) is called a first-order univariate spline kernel (for three points). Using abs(a-aa[i])**(2*d-1) would change the spline order to d. This approach could be interpreted as a simplified version of a Gaussian Process in Machine Learning.
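
To make the fixed-time case concrete, here is a minimal sketch of solving for cc and evaluating g(a). The temperatures and the helper names fit_coefficients and g are illustrative assumptions, not part of the original answer; the exponent 2*d-1 simply follows the formula quoted above, with d=1 giving the linear case.

```python
import numpy as np

# Assumed example values: three station altitudes and their
# temperatures at one fixed point in time.
aa = np.array([500.0, 1500.0, 4000.0])
temps = np.array([70.0, 50.0, 40.0])


def fit_coefficients(aa, temps, d=1):
    """Solve AA @ cc = temps, where AA[i, j] = |aa[i] - aa[j]|**(2*d - 1)."""
    AA = np.abs(aa[:, None] - aa[None, :]) ** (2 * d - 1)
    return np.linalg.solve(AA, temps)


def g(a, aa, cc, d=1):
    """Evaluate g(a) = sum_i cc[i] * |a - aa[i]|**(2*d - 1)."""
    a = np.asarray(a, dtype=float)
    return sum(c * np.abs(a - a_i) ** (2 * d - 1) for c, a_i in zip(cc, aa))


cc = fit_coefficients(aa, temps)

# Between the stations, the d=1 kernel agrees with plain linear interpolation.
hiker_altitude = 1000.0
print(g(hiker_altitude, aa, cc), np.interp(hiker_altitude, aa, temps))
```

For d=1 the result inside the altitude range matches np.interp; outside that range g(a) extrapolates linearly, whereas np.interp clamps to the end values.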

So the code would be:

import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# generate temperatures
np.random.seed(0)
N, sigma = 1000, 5
trend = np.sin(4 / N * np.arange(N)) * 30
alltemps = np.array([tmp0 + trend + sigma*np.random.randn(N)
                     for tmp0 in [70, 50, 40]])

# generate altitudes:
altitudes = np.array([500, 1500, 4000]).astype(float)
location = np.linspace(altitudes[0], altitudes[-1], N)


def doit():
    """ do the interpolation, improved version for speed """
    AA = np.vstack([np.abs(altitudes-a_i) for a_i in altitudes])
    # This is slightly faster than np.linalg.solve(), because AA is small:
    cc = np.dot(np.linalg.inv(AA), alltemps)

    return (cc[0]*np.abs(location-altitudes[0]) +
            cc[1]*np.abs(location-altitudes[1]) +
            cc[2]*np.abs(location-altitudes[2]))


t_loc = doit()  # call interpolator

# do the plotting:
fg, ax = plt.subplots(num=1)
for alt, t in zip(altitudes, alltemps):
    ax.plot(t, label="%d feet" % alt, alpha=.5)
ax.plot(t_loc, label="Interpolation")
ax.legend(loc="best", title="Altitude:")
ax.set_xlabel("Time")
ax.set_ylabel("Temperature")
fg.canvas.draw()
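
As a sanity check, the vectorized kernel result can be compared against a plain per-time-step np.interp loop. This is a sketch on assumed synthetic data (generated with np.random.default_rng, not the exact series above), and it uses np.linalg.solve in place of the explicit inverse.

```python
import numpy as np

# Assumed synthetic data, same shapes as in the answer: (3, N) temperatures
# and an (N,) path of hiker altitudes.
rng = np.random.default_rng(0)
N = 1000
altitudes = np.array([500.0, 1500.0, 4000.0])
alltemps = np.array([t0 + 5.0 * rng.standard_normal(N) for t0 in (70, 50, 40)])
location = np.linspace(altitudes[0], altitudes[-1], N)

# Kernel approach: one 3x3 system solved for all N time steps at once.
AA = np.abs(altitudes[:, None] - altitudes[None, :])
cc = np.linalg.solve(AA, alltemps)                   # shape (3, N)
K = np.abs(location[None, :] - altitudes[:, None])   # shape (3, N)
t_kernel = np.sum(cc * K, axis=0)                    # shape (N,)

# Reference: slow loop calling np.interp once per time step.
t_ref = np.array([np.interp(location[n], altitudes, alltemps[:, n])
                  for n in range(N)])

print(np.allclose(t_kernel, t_ref))
```

The comparison only holds while location stays within the range of altitudes, because np.interp clamps outside it and the kernel form does not.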

Measuring the time gives:

In [2]: %timeit doit()
10000 loops, best of 3: 107 µs per loop

Update: I replaced the original list comprehensions in doit() to improve speed by 30% (for N=1000).

Furthermore, as requested for comparison, @moarningsun's benchmark code block on my machine:

10 loops, best of 3: 110 ms per loop  
interp_checked
10000 loops, best of 3: 83.9 µs per loop
scipy_interpn
1000 loops, best of 3: 678 µs per loop
Output allclose:
[True, True, True]

Note that N=1000 is a relatively small number. Using N=100000 produces the results:

interp_checked
100 loops, best of 3: 8.37 ms per loop

%timeit doit()
100 loops, best of 3: 5.31 ms per loop

This shows that this approach scales better for large N than the interp_checked approach.
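
To repeat the scaling measurement outside IPython, something like the following sketch could be used. The function names kernel_interp and make_data are hypothetical, the data is synthetic, and only the kernel approach is timed because interp_checked is not shown here; actual timings depend on the machine.

```python
import timeit

import numpy as np

ALTITUDES = np.array([500.0, 1500.0, 4000.0])


def kernel_interp(alltemps, location):
    """Kernel-based interpolation of (3, N) temperatures along an (N,) path."""
    AA = np.abs(ALTITUDES[:, None] - ALTITUDES[None, :])
    cc = np.linalg.solve(AA, alltemps)
    K = np.abs(location[None, :] - ALTITUDES[:, None])
    return np.sum(cc * K, axis=0)


def make_data(n, seed=0):
    """Assumed synthetic inputs: noisy station temperatures and a linear path."""
    rng = np.random.default_rng(seed)
    alltemps = np.array([t0 + 5.0 * rng.standard_normal(n)
                         for t0 in (70, 50, 40)])
    location = np.linspace(ALTITUDES[0], ALTITUDES[-1], n)
    return alltemps, location


for n in (1_000, 100_000):
    alltemps, location = make_data(n)
    repeats = 20
    seconds = timeit.timeit(lambda: kernel_interp(alltemps, location),
                            number=repeats) / repeats
    print(f"N={n}: {seconds * 1e6:.1f} µs per call")
```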