Fast linear interpolation in Numpy / Scipy “along a path”

前端 未结 3 1367
闹比i
闹比i 2021-02-01 18:05

Let\'s say that I have data from weather stations at 3 (known) altitudes on a mountain. Specifically, each station records a temperature measurement at its location every minut

3条回答
  •  情歌与酒
    2021-02-01 18:48

    For a fixed point in time, you can utilize the following interpolation function:

    g(a) = cc[0]*abs(a-aa[0]) + cc[1]*abs(a-aa[1]) + cc[2]*abs(a-aa[2])
    

    where a is the hiker's altitude, aa the vector with the 3 measurement altitudes and cc is a vector with the coefficients. There are three things to note:

    1. For given temperatures (alltemps) corresponding to aa, determining cc can be done by solving a linear matrix equation using np.linalg.solve().
    2. g(a) is easy to vectorize for a (N,) dimensional a and (N, 3) dimensional cc (including np.linalg.solve() respectively).
    3. g(a) is called a first order univariate spline kernel (for three points). Using abs(a-aa[i])**(2*d-1) would change the spline order to d. This approach could be interpreted a simplified version of a Gaussian Process in Machine Learning.

    So the code would be:

    import matplotlib.pyplot as plt
    import numpy as np
    import seaborn as sns
    
    # generate temperatures
    np.random.seed(0)
    N, sigma = 1000, 5
    trend = np.sin(4 / N * np.arange(N)) * 30
    alltemps = np.array([tmp0 + trend + sigma*np.random.randn(N)
                         for tmp0 in [70, 50, 40]])
    
    # generate attitudes:
    altitudes = np.array([500, 1500, 4000]).astype(float)
    location = np.linspace(altitudes[0], altitudes[-1], N)
    
    
    def doit():
        """ do the interpolation, improved version for speed """
        AA = np.vstack([np.abs(altitudes-a_i) for a_i in altitudes])
        # This is slighty faster than np.linalg.solve(), because AA is small:
        cc = np.dot(np.linalg.inv(AA), alltemps)
    
        return (cc[0]*np.abs(location-altitudes[0]) +
                cc[1]*np.abs(location-altitudes[1]) +
                cc[2]*np.abs(location-altitudes[2]))
    
    
    t_loc = doit()  # call interpolator
    
    # do the plotting:
    fg, ax = plt.subplots(num=1)
    for alt, t in zip(altitudes, alltemps):
        ax.plot(t, label="%d feet" % alt, alpha=.5)
    ax.plot(t_loc, label="Interpolation")
    ax.legend(loc="best", title="Altitude:")
    ax.set_xlabel("Time")
    ax.set_ylabel("Temperature")
    fg.canvas.draw()
    

    Measuring the time gives:

    In [2]: %timeit doit()
    10000 loops, best of 3: 107 µs per loop
    

    Update: I replaced the original list comprehensions in doit() to import speed by 30% (For N=1000).

    Furthermore, as requested for comparison, @moarningsun's benchmark code block on my machine:

    10 loops, best of 3: 110 ms per loop  
    interp_checked
    10000 loops, best of 3: 83.9 µs per loop
    scipy_interpn
    1000 loops, best of 3: 678 µs per loop
    Output allclose:
    [True, True, True]
    

    Note that N=1000 is a relatively small number. Using N=100000 produces the results:

    interp_checked
    100 loops, best of 3: 8.37 ms per loop
    
    %timeit doit()
    100 loops, best of 3: 5.31 ms per loop
    

    This shows that this approach scales better for large N than the interp_checked approach.

提交回复
热议问题