Fast linear interpolation in NumPy / SciPy “along a path”

闹比i 2021-02-01 18:05

Let's say that I have data from weather stations at 3 (known) altitudes on a mountain. Specifically, each station records a temperature measurement at its location every minute …
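
The answer below uses `altitudes`, `finaltemps` and `location` without defining them. Judging from how it indexes them, `finaltemps` is a pandas DataFrame with one row per minute and one column per station altitude, and `location` holds one altitude per minute (the "path"). A hypothetical stand-in with that layout, using made-up altitudes and temperatures, might look like this:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the data described above: three stations at
# known altitudes (metres, made-up values), one temperature reading per minute.
altitudes = np.array([500.0, 1500.0, 2500.0])
n_minutes = 1000
times = pd.date_range("2021-01-01", periods=n_minutes, freq="min")

rng = np.random.default_rng(0)
# Rows are minutes, columns are station altitudes; temperature falls with
# altitude, plus some random noise.
finaltemps = pd.DataFrame(
    15.0 - 0.0065 * altitudes + rng.normal(0.0, 0.5, size=(n_minutes, altitudes.size)),
    index=times,
    columns=altitudes,
)

# Hypothetical path: the altitude of interest at each minute, kept inside
# [altitudes.min(), altitudes.max()] so that no extrapolation is needed.
location = np.linspace(600.0, 2400.0, n_minutes)
```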

3 Answers
  •  清酒与你
    2021-02-01 19:04

    I'll offer one bit of progress. In the second case (interpolating "along a path") we are making many different interpolation functions, one per time step. One thing we could try is to make just one interpolation function (one that interpolates in the altitude dimension over all times, as in the first case above) and evaluate it over and over in a vectorized way. That would give us far more data than we want (a 1,000 x 1,000 matrix instead of a 1,000-element vector), but our target result would just be the diagonal. So the question is: does calling a single function on much larger arguments run faster than making many functions and calling each of them with a simple argument?

    The answer is yes!

    The key is that the interpolating function returned by `scipy.interpolate.interp1d` can accept a `numpy.ndarray` as its input. So you can effectively call the interpolating function many times at C speed by feeding it a vector. This is way, way faster than writing a for loop that calls the interpolating function over and over on scalar inputs. So while we compute many data points that we end up throwing away, we save even more time by not constructing many different interpolating functions that we each use only once.
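
    A tiny, hand-checkable illustration of the diagonal trick (toy numbers made up for this sketch, not taken from the question). `interp1d` with a 2-D `y` interpolates along the last axis by default, so calling it on a vector returns one row of results per time step. With one query point per time step that is a square array whose entry [i, j] is row i evaluated at query j:

```python
import numpy as np
from scipy.interpolate import interp1d

altitudes = np.array([0.0, 1.0, 2.0])         # x-values shared by every row
temps = np.array([[0.0, 10.0, 20.0],           # row 0: one time step
                  [0.0, 20.0, 40.0],           # row 1
                  [0.0, 30.0, 60.0]])          # row 2
location = np.array([0.5, 1.0, 1.5])           # one query point per row

f = interp1d(altitudes, temps)  # one interpolator, along the last axis (axis=-1)
full = f(location)              # shape (3, 3): full[i, j] = row i at location[j]
wanted = np.diagonal(full)      # keep only row i evaluated at location[i]

# full   == [[5, 10, 15], [10, 20, 30], [15, 30, 45]]
# wanted == [5, 20, 45]
print(wanted)
```

    With 1,000 time steps the same call evaluates 1,000,000 points to keep 1,000, which is exactly the trade-off described above.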

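    import numpy as np
    from scipy.interpolate import interp1d

    # Old way: build one interpolator per time step (row) and evaluate it once.
    old_way = np.array([interp1d(altitudes, finaltemps.values[i, :])(loc)
                        for i, loc in enumerate(location)])
    # New way, look ma, no for loops! One interpolator over all rows (along the
    # last axis); note that `location` is a vector. Element [i, j] of the result
    # is row i evaluated at location[j], so the values we want are on the diagonal.
    new_way = np.diagonal(interp1d(altitudes, finaltemps.values)(location))
    np.abs(old_way - new_way).max()
    #-> 0.0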
    

    and yet:

    %%timeit
    res = np.diagonal(interp1d(altitudes, finaltemps.values)(location))
    #-> 100 loops, best of 3: 16.7 ms per loop
    

    So this approach gets us roughly a 10x speedup over the list-comprehension version! Can anyone do better, or suggest an entirely different approach?
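
    One possible different approach, as a sketch only (it is not the method used in this answer): compute just the n values needed instead of the n x n matrix. `np.searchsorted` finds each query's bracketing pair of altitudes, and NumPy fancy indexing pulls one pair of temperatures per row. It assumes `altitudes` is strictly increasing and every `location` lies within the altitude range (no extrapolation); `interp_along_path` is a hypothetical helper name, and the test data is made up.

```python
import numpy as np
from scipy.interpolate import interp1d


def interp_along_path(altitudes, temps, location):
    """Linearly interpolate row i of `temps` at `location[i]`, for every i.

    altitudes: (k,) strictly increasing x-values shared by all rows
    temps:     (n, k) y-values, one row per time step
    location:  (n,) query point per row, assumed in [altitudes[0], altitudes[-1]]
    """
    altitudes = np.asarray(altitudes, dtype=float)
    temps = np.asarray(temps, dtype=float)
    location = np.asarray(location, dtype=float)

    # Index of the right-hand bracketing altitude for each query, clipped so
    # that j - 1 and j are always valid indices (this also covers the end points).
    j = np.clip(np.searchsorted(altitudes, location), 1, len(altitudes) - 1)
    x0, x1 = altitudes[j - 1], altitudes[j]

    rows = np.arange(len(location))
    y0 = temps[rows, j - 1]  # fancy indexing: one value per row, no n x n matrix
    y1 = temps[rows, j]

    w = (location - x0) / (x1 - x0)  # fractional position inside the bracket
    return y0 + w * (y1 - y0)


# Compare with the diagonal approach on made-up data (1,000 time steps, 3 altitudes).
rng = np.random.default_rng(0)
altitudes = np.array([500.0, 1500.0, 2500.0])
temps = rng.normal(size=(1000, 3))
location = rng.uniform(500.0, 2500.0, size=1000)

fast = interp_along_path(altitudes, temps, location)
diag = np.diagonal(interp1d(altitudes, temps)(location))
print(np.abs(fast - diag).max())  # expected to be ~0, up to floating-point round-off
```

    Each query costs one binary search over the k altitudes plus a few array operations, so the work grows like n log k rather than n²; with only 3 altitudes that is effectively linear in the number of time steps. No timing is claimed here, so it is worth measuring against the `%%timeit` figure on real data.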
