Question
I have a large array y defined on a non-uniform, ordered grid x. The length of the array is typically N~2^14 to N~2^18. I want to compute a cubic (or quadratic) spline interpolation of the array. The problem I am facing is that even for smaller values of N the interpolation takes very long.
import numpy as np
from scipy.interpolate import interp1d
N = 2 ** 12 # = 4096
x = np.linspace(0, 2*np.pi, N)
y = np.sin(x)
%time f = interp1d(x, y, kind='cubic')
CPU times: user 8min 5s, sys: 1.39 s, total: 8min 7s
Wall time: 8min 7s
One thing I have noticed is that I only need the interpolated values at a very limited set of points. Is there a way to compute the interpolation lazily, i.e. only when it is asked for?
Can you suggest an alternative that also supports extrapolation for values below x.min() and above x.max()?
Thank you!
Answer 1:
For a non-uniformly distributed abscissa you might want to consider a generalized interpolation technique (such as B-splines).
Approximate your data as a sum of coefficients times basis functions (for example B-splines with non-uniformly chosen knots, or a radial basis function network of well-placed Gaussians). These functions just need to span the space of interest.
Now you can use least squares to approximate the weighting coefficients, and then re-sample anywhere at whatever resolution you need. If you take this approach you can also regularize the system for smoothness, which gives more sensible values outside of x.min() and x.max().
This is the collocation method. Let's say your sample values are in the vectors x, y. Set up your basis vectors as sampled versions of phi_k(x), then assemble the basis matrix B = c_[phi_1, phi_2, ..., phi_M] and use least squares: c, res, rnk, sv = lstsq(B, y). If the number of basis functions M is small, this is fast.
Now the vector c contains the coefficients. You compute new values at the points of interest by building new basis vectors sampled there, Bnew = c_[phi_1_new, phi_2_new, ..., phi_M_new], and projecting: y_new = dot(Bnew, c). A minimal sketch of this procedure is given after the list below.
This method:
- easily gives you control to augment the fit with any kind of regularization you choose
- lets you re-sample the system at arbitrary points
- works with any basis functions that make sense for your problem
- can be solved quickly if M is small enough
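Here is a minimal sketch of the idea, assuming a Gaussian radial-basis-function basis; the centers, the width s, and the example data are illustrative choices, not part of the original answer:
import numpy as np
from scipy.linalg import lstsq

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 2*np.pi, 4096))   # non-uniform, ordered grid
y = np.sin(x)

M = 20                                        # small number of basis functions
centers = np.linspace(x.min(), x.max(), M)    # illustrative placement
s = (x.max() - x.min()) / M                   # illustrative width

def basis(xs):
    # columns are phi_k sampled at xs, i.e. B = c_[phi_1, ..., phi_M]
    return np.exp(-((xs[:, None] - centers[None, :]) / s) ** 2)

B = basis(x)
c, res, rnk, sv = lstsq(B, y)                 # least-squares coefficients

# re-sample anywhere, including outside [x.min(), x.max()]
x_new = np.linspace(-0.5, 2*np.pi + 0.5, 50)
y_new = basis(x_new) @ c
Because M is small here, the least-squares solve is cheap regardless of N, and evaluating the fit at new points is just a small matrix product.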
Answer 2:
As a supplement to the comment by @HYRY, and to second his suggestion to use InterpolatedUnivariateSpline, here are some benchmarks I made using a set of different array lengths.
interp1d does not seem to scale very well, as shown below (the y-axis is the log of the time per point, so the most negative values correspond to the fastest computations per interpolation point; the x-axis is the power of 2 in N).
Even where interp1d performs best (near N=2**4 or 2**5), InterpolatedUnivariateSpline is around 2.5 orders of magnitude faster. The code used to produce the plot is shown below.
import time
import matplotlib.pyplot as plt
import numpy as np
from scipy.interpolate import interp1d, InterpolatedUnivariateSpline

t = []
for i in range(2, 24):
    N = 2 ** i
    x = np.linspace(0, 2*np.pi, N)
    y = np.sin(x)
    t_ = time.time()
    for j in range(20):  # repeat to make the results more robust
        #f = interp1d(x, y, kind=3)
        f = InterpolatedUnivariateSpline(x, y, k=3)
    t_ = time.time() - t_
    t.append(np.log(t_ / N))  # log of the time per point
plt.plot(np.arange(22) + 2, t)
Note that InterpolatedUnivariateSpline will consume more memory for large input arrays, so if this is a consideration, interp1d might be the better choice.
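As a short usage sketch (not part of the original answer), addressing the extrapolation part of the question: with its default ext=0 setting, InterpolatedUnivariateSpline extrapolates beyond the data range, and once the spline is built, evaluating it only at the few points of interest is cheap:
import numpy as np
from scipy.interpolate import InterpolatedUnivariateSpline

x = np.linspace(0, 2*np.pi, 2**14)
y = np.sin(x)
f = InterpolatedUnivariateSpline(x, y, k=3)   # ext=0 (default) extrapolates

# evaluate only at the few points of interest,
# including values below x.min() and above x.max()
x_query = np.array([-0.1, 1.0, 2*np.pi + 0.1])
print(f(x_query))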
Source: https://stackoverflow.com/questions/23909266/interpolation-and-extrapolation-for-large-arrays