scipy.interpolate.UnivariateSpline not smoothing regardless of parameters

后端 未结 4 631
死守一世寂寞
死守一世寂寞 2021-01-01 04:31

I\'m having trouble getting scipy.interpolate.UnivariateSpline to use any smoothing when interpolating. Based on the function\'s page as well as some previous posts, I beli

相关标签:
4条回答
  • 2021-01-01 05:02

    While I'm not aware of any library which will do it for you off-hand, I'd try a bit more DIY approach: I'd start from making a spline with knots in between the raw data points, in both x and y. In your particular example, having a single knot in between the 4th and 5th points should do the trick, since it'd remove the huge derivative at around x=8000.

    0 讨论(0)
  • 2021-01-01 05:05

    Short answer: you need to choose the value for s more carefully.

    The documentation for UnivariateSpline states that:

    Positive smoothing factor used to choose the number of knots. Number of 
    knots will be increased until the     smoothing condition is satisfied:
    sum((w[i]*(y[i]-s(x[i])))**2,axis=0) <= s
    

    From this one can deduce that "reasonable" values for smoothing, if you don't pass in explicit weights, are around s = m * v where m is the number of data points and v the variance of the data. In this case, s_good ~ 5e7.

    EDIT: sensible values for s depend of course also on the noise level in the data. The docs seem to recommend choosing s in the range (m - sqrt(2*m)) * std**2 <= s <= (m + sqrt(2*m)) * std**2 where std is the standard deviation associated with the "noise" you want to smooth over.

    0 讨论(0)
  • 2021-01-01 05:06

    I had trouble getting BigChef's answer running, here is a variation that works on python 3.6:

    # Imports
    import pylab
    import scipy
    import sklearn.cluster
    
    # Set up original data - note that it's monotonically increasing by X value!
    data = {}
    data['original'] = {}
    data['original']['x'] = [0, 5024.2059124920379, 7933.1645067836089, 7990.4664106277542, 9879.9717114947653, 13738.60563208926, 15113.277958924193]
    data['original']['y'] = [0.0, 3072.5653360000988, 5477.2689107965398, 5851.6866463790966, 6056.3852496014106, 7895.2332350173638, 9154.2956175610598]
    
    # Cluster data, sort it and and save
    import numpy
    inputNumpy = numpy.array([[data['original']['x'][i], data['original']['y'][i]] for i in range(0, len(data['original']['x']))])
    meanShift = sklearn.cluster.MeanShift()
    meanShift.fit(inputNumpy)
    clusteredData = [[pair[0], pair[1]] for pair in meanShift.cluster_centers_]
    
    clusteredData.sort(key=lambda li: li[0])
    data['clustered'] = {}
    data['clustered']['x'] = [pair[0] for pair in clusteredData]
    data['clustered']['y'] = [pair[1] for pair in clusteredData]
    
    # Build a spline using the clustered data and predict
    mySpline = scipy.interpolate.UnivariateSpline(x=data['clustered']['x'], y=data['clustered']['y'], k=1)
    xi = range(0, int(round(max(data['original']['x']), -3)) + 3000, 20)
    yi = mySpline(xi)
    
    # Plot the datapoints
    pylab.plot(data['clustered']['x'], data['clustered']['y'], "D", label="Datapoints (%s)" % 'clustered')
    pylab.plot(xi, yi, label="Predicted (%s)" %  'clustered')
    pylab.plot(data['original']['x'], data['original']['y'], "o", label="Datapoints (%s)" % 'original')
    
    # Show the plot
    pylab.grid(True)
    pylab.xticks(rotation=45)
    pylab.show()
    
    0 讨论(0)
  • 2021-01-01 05:22

    @Zhenya's answer of manually setting knots in between datapoints was too rough to deliver good results in noisy data without being selective about how this technique is applied. However, inspired by his/her suggestion, I have had success with Mean-Shift clustering from the scikit-learn package. It performs auto-determination of the cluster count and seems to do a fairly good smoothing job (very smooth in fact).

    # Imports
    import numpy
    import pylab
    import scipy
    import sklearn.cluster
    
    # Set up original data - note that it's monotonically increasing by X value!
    data = {}
    data['original'] = {}
    data['original']['x'] = [0, 5024.2059124920379, 7933.1645067836089, 7990.4664106277542, 9879.9717114947653, 13738.60563208926, 15113.277958924193]
    data['original']['y'] = [0.0, 3072.5653360000988, 5477.2689107965398, 5851.6866463790966, 6056.3852496014106, 7895.2332350173638, 9154.2956175610598]
    
    # Cluster data, sort it and and save
    inputNumpy = numpy.array([[data['original']['x'][i], data['original']['y'][i]] for i in range(0, len(data['original']['x']))])
    meanShift = sklearn.cluster.MeanShift()
    meanShift.fit(inputNumpy)
    clusteredData = [[pair[0], pair[1]] for pair in meanShift.cluster_centers_]
    clusteredData.sort(lambda pair1, pair2: cmp(pair1[0],pair2[0]))
    data['clustered'] = {}
    data['clustered']['x'] = [pair[0] for pair in clusteredData]
    data['clustered']['y'] = [pair[1] for pair in clusteredData]
    
    # Build a spline using the clustered data and predict
    mySpline = scipy.interpolate.UnivariateSpline(x=data['clustered']['x'], y=data['clustered']['y'], k=1)
    xi = range(0, round(max(data['original']['x']), -3) + 3000, 20)
    yi = mySpline(xi)
    
    # Plot the datapoints
    pylab.plot(data['clustered']['x'], data['clustered']['y'], "D", label="Datapoints (%s)" % 'clustered')
    pylab.plot(xi, yi, label="Predicted (%s)" %  'clustered')
    pylab.plot(data['original']['x'], data['original']['y'], "o", label="Datapoints (%s)" % 'original')
    
    # Show the plot
    pylab.grid(True)
    pylab.xticks(rotation=45)
    pylab.legend( loc="lower right" )
    pylab.show()
    

    enter image description here

    0 讨论(0)
提交回复
热议问题