Question
Using the code below to obtain polynomial regression coefficients, when I evaluate the fitted polynomial at any x, the value is far from the corresponding y coordinate (especially for the coordinates below). Can anyone explain why the difference is so large? Can it be minimized, or is there a flaw in my understanding? The current requirement is that the difference be no more than 150 at every point.
import numpy as np
x=[0,5,10,15,20,25,30,35,40,45,50,55,60,65,70,75,80,85,90,95,100]
y=[0,885,3517,5935,8137,11897,10125,13455,14797,15925,16837,17535,18017,18285,18328,18914,19432,19879,20249,20539,20746]
z=np.polyfit(x,y,3)
print(z)
I have also tried various implementations available in Java, but the coefficient values are the same everywhere for this data. Please help me understand.
For example:
0.019168 * N^3 - 5.540901 * N^2 + 579.846493 * N - 1119.339450
N = 5: value = 1643.76649, y value = 885
N = 10: value = 4144.20338, y value = 3517
N = 100: value = 20624.29985, y value = 20746
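The comparison above can be extended to every point. The following is a minimal sketch, not part of the original question, that refits the same data with np.polyfit, evaluates the cubic with np.polyval, and reports the error at each x. The variable names (fitted, errors, max_abs_error) are chosen here for illustration.

```python
import numpy as np

x = np.array([0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50,
              55, 60, 65, 70, 75, 80, 85, 90, 95, 100], dtype=float)
y = np.array([0, 885, 3517, 5935, 8137, 11897, 10125, 13455, 14797, 15925,
              16837, 17535, 18017, 18285, 18328, 18914, 19432, 19879,
              20249, 20539, 20746], dtype=float)

# Least-squares cubic, highest power first (same call as in the question).
coeffs = np.polyfit(x, y, 3)

# Evaluate the fitted polynomial at every x and compare with the data.
fitted = np.polyval(coeffs, x)
errors = fitted - y
max_abs_error = float(np.max(np.abs(errors)))

for xi, yi, fi in zip(x, y, fitted):
    print(f"x={xi:5.0f}  y={yi:8.0f}  fit={fi:10.2f}  error={fi - yi:9.2f}")
print("largest absolute error:", max_abs_error)
```

A least-squares fit minimizes the sum of squared errors over all points, not the largest single error, so nothing in the fit enforces a per-point bound such as 150.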
Answer 1:
The polynomial fit performs as expected. There is no error here, just a large scatter in your data. You might want to rescale your data, though. If you pass full=True to np.polyfit, you will receive additional information, including the residuals, which is essentially the sum of the squared fit errors. See this other SO post for more details.
import matplotlib.pyplot as plt
import numpy as np
x = [0,5,10,15,20,25,30,35,40,45,50,55,60,65,70,75,80,85,90,95,100]
y = [0,885,3517,5935,8137,11897,10125,13455,14797,15925,16837,17535,18017,18285,18328,18914,19432,19879,20249,20539,20746]
m = max(y)
y = [p/m for p in y] # rescaled y such that max(y)=1, and dimensionless
z, residuals, rank, sing_vals, cond_thres = np.polyfit(x,y,3,full=True)
print("Z: ",z) # [ 9.23914285e-07 -2.67082878e-04 2.79497972e-02 -5.39544708e-02]
print("resi:", residuals) # 0.02188 : quite decent, depending on WHAT you're measuring ..
Z = [z[3] + q*z[2] + q*q*z[1] + q*q*q*z[0] for q in x]
fig = plt.figure()
ax = fig.add_subplot(111)
ax.scatter(x,y)
ax.plot(x,Z,'r')
plt.show()
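One point worth checking is what the rescaling does to the fit itself. Because the least-squares solution is linear in y, dividing y by a constant m should divide the coefficients by m and the residual sum of squares by m squared, leaving the fitted curve unchanged in the original units. The sketch below is an illustration added here, not part of the answer; names such as z_raw, z_scaled and rms_error are assumptions.

```python
import numpy as np

x = [0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50,
     55, 60, 65, 70, 75, 80, 85, 90, 95, 100]
y = [0, 885, 3517, 5935, 8137, 11897, 10125, 13455, 14797, 15925,
     16837, 17535, 18017, 18285, 18328, 18914, 19432, 19879,
     20249, 20539, 20746]

m = max(y)
y_scaled = [p / m for p in y]

# full=True returns (coefficients, residuals, rank, singular values, rcond).
z_raw, res_raw, *_ = np.polyfit(x, y, 3, full=True)
z_scaled, res_scaled, *_ = np.polyfit(x, y_scaled, 3, full=True)

# Scaling y by 1/m scales the coefficients by 1/m and the residual by 1/m**2.
same_curve = np.allclose(z_scaled * m, z_raw)
same_residual = np.allclose(res_scaled * m**2, res_raw)

# Root-mean-square error in the original y units.
rms_error = float(np.sqrt(res_raw[0] / len(x)))

print("same curve after undoing the scaling:", same_curve)
print("residuals agree after undoing the scaling:", same_residual)
print("RMS error in original units:", rms_error)
```

Read this way, the small residual of the rescaled fit and the large errors reported in the question describe the same fit in different units.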
Answer 2:
After reviewing @Magnus's answer, I reduced the limits used for the data in a 3rd-order polynomial fit. As you can see, the points within my crudely drawn red circle cannot both lie on a smooth line with the nearby data. While I could fit smooth curves such as a Hill sigmoidal equation through the data, the data variance (noise) itself appears to be the limiting factor in achieving a peak absolute error of 150 with this data set.
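The figure with the red circle is not reproduced here, so which points were circled is an assumption. One way to look for candidates is to check where the otherwise rising data steps downward. The sketch below is an added illustration with hypothetical names (steps, drop_idx), not the answerer's method.

```python
import numpy as np

x = np.array([0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50,
              55, 60, 65, 70, 75, 80, 85, 90, 95, 100])
y = np.array([0, 885, 3517, 5935, 8137, 11897, 10125, 13455, 14797, 15925,
              16837, 17535, 18017, 18285, 18328, 18914, 19432, 19879,
              20249, 20539, 20746])

# Difference between each y value and the one before it.
steps = np.diff(y)

# Indices i where y[i + 1] < y[i], i.e. the data goes down instead of up.
drop_idx = np.flatnonzero(steps < 0)

for i in drop_idx:
    print(f"y falls from {y[i]} at x={x[i]} to {y[i + 1]} at x={x[i + 1]}")
```

A smooth, low-order curve passing between two such neighbouring points has to miss at least one of them by a wide margin, which is consistent with the answer's remark about noise limiting the achievable peak error.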
Source: https://stackoverflow.com/questions/59676871/polynomial-regression-values-generated-too-far-from-the-coordinates