Polynomial Regression values generated too far from the coordinates

安稳与你 submitted on 2021-02-10 06:38:20

Question


The code below computes polynomial regression coefficients. When I evaluate the regression at any x point, the value obtained is far from the corresponding y coordinate (especially for the coordinates below). Can anyone explain why the difference is so large? Can it be minimized, or is there a flaw in my understanding? The current requirement is a difference of no more than 150 at every point.


import numpy as np

x = [0,5,10,15,20,25,30,35,40,45,50,55,60,65,70,75,80,85,90,95,100]
y = [0,885,3517,5935,8137,11897,10125,13455,14797,15925,16837,17535,18017,18285,18328,18914,19432,19879,20249,20539,20746]
z = np.polyfit(x, y, 3)  # least-squares cubic fit; coefficients are ordered highest power first
print(z)

I have also tried various implementations available in Java, but the coefficient values are the same everywhere for this data. Please help me understand.
For example:


0.019168 * N^3 + -5.540901 * N^2 + 579.846493 * N + -1119.339450
N equals 5 Value equals 1643.76649
Y value 885
N equals 10 Value equals 4144.20338
Y value 3517
N equals 100; Value=20624.29985
Y value 20746
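
The spot checks above can be extended to every data point. The following is a minimal sketch, assuming only NumPy; the names fitted, abs_err and limit are introduced here for illustration and do not appear in the original post. It evaluates the cubic with np.polyval and flags each point whose absolute deviation exceeds the stated limit of 150.

```python
import numpy as np

# Same data as in the question; x runs from 0 to 100 in steps of 5
x = np.arange(0, 101, 5)
y = np.array([0, 885, 3517, 5935, 8137, 11897, 10125, 13455, 14797, 15925,
              16837, 17535, 18017, 18285, 18328, 18914, 19432, 19879, 20249,
              20539, 20746])

coeffs = np.polyfit(x, y, 3)     # cubic least-squares fit
fitted = np.polyval(coeffs, x)   # fitted value at every x
abs_err = np.abs(fitted - y)     # absolute deviation per point

limit = 150                      # requirement stated in the question
for xi, yi, fi, ei in zip(x, y, fitted, abs_err):
    flag = "" if ei <= limit else "  <-- exceeds limit"
    print(f"N={xi:3d}  y={yi:6d}  fit={fi:10.2f}  |error|={ei:8.2f}{flag}")

max_err = abs_err.max()
print("largest absolute error:", round(max_err, 2))
```

A least-squares fit minimizes the sum of squared deviations over all points, so it does not promise a small error at any single point.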

Answer 1:


The polynomial fit performs as expected. There is no error here, just a large deviation in your data. You might want to rescale your data, though. If you add the parameter full=True to np.polyfit, you will receive additional information, including the residuals, which is essentially the sum of the squared fit errors. See this other SO post for more details.

import matplotlib.pyplot as plt
import numpy as np

x = [0,5,10,15,20,25,30,35,40,45,50,55,60,65,70,75,80,85,90,95,100]
y = [0,885,3517,5935,8137,11897,10125,13455,14797,15925,16837,17535,18017,18285,18328,18914,19432,19879,20249,20539,20746]

m = max(y)
y = [p/m for p in y] # rescaled y such that max(y)=1, and dimensionless

# full=True also returns the residuals, rank, singular values and conditioning threshold
z, residuals, rank, sing_vals, cond_thres = np.polyfit(x, y, 3, full=True)

print("Z: ", z) # [ 9.23914285e-07 -2.67082878e-04  2.79497972e-02 -5.39544708e-02]

print("resi:", residuals) # 0.02188 : quite decent, depending on WHAT you're measuring ..

# evaluate the fitted cubic at every x (coefficients are ordered highest power first)
Z = np.polyval(z, x)

fig = plt.figure()
ax = fig.add_subplot(111)

ax.scatter(x, y)    # data points
ax.plot(x, Z, 'r')  # fitted curve in red
plt.show()
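
Rescaling changes the units of the residual, not the quality of the fit. The sketch below is an illustration under the assumption that y is simply divided by a constant m, as in the code above: the coefficients then scale by 1/m and the sum of squared errors by 1/m**2, so the scaled residual can be converted back to the original units. The names z_raw, z_scaled and rms_raw are made up for this example.

```python
import numpy as np

x = np.arange(0, 101, 5)
y = np.array([0, 885, 3517, 5935, 8137, 11897, 10125, 13455, 14797, 15925,
              16837, 17535, 18017, 18285, 18328, 18914, 19432, 19879, 20249,
              20539, 20746], dtype=float)
m = y.max()

# Fit once in the original units and once with y rescaled to max(y) == 1
z_raw, res_raw, *_ = np.polyfit(x, y, 3, full=True)
z_scaled, res_scaled, *_ = np.polyfit(x, y / m, 3, full=True)

# Coefficients of the scaled fit are the raw coefficients divided by m
print("coefficients agree:", np.allclose(z_scaled * m, z_raw))

# The residual (sum of squared errors) scales with m**2
print("residuals agree:", np.isclose(res_scaled[0] * m**2, res_raw[0]))

# Root-mean-square error expressed in the original y units
rms_raw = np.sqrt(res_scaled[0] * m**2 / len(x))
print("RMS error in original units:", rms_raw)
```

Converting back this way makes the small-looking scaled residual directly comparable with the 150 limit from the question.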




Answer 2:


After reviewing the answer from @Magnus, I reduced the limits used for the data in a third-order polynomial. As you can see, the points within my crudely drawn red circle cannot both lie on a smooth line with the nearby data. While I could fit smooth curves such as a Hill sigmoidal equation through the data, the data variance (noise) itself appears to be the limiting factor in achieving a peak absolute error of 150 with this data set.
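
One rough way to probe this claim is to record the peak absolute error for several polynomial degrees. This is only a sketch: it assumes that raising the degree is an acceptable stand-in for "a more flexible smooth curve" and it is not the Hill fit mentioned above. The degree range 3 to 6 is an arbitrary choice for illustration. If the peak error stays far above 150 while the curve becomes more flexible, the scatter in the data rather than the model is the likely limit.

```python
import numpy as np

x = np.arange(0, 101, 5)
y = np.array([0, 885, 3517, 5935, 8137, 11897, 10125, 13455, 14797, 15925,
              16837, 17535, 18017, 18285, 18328, 18914, 19432, 19879, 20249,
              20539, 20746], dtype=float)

peak_err = {}  # degree -> largest absolute error over all points
sse = {}       # degree -> sum of squared errors

for degree in range(3, 7):  # illustrative range, not from the original post
    coeffs = np.polyfit(x, y, degree)
    err = np.polyval(coeffs, x) - y
    peak_err[degree] = np.abs(err).max()
    sse[degree] = np.sum(err**2)
    print(f"degree {degree}: peak |error| = {peak_err[degree]:.1f}, "
          f"sum of squares = {sse[degree]:.0f}")
```

Pushing the degree much higher would eventually chase the noise instead of the trend, which is a separate problem from meeting the error limit.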



Source: https://stackoverflow.com/questions/59676871/polynomial-regression-values-generated-too-far-from-the-coordinates
