What is the first value that is passed into the StatsModels predict function?

Submitted by 六月ゝ 毕业季﹏ on 2020-01-25 22:04:22

Question


I have the following OLS model from StatsModels:

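import statsmodels.api as sm
import statsmodels.tools.tools

X = df['Grade']
y = df['Results']

# Prepend a column of ones so the model estimates an intercept
X = statsmodels.tools.tools.add_constant(X)

mod = sm.OLS(y, X)
results = mod.fit()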

When trying to predict a new Y value for an X value of 4, I have to pass the following:

results.predict([1,4])

I don't understand why an array with the first value being '1' needs to be passed in order for the predict function to work correctly. Why do I need to include a 1 instead of just saying:

results.predict([4])

I'm not clear on the concept at work here. Does anybody know what's going on?


Answer 1:


You are adding a constant to the regression equation with X = statsmodels.tools.tools.add_constant(X). So your regressor X has two columns, where the first column is an array of ones.

You need to do the same with the regressor that is used in prediction. The 1 multiplies the intercept, so it includes the constant in the prediction: results.predict([1, 4]) computes 1 * params[0] + 4 * params[1]. If you use zero instead, then the contribution of the constant (0 * params[0]) is zero and the prediction is only the slope effect.

The formula interface adds the constant automatically both for the regressor in the model and for the regressor in the prediction. However, with the pandas DataFrame or numpy ndarray interface, the constant needs to be added by the user both for the model and for predict.



Source: https://stackoverflow.com/questions/39714057/what-is-first-value-that-is-passed-into-statsmodels-predict-function
