Question
I am trying to run a regression where only some of the coefficients can be identified:
import numpy as np
import pandas as pd
import statsmodels.api as sm

data = np.array([[2, 1, 1, 1], [1, 1, 1, 0]])
df = pd.DataFrame(data, columns=['y', 'x1', 'x2', 'x3'])
z = df.pop('y')
mod = sm.OLS(z, sm.add_constant(df))
Now, I have two outcomes, and the only variable that changes between the two observations is x3. So I would expect that (since I added a constant) the model would be unable to identify x1 or x2 and would omit them. It should, however, give me a coefficient of 1 for x3, since the presence of that effect increases y by one.
Stata gives me exactly this outcome, and it reminds me that it cannot estimate a standard error for the coefficient on x3. statsmodels, on the other hand, produces this:
res = mod.fit()
res.summary()
                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       1.000
Model:                            OLS   Adj. R-squared:                    nan
Method:                 Least Squares   F-statistic:                       nan
Date:                Sun, 30 Aug 2020   Prob (F-statistic):                nan
Time:                        14:28:28   Log-Likelihood:                 66.947
No. Observations:                   2   AIC:                            -129.9
Df Residuals:                       0   BIC:                            -132.5
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
x1             0.5000        inf          0        nan         nan         nan
x2             0.5000        inf          0        nan         nan         nan
x3             1.0000        inf          0        nan         nan         nan
==============================================================================
Omnibus:                          nan   Durbin-Watson:                   0.200
Prob(Omnibus):                    nan   Jarque-Bera (JB):                0.333
Skew:                           0.000   Prob(JB):                        0.846
Kurtosis:                       1.000   Cond. No.                         3.23
==============================================================================
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The input rank is higher than the number of observations.
What is happening here? And how can I get my expected output?
Answer 1:
statsmodels uses the Moore-Penrose generalized inverse, pinv, to estimate the parameters of the linear regression models OLS, WLS and GLS. So if the design matrix is singular, it returns a regularized solution: the minimum-norm least-squares solution.
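The 0.5 / 0.5 / 1.0 coefficients in the summary can be reproduced with a short NumPy sketch. One detail worth noting: the summary has no const row because `sm.add_constant` defaults to `has_constant='skip'`, so when a column of ones already exists (x1 here) it returns the data unchanged. The design matrix below is therefore assumed to be just x1, x2, x3; the alternative solution is an illustrative choice, not something statsmodels computes.

```python
import numpy as np

y = np.array([2.0, 1.0])
# Columns x1, x2, x3: no extra constant, because x1 already is one.
X = np.array([[1.0, 1.0, 1.0],
              [1.0, 1.0, 0.0]])

# Minimum-norm least-squares solution via the Moore-Penrose inverse.
beta_pinv = np.linalg.pinv(X) @ y
print(beta_pinv)  # approximately [0.5, 0.5, 1.0], as in the summary

# Any beta with b1 + b2 == 1 and b3 == 1 fits the data exactly ...
alternative = np.array([1.0, 0.0, 1.0])
print(np.allclose(X @ alternative, y))
# ... but pinv picks the one with the smallest Euclidean norm.
print(np.linalg.norm(beta_pinv) < np.linalg.norm(alternative))
```

Splitting the combined effect of x1 and x2 equally is exactly what minimizing the norm does when two columns are identical.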
The covariance matrix of the parameter estimates has reduced rank, and only some linear combinations of the parameters are identified.
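Which combinations are identified can be checked directly. A sketch, assuming the same x1, x2, x3 design matrix: a combination c @ beta is identified (estimable) exactly when c lies in the row space of X, which can be tested by projecting c with pinv(X) @ X. The helper name `is_estimable` is hypothetical. The unscaled covariance pinv(X) @ pinv(X).T (mathematically equal to pinv(X'X)) is also shown to have rank 2 rather than 3.

```python
import numpy as np

X = np.array([[1.0, 1.0, 1.0],   # columns: x1, x2, x3
              [1.0, 1.0, 0.0]])


def is_estimable(c, X):
    """Hypothetical helper: c @ beta is identified iff c is in the row space of X."""
    c = np.asarray(c, dtype=float)
    projection = c @ np.linalg.pinv(X) @ X   # projects c onto the row space of X
    return np.allclose(projection, c)


print(is_estimable([0, 0, 1], X))    # x3 alone: True
print(is_estimable([1, 1, 0], X))    # x1 + x2: True
print(is_estimable([1, 0, 0], X))    # x1 alone: False
print(is_estimable([1, -1, 0], X))   # x1 - x2: False

# Unscaled covariance of the pinv estimate: reduced rank.
cov_unscaled = np.linalg.pinv(X) @ np.linalg.pinv(X).T
print(np.linalg.matrix_rank(cov_unscaled))   # 2, not 3
```

This matches the intuition from the question: the effect of x3 and the combined x1 + x2 effect are pinned down by the data, while x1 and x2 separately are not.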
However, the model can still be used for prediction, as long as the linear relationship in the data remains the same in the prediction samples.
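The prediction caveat can be illustrated with a small sketch, assuming the same x1, x2, x3 design. Two different exact solutions give identical predictions for new rows that keep the pattern x1 == x2 == 1, and disagree as soon as that pattern breaks. The new rows and the alternative solution are made-up illustrations.

```python
import numpy as np

X = np.array([[1.0, 1.0, 1.0],   # columns: x1, x2, x3
              [1.0, 1.0, 0.0]])
y = np.array([2.0, 1.0])

beta_pinv = np.linalg.pinv(X) @ y          # approximately [0.5, 0.5, 1.0]
beta_other = np.array([1.0, 0.0, 1.0])     # another exact fit

# New samples that keep the same linear relationship (x1 == x2 == 1).
X_new = np.array([[1.0, 1.0, 1.0],
                  [1.0, 1.0, 0.0],
                  [1.0, 1.0, 3.0]])
print(X_new @ beta_pinv)    # approximately [2, 1, 4]
print(X_new @ beta_other)   # [2, 1, 4]: same predictions

# A sample that breaks the relationship (x2 == 0): the arbitrary choice matters.
x_broken = np.array([1.0, 0.0, 1.0])
print(x_broken @ beta_pinv, x_broken @ beta_other)   # about 1.5 vs 2.0
```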
Source: https://stackoverflow.com/questions/63657458/statsmodels-with-partly-identified-model