Statsmodels with partly identified model

Question

I am trying to run a regression where only some of the coefficients can be identified:

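import numpy as np
import pandas as pd
import statsmodels.api as sm
data = np.array([[2, 1, 1, 1], [1, 1, 1, 0]])
df = pd.DataFrame(data, columns=['y', 'x1', 'x2', 'x3'])
z = df.pop('y')
# add_constant (default has_constant='skip') adds no 'const' column here: x1 is already constant
mod = sm.OLS(z, sm.add_constant(df))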

Now, I have two observations, and the only variable that changes between them is x3. So I would expect that (since I added a constant) the model would be unable to identify x1 or x2 and would omit them. It should, however, give me a coefficient of 1 for x3, since the presence of that effect increases y by one.

Stata gives me exactly this outcome, and it reminds me that it cannot estimate a standard error for the coefficient on x3. statsmodels, on the other hand...

res = mod.fit()
res.summary()
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       1.000
Model:                            OLS   Adj. R-squared:                    nan
Method:                 Least Squares   F-statistic:                       nan
Date:                Sun, 30 Aug 2020   Prob (F-statistic):                nan
Time:                        14:28:28   Log-Likelihood:                 66.947
No. Observations:                   2   AIC:                            -129.9
Df Residuals:                       0   BIC:                            -132.5
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
x1             0.5000        inf          0        nan         nan         nan
x2             0.5000        inf          0        nan         nan         nan
x3             1.0000        inf          0        nan         nan         nan
==============================================================================
Omnibus:                          nan   Durbin-Watson:                   0.200
Prob(Omnibus):                    nan   Jarque-Bera (JB):                0.333
Skew:                           0.000   Prob(JB):                        0.846
Kurtosis:                       1.000   Cond. No.                         3.23
==============================================================================
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The input rank is higher than the number of observations.

What is happening here? And how can I get my expected output?

Answer 1:

statsmodels uses the Moore-Penrose generalized inverse, pinv, to estimate the parameters in the linear regression models OLS, WLS, and GLS.

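So, it provides a regularized (minimum-norm) solution if the design matrix is singular. That is why the effect shared by the identical columns x1 and x2 is split evenly, 0.5 each, instead of one of them being dropped.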
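As a rough illustration of where the 0.5 / 0.5 / 1.0 coefficients come from, here is a minimal sketch that applies NumPy's `np.linalg.pinv` directly to the design matrix from the question. It is plain NumPy rather than statsmodels internals, and the vector `alt` is a hypothetical alternative solution introduced only for comparison.

```python
import numpy as np

# Design matrix from the question: columns x1, x2, x3 (x1 and x2 are identical).
X = np.array([[1.0, 1.0, 1.0],
              [1.0, 1.0, 0.0]])
y = np.array([2.0, 1.0])

# Least-squares solution via the Moore-Penrose pseudoinverse.
# Among all exact solutions, this is the one with the smallest Euclidean norm.
beta = np.linalg.pinv(X) @ y
print(beta)  # approximately [0.5, 0.5, 1.0]

# Hypothetical alternative: put the whole shared effect on x1, nothing on x2.
alt = np.array([1.0, 0.0, 1.0])

# Both vectors reproduce y exactly, but alt has the larger norm.
print(np.allclose(X @ beta, y), np.allclose(X @ alt, y))
print(np.linalg.norm(beta), np.linalg.norm(alt))
```

Dropping x2 (or x1), as Stata does, corresponds to picking a solution like `alt`; the pseudoinverse instead spreads the shared effect across the duplicated columns.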

The covariance matrix of the parameter estimates has reduced rank, and only some linear combinations of the parameters are identified; in this example, the coefficient on x3 and the sum of the coefficients on x1 and x2.
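One way to check which linear combinations are identified is to test whether the weight vector lies in the row space of the design matrix. The sketch below does this with NumPy; the helper `is_identified` and its tolerance are hypothetical names chosen for illustration, not part of statsmodels.

```python
import numpy as np

# Design matrix from the question: columns x1, x2, x3.
X = np.array([[1.0, 1.0, 1.0],
              [1.0, 1.0, 0.0]])

# pinv(X) @ X is the orthogonal projector onto the row space of X.
P = np.linalg.pinv(X) @ X

def is_identified(c, tol=1e-10):
    """Return True if the combination c @ beta is determined by the data,
    i.e. if c is unchanged by projection onto the row space of X."""
    c = np.asarray(c, dtype=float)
    return bool(np.linalg.norm(c - P @ c) < tol)

print(is_identified([0, 0, 1]))   # coefficient on x3
print(is_identified([1, 1, 0]))   # sum of the x1 and x2 coefficients
print(is_identified([1, 0, 0]))   # coefficient on x1 alone
```

Under these assumptions, the first two checks succeed and the third fails, since x1 alone cannot be separated from x2.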

However, the model can still be used for prediction, provided the linear relationship among the regressors stays the same in the prediction samples.
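The prediction caveat can be sketched with NumPy as follows. The new rows and the alternative solution `alt` are made-up values for illustration only: as long as x1 equals x2 in the new data, every exact solution gives the same prediction, and once that relationship breaks, the predictions diverge.

```python
import numpy as np

X = np.array([[1.0, 1.0, 1.0],
              [1.0, 1.0, 0.0]])
y = np.array([2.0, 1.0])

beta = np.linalg.pinv(X) @ y         # minimum-norm solution
alt = np.array([1.0, 0.0, 1.0])      # hypothetical alternative exact solution

# New observation that keeps x1 == x2, as in the estimation sample.
x_same = np.array([2.0, 2.0, 1.0])
# New observation where x1 != x2, so the collinearity no longer holds.
x_diff = np.array([2.0, 0.0, 1.0])

pred_same = (x_same @ beta, x_same @ alt)
pred_diff = (x_diff @ beta, x_diff @ alt)

print(pred_same)  # the two solutions agree
print(pred_diff)  # the two solutions disagree
```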

Source: https://stackoverflow.com/questions/63657458/statsmodels-with-partly-identified-model

Tags

python

statsmodels