Multiple linear regression in pandas statsmodels: ValueError

广开言路 2021-01-02 20:26

Data: https://courses.edx.org/c4x/MITx/15.071x_2/asset/NBA_train.csv

I know how to fit these data to a multiple linear regression model using statsmodels.formu

Answer (2021-01-02 20:56):

    When using sm.OLS(y, X), y is the dependent variable and X holds the independent variables.

    In the formula W ~ PTS + oppPTS, W is the dependent variable and PTS and oppPTS are the independent variables.

    Therefore, use

    y = NBA['W']
    X = NBA[['PTS', 'oppPTS']]
    

    instead of

    X = NBA['W']
    y = NBA[['PTS', 'oppPTS']]
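The roles cannot be swapped because OLS looks for one coefficient per column of X that best reproduces the single column y. A minimal NumPy sketch of that computation, using hypothetical toy data (not the NBA data) and np.linalg.lstsq in place of statsmodels:

```python
import numpy as np

# Hypothetical toy data: 6 observations, 2 predictors (not the NBA data)
rng = np.random.default_rng(1)
n = 6
predictors = rng.normal(size=(n, 2))      # plays the role of NBA[['PTS', 'oppPTS']]
true_beta = np.array([2.0, 0.5, -0.5])    # intercept, slope 1, slope 2

# Design matrix: a column of ones (what sm.add_constant prepends) plus predictors
X = np.column_stack([np.ones(n), predictors])   # shape (n, 3)
y = X @ true_beta                               # dependent variable, shape (n,)

# Ordinary least squares: choose beta to minimise ||y - X @ beta||^2
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(X.shape, y.shape, beta_hat.shape)   # (6, 3) (6,) (3,)
print(beta_hat)                           # recovers true_beta (no noise was added)
```

y is a single vector with one entry per observation, and X is a matrix with one column per regressor, so the result has exactly one coefficient per column of X.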
    

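    import pandas as pd
    import statsmodels.api as sm
    
    # Assumes NBA_train.csv (from the URL above) is in the working directory
    NBA = pd.read_csv("NBA_train.csv")
    y = NBA['W']                   # dependent variable: wins
    X = NBA[['PTS', 'oppPTS']]     # independent variables
    X = sm.add_constant(X)         # add the intercept column ("const")
    model11 = sm.OLS(y, X).fit()
    print(model11.summary())       # in a script, the summary must be printed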
    

    yields

                                OLS Regression Results                            
    ==============================================================================
    Dep. Variable:                      W   R-squared:                       0.942
    Model:                            OLS   Adj. R-squared:                  0.942
    Method:                 Least Squares   F-statistic:                     6799.
    Date:                Sat, 21 Mar 2015   Prob (F-statistic):               0.00
    Time:                        14:58:05   Log-Likelihood:                -2118.0
    No. Observations:                 835   AIC:                             4242.
    Df Residuals:                     832   BIC:                             4256.
    Df Model:                           2                                         
    Covariance Type:            nonrobust                                         
    ==============================================================================
                     coef    std err          t      P>|t|      [95.0% Conf. Int.]
    ------------------------------------------------------------------------------
    const         41.3048      1.610     25.652      0.000        38.144    44.465
    PTS            0.0326      0.000    109.600      0.000         0.032     0.033
    oppPTS        -0.0326      0.000   -110.951      0.000        -0.033    -0.032
    ==============================================================================
    Omnibus:                        1.026   Durbin-Watson:                   2.238
    Prob(Omnibus):                  0.599   Jarque-Bera (JB):                0.984
    Skew:                           0.084   Prob(JB):                        0.612
    Kurtosis:                       3.009   Cond. No.                     1.80e+05
    ==============================================================================
    
    Warnings:
    [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
    [2] The condition number is large, 1.8e+05. This might indicate that there are
    strong multicollinearity or other numerical problems.
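Warning [2] can be triggered by scale alone: PTS and oppPTS are presumably season totals in the thousands, while the constant column is all ones. As far as I know, statsmodels' Cond. No. is the condition number of the design matrix (largest over smallest singular value), which is also what np.linalg.cond computes by default. The following minimal sketch uses hypothetical synthetic predictors (not the NBA data) to compare the raw and standardized design matrices.

```python
import numpy as np

# Hypothetical synthetic predictors on a scale of thousands (not the NBA data)
rng = np.random.default_rng(0)
n = 200
pts = rng.normal(8000, 300, n)
opp = rng.normal(8000, 300, n)

def design(a, b):
    """Design matrix with a leading constant column, as sm.add_constant builds."""
    return np.column_stack([np.ones(len(a)), a, b])

def standardize(v):
    """Center to mean 0 and scale to unit standard deviation."""
    return (v - v.mean()) / v.std()

# 2-norm condition number: ratio of largest to smallest singular value
cond_raw = np.linalg.cond(design(pts, opp))
cond_std = np.linalg.cond(design(standardize(pts), standardize(opp)))
print(f"raw: {cond_raw:.3g}, standardized: {cond_std:.3g}")
```

Standardizing changes the units of the slopes and the meaning of the intercept, but not the fitted values or R-squared. It removes only the scale effect; any genuine correlation between PTS and oppPTS in the real data would remain.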
    