Python scikit learn Linear Model Parameter Standard Error

前端未结

关注

 2  574

I am working with sklearn and specifically the linear_model module. After fitting a simple linear as in

import pandas as pd
import numpy as np
from sklearn impo


                      
              相关标签:


      
      
        
          2条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  野的像风        
                
              
                            
                2021-02-18 19:28
              
            
            
                                                                       
tl;dr

not with scikit-learn, but you can compute this manually with some linear algebra. i do this for your example below. 

also here's a jupyter notebook with this code: https://gist.github.com/grisaitis/cf481034bb413a14d3ea851dab201d31

what and why

the standard errors of your estimates are just the square root of the variances of your estimates. what's the variance of your estimate? if you assume your model has gaussian error, it's:

Var(beta_hat) = inverse(X.T @ X) * sigma_squared_hat

and then the standard error of beta_hat[i] is Var(beta_hat)[i, i] ** 0.5. 

All you have to compute sigma_squared_hat. This is the estimate of your model's gaussian error. This is not known a priori but can be estimated with the sample variance of your residuals.

Also you need to add an intercept term to your data matrix. Scikit-learn does this automatically with the LinearRegression class. So to compute this yourself you need to add that to your X matrix or dataframe. 

how

Starting after your code,

show your scikit-learn results

print(model.intercept_)
print(model.coef_)


[-0.28671532]
[[ 0.17501115 -0.6928708   0.22336584]]


reproduce this with linear algebra

N = len(X)
p = len(X.columns) + 1  # plus one because LinearRegression adds an intercept term

X_with_intercept = np.empty(shape=(N, p), dtype=np.float)
X_with_intercept[:, 0] = 1
X_with_intercept[:, 1:p] = X.values

beta_hat = np.linalg.inv(X_with_intercept.T @ X_with_intercept) @ X_with_intercept.T @ y.values
print(beta_hat)


[[-0.28671532]
 [ 0.17501115]
 [-0.6928708 ]
 [ 0.22336584]]


compute standard errors of the parameter estimates

y_hat = model.predict(X)
residuals = y.values - y_hat
residual_sum_of_squares = residuals.T @ residuals
sigma_squared_hat = residual_sum_of_squares[0, 0] / (N - p)
var_beta_hat = np.linalg.inv(X_with_intercept.T @ X_with_intercept) * sigma_squared_hat
for p_ in range(p):
    standard_error = var_beta_hat[p_, p_] ** 0.5
    print(f"SE(beta_hat[{p_}]): {standard_error}")


SE(beta_hat[0]): 0.2468580488280805
SE(beta_hat[1]): 0.2965501221823944
SE(beta_hat[2]): 0.3518847753610169
SE(beta_hat[3]): 0.3250760291745124


confirm with statsmodels

import statsmodels.api as sm
ols = sm.OLS(y.values, X_with_intercept)
ols_result = ols.fit()
ols_result.summary()


...
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         -0.2867      0.247     -1.161      0.290      -0.891       0.317
x1             0.1750      0.297      0.590      0.577      -0.551       0.901
x2            -0.6929      0.352     -1.969      0.096      -1.554       0.168
x3             0.2234      0.325      0.687      0.518      -0.572       1.019
==============================================================================


yay, done!
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  温柔的废话        
                
              
                            
                2021-02-18 19:28
              
            
            
                                                                       
No, scikit-learn does not have built error estimates for doing inference. Statsmodels does though.

import statsmodels.api as sm
ols = sm.OLS(y, X)
ols_result = ols.fit()
# Now you have at your disposition several error estimates, e.g.
ols_result.HC0_se
# and covariance estimates
ols_result.cov_HC0


see docs
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复