How do I do a F-test in python

后端未结

关注

 4  1490

How do I do an F-test to check if the variance is equivalent in two vectors in Python?

For example if I have

a = [1,2,1,2,1,2,1,2,1,2]
b = [1,3,-1,2


                      
              相关标签:


      
      
        
          4条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  感情败类        
                
              
                            
                2020-12-23 17:18
              
            
            
                                                                       
if you need a two-tailed test, you can proceed as follow, i choosed alpha =0.05:  

a = [1,2,1,2,1,2,1,2,1,2]
b = [1,3,-1,2,1,5,-1,6,-1,2]
print('Variance a={0:.3f}, Variance b={1:.3f}'.format(np.var(a, ddof=1), np.var(b, ddof=1)))
fstatistics = np.var(a, ddof=1)/np.var(b, ddof=1) # because we estimate mean from data
fdistribution = stats.f(len(a)-1,len(b)-1) # build an F-distribution object
p_value = 2*min(fdistribution.cdf(f_critical), 1-fdistribution.cdf(f_critical))
f_critical1 = fdistribution.ppf(0.025)
f_critical2 = fdistribution.ppf(0.975)
print(fstatistics,f_critical1, f_critical2 )
if (p_value<0.05):
    print('Reject H0', p_value)
else:
    print('Cant Reject H0', p_value)


if you want to proceed to an ANOVA like test where only large values can cause rejection, you can proceed to right-tail test, you need to pay attention to the order of variances (fstatistics = var1/var2 or var2/var1): 

a = [1,2,1,2,1,2,1,2,1,2]
b = [1,3,-1,2,1,5,-1,6,-1,2]
print('Variance a={0:.3f}, Variance b={1:.3f}'.format(np.var(a, ddof=1), np.var(b, ddof=1)))
fstatistics = max(np.var(a, ddof=1), np.var(b, ddof=1))/min(np.var(a, ddof=1), np.var(b, ddof=1)) # because we estimate mean from data
fdistribution = stats.f(len(a)-1,len(b)-1) # build an F-distribution object 
p_value = 1-fdistribution.cdf(fstatistics)
f_critical = fd.ppf(0.95)
print(fstatistics, f_critical)
if (p_value<0.05):
    print('Reject H0', p_value)
else:
    print('Cant Reject H0', p_value)


The left-tailed can be done as follow : 

a = [1,2,1,2,1,2,1,2,1,2]
b = [1,3,-1,2,1,5,-1,6,-1,2]
print('Variance a={0:.3f}, Variance b={1:.3f}'.format(np.var(a, ddof=1), np.var(b, ddof=1)))
fstatistics = min(np.var(a, ddof=1), np.var(b, ddof=1))/max(np.var(a, ddof=1), np.var(b, ddof=1)) # because we estimate mean from data
fdistribution = stats.f(len(a)-1,len(b)-1) # build an F-distribution object
p_value = fdistribution.cdf(fstatistics)
f_critical = fd.ppf(0.05)
print(fstatistics, f_critical)
if (p_value<0.05):
    print('Reject H0', p_value)
else:
    print('Cant Reject H0', p_value)

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  予麋鹿        
                
              
                            
                2020-12-23 17:33
              
            
            
                                                                       
The test statistic F test for equal variances is simply:

F = Var(X) / Var(Y)


Where F is distributed as df1 = len(X) - 1, df2 = len(Y) - 1

scipy.stats.f which you mentioned in your question has a CDF method. This means you can generate a p-value for the given statistic and test whether that p-value is greater than your chosen alpha level.

Thus:

alpha = 0.05 #Or whatever you want your alpha to be.
p_value = scipy.stats.f.cdf(F, df1, df2)
if p_value > alpha:
    # Reject the null hypothesis that Var(X) == Var(Y)


Note that the F-test is extremely sensitive to non-normality of X and Y, so you're probably better off doing a more robust test such as Levene's test or Bartlett's test unless you're reasonably sure that X and Y are distributed normally. These tests can be found in the scipy api:


Bartlett's test
Levene's test

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  生来不讨喜        
                
              
                            
                2020-12-23 17:35
              
            
            
                                                                       
To do a one way anova you can use

import scipy.stats as stats

stats.f_oneway(a,b)


One way Anova checks if the variance between the groups is greater then the variance within groups, and computes the probability of observing this variance ratio using F-distribution. A good tutorial can be found here:

https://www.khanacademy.org/math/probability/statistics-inferential/anova/v/anova-1-calculating-sst-total-sum-of-squares
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  萌比男神i        
                
              
                            
                2020-12-23 17:38
              
            
            
                                                                       
For anyone who came here searching for an ANOVA F-test or to compare between models for feature selection


sklearn.feature_selection.f_classif does ANOVA tests, and
sklearn.feature_selection.f_regression does sequential testing of regressions

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复