ANOVA for groups within a dataframe using scipy

悲&欢浪女 2021-02-03 12:08

I have a dataframe and need to run an ANOVA on it across three conditions. The dataframe looks like:

data0 = pd.DataFrame({'Names': ['CTA15', 'CTA1


        
1 answer
  • 2021-02-03 12:44

    Consider the following sample DataFrame:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'Names': np.random.randint(1, 10, 1000),
                       'value': np.random.randn(1000),
                       'condition': np.random.choice(['NON', 'YES', 'RE'], 1000)})
    
    df.head()
    Out: 
       Names condition     value
    0      4        RE  0.844120
    1      4       NON -0.440285
    2      5       YES  0.559497
    3      4        RE  0.472425
    4      9       YES  0.205906
    

    The following groups the DataFrame by Names, then runs a one-way ANOVA within each name, passing the values of each condition as a separate sample:

    import scipy.stats as ss

    for name, name_df in df.groupby('Names'):
        # One sample of values per condition (NON, RE, YES) within this name
        samples = [values for _, values in name_df.groupby('condition')['value']]
        f_val, p_val = ss.f_oneway(*samples)
        print('Name: {}, F value: {:.3f}, p value: {:.3f}'.format(name, f_val, p_val))
    
    Name: 1, F value: 0.138, p value: 0.871
    Name: 2, F value: 1.458, p value: 0.237
    Name: 3, F value: 0.742, p value: 0.479
    Name: 4, F value: 2.718, p value: 0.071
    Name: 5, F value: 0.255, p value: 0.776
    Name: 6, F value: 1.731, p value: 0.182
    Name: 7, F value: 0.269, p value: 0.764
    Name: 8, F value: 0.474, p value: 0.624
    Name: 9, F value: 1.226, p value: 0.297
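
If you would rather keep the per-name results than print them, the same loop can fill a small table. This is only a sketch under a few assumptions: the helper name `anova_by_condition`, the result name `anova_table`, and the seeded random sample data are made up for illustration and are not part of the question's data.

```python
import numpy as np
import pandas as pd
import scipy.stats as ss

# Illustrative sample data with the same layout as the DataFrame above
# (seeded so the example is reproducible).
rng = np.random.default_rng(0)
df = pd.DataFrame({'Names': rng.integers(1, 10, 1000),
                   'value': rng.standard_normal(1000),
                   'condition': rng.choice(['NON', 'YES', 'RE'], 1000)})

def anova_by_condition(group):
    """One-way ANOVA across the conditions found in one Names group."""
    samples = [values for _, values in group.groupby('condition')['value']]
    f_val, p_val = ss.f_oneway(*samples)
    return pd.Series({'F': f_val, 'p': p_val})

# One row per name, with the F statistic and p value as columns.
rows = {name: anova_by_condition(group) for name, group in df.groupby('Names')}
anova_table = pd.DataFrame(rows).T
anova_table.index.name = 'Names'
print(anova_table.round(3))
```

Having the results in a DataFrame makes it easy to sort or filter them, for example `anova_table[anova_table['p'] < 0.05]`.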
    

    For post-hoc tests, you can use pairwise_tukeyhsd from statsmodels:

    from statsmodels.stats.multicomp import pairwise_tukeyhsd
    for name, grouped_df in df.groupby('Names'):
        print('Name {}'.format(name), pairwise_tukeyhsd(grouped_df['value'], grouped_df['condition']))
    
    Name 1 Multiple Comparison of Means - Tukey HSD,FWER=0.05
    ============================================
    group1 group2 meandiff  lower  upper  reject
    --------------------------------------------
     NON     RE    0.0086  -0.5129 0.5301 False 
     NON    YES    0.0084  -0.4817 0.4986 False 
      RE    YES   -0.0002  -0.5217 0.5214 False 
    --------------------------------------------
    Name 2 Multiple Comparison of Means - Tukey HSD,FWER=0.05
    ============================================
    group1 group2 meandiff  lower  upper  reject
    --------------------------------------------
     NON     RE   -0.0089  -0.5299 0.5121 False 
     NON    YES    0.083   -0.4182 0.5842 False 
      RE    YES    0.0919  -0.4008 0.5846 False 
    --------------------------------------------
    Name 3 Multiple Comparison of Means - Tukey HSD,FWER=0.05
    ============================================
    group1 group2 meandiff  lower  upper  reject
    --------------------------------------------
     NON     RE    0.2401  -0.3136 0.7938 False 
     NON    YES    0.2765  -0.2903 0.8432 False 
      RE    YES    0.0364  -0.5052 0.578  False 
    --------------------------------------------
    Name 4 Multiple Comparison of Means - Tukey HSD,FWER=0.05
    ============================================
    group1 group2 meandiff  lower  upper  reject
    --------------------------------------------
     NON     RE    0.0894  -0.5825 0.7613 False 
     NON    YES   -0.0437  -0.7418 0.6544 False 
      RE    YES   -0.1331  -0.6949 0.4287 False 
    --------------------------------------------
    Name 5 Multiple Comparison of Means - Tukey HSD,FWER=0.05
    ============================================
    group1 group2 meandiff  lower  upper  reject
    --------------------------------------------
     NON     RE   -0.4264  -0.9495 0.0967 False 
     NON    YES    0.0439  -0.4264 0.5142 False 
      RE    YES    0.4703  -0.0155 0.9561 False 
    --------------------------------------------
    Name 6 Multiple Comparison of Means - Tukey HSD,FWER=0.05
    ============================================
    group1 group2 meandiff  lower  upper  reject
    --------------------------------------------
     NON     RE    0.0649  -0.4971 0.627  False 
     NON    YES    -0.406  -0.9405 0.1285 False 
      RE    YES   -0.4709  -1.0136 0.0717 False 
    --------------------------------------------
    Name 7 Multiple Comparison of Means - Tukey HSD,FWER=0.05
    ============================================
    group1 group2 meandiff  lower  upper  reject
    --------------------------------------------
     NON     RE    0.3111  -0.2766 0.8988 False 
     NON    YES   -0.1664  -0.7314 0.3987 False 
      RE    YES   -0.4774  -1.0688 0.114  False 
    --------------------------------------------
    Name 8 Multiple Comparison of Means - Tukey HSD,FWER=0.05
    ============================================
    group1 group2 meandiff  lower  upper  reject
    --------------------------------------------
     NON     RE   -0.0224   -0.668 0.6233 False 
     NON    YES    0.0119   -0.668 0.6918 False 
      RE    YES    0.0343  -0.6057 0.6742 False 
    --------------------------------------------
    Name 9 Multiple Comparison of Means - Tukey HSD,FWER=0.05
    ============================================
    group1 group2 meandiff  lower  upper  reject
    --------------------------------------------
     NON     RE   -0.2414  -0.7792 0.2963 False 
     NON    YES    0.0696  -0.5746 0.7138 False 
      RE    YES    0.311   -0.3129 0.935  False 
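
The printed Tukey tables are hard to process further, so it may help to gather them into one DataFrame. The following is a rough sketch, assuming the result object exposes `groupsunique`, `meandiffs`, `confint` and `reject`, and that the pairs are ordered like `itertools.combinations` over the sorted group labels. The names `frames` and `tukey_table` and the seeded sample data are made up for illustration.

```python
from itertools import combinations

import numpy as np
import pandas as pd
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Illustrative sample data with the same layout as the DataFrame above.
rng = np.random.default_rng(0)
df = pd.DataFrame({'Names': rng.integers(1, 10, 1000),
                   'value': rng.standard_normal(1000),
                   'condition': rng.choice(['NON', 'YES', 'RE'], 1000)})

frames = []
for name, grouped_df in df.groupby('Names'):
    res = pairwise_tukeyhsd(grouped_df['value'], grouped_df['condition'])
    # Assumption: pairwise results follow the order of combinations()
    # over the sorted unique group labels.
    pairs = list(combinations(res.groupsunique, 2))
    frames.append(pd.DataFrame({
        'Names': name,
        'group1': [a for a, _ in pairs],
        'group2': [b for _, b in pairs],
        'meandiff': res.meandiffs,
        'lower': res.confint[:, 0],
        'upper': res.confint[:, 1],
        'reject': res.reject,
    }))

# One row per (name, pair of conditions).
tukey_table = pd.concat(frames, ignore_index=True)
print(tukey_table.round(4))
```

With three conditions there are three pairs per name, so nine names give 27 rows. Filtering on `tukey_table['reject']` then lists only the pairs flagged as different.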
    