R: t test over multiple columns using t.test function

后端未结

关注

 4  1283

I tried to perform independent t-test for many columns of a dataframe. For example, i created a data frame

set seed(333)
a <- rnorm(20, 10, 1)
b <- rno


                      
              相关标签:


      
      
        
          4条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  南方客        
                
              
                            
                2021-01-19 14:53
              
            
            
                                                                       
Use select_if to select only numeric columns then use purrr:map_df to apply t.test against grp. Finally use broom:tidy to get the results in tidy format

library(tidyverse)

res <- test_data %>% 
  select_if(is.numeric) %>%
  map_df(~ broom::tidy(t.test(. ~ grp)), .id = 'var')
res
#> # A tibble: 3 x 11
#>   var   estimate estimate1 estimate2 statistic p.value parameter conf.low
#>   <chr>    <dbl>     <dbl>     <dbl>     <dbl>   <dbl>     <dbl>    <dbl>
#> 1 a       -0.259      9.78      10.0    -0.587   0.565      16.2    -1.19
#> 2 b        0.154     15.0       14.8     0.169   0.868      15.4    -1.78
#> 3 c       -0.359     20.4       20.7    -0.287   0.778      16.5    -3.00
#> # ... with 3 more variables: conf.high <dbl>, method <chr>,
#> #   alternative <chr>


^{Created on 2019-03-15 by the reprex package (v0.2.1.9000)}
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  深忆病人        
                
              
                            
                2021-01-19 14:57
              
            
            
                                                                       
Simply extract the estimate and p-value results from t.test call while iterating through all needed columns with sapply. Build formulas from a character vector and transpose with t() for output:

formulas <- paste(names(test_data)[1:(ncol(test_data)-1)], "~ grp")

output <- t(sapply(formulas, function(f) {      
  res <- t.test(as.formula(f))
  c(res$estimate, p.value=res$p.value)      
}))


Input data (seeded for reproducibility)

set.seed(333)
a <- rnorm(20, 10, 1)
b <- rnorm(20, 15, 2)
c <- rnorm(20, 20, 3)
grp <- rep(c('m', 'y'),10)
test_data <- data.frame(a, b, c, grp)


Output result

#         mean in group m mean in group y   p.value
# a ~ grp        9.775477        10.03419 0.5654353
# b ~ grp       14.972888        14.81895 0.8678149
# c ~ grp       20.383679        20.74238 0.7776188

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  梦如初夏        
                
              
                            
                2021-01-19 15:13
              
            
            
                                                                       
Using lapply this is rather easy.

I have tested the code with set.seed(7060) before creating the dataset, in order to make the results reproducible.

tests_list <- lapply(letters[1:3], function(x) t.test(as.formula(paste0(x, "~ grp")), data = test_data))

result <- do.call(rbind, lapply(tests_list, `[[`, "estimate"))
pval <- sapply(tests_list, `[[`, "p.value")
result <- cbind(result, p.value = pval)

result
#     mean in group m mean in group y   p.value
#[1,]        9.909818        9.658813 0.6167742
#[2,]       14.578926       14.168816 0.6462151
#[3,]       20.682587       19.299133 0.2735725


Note that a real life application would use names(test_data)[1:3], not letters[1:3], in the first lapply instruction.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  臣服心动        
                
              
                            
                2021-01-19 15:16
              
            
            
                                                                       
As you asked for a for loop:

  a <- rnorm(20, 10, 1)
  b <- rnorm(20, 15, 2)
  c <- rnorm(20, 20, 3)
  grp <- rep(c('m', 'y'),10)
  test_data <- data.frame(a, b, c, grp)  

  meanM=NULL
  meanY=NULL
  p.value=NULL

  for (i in 1:(ncol(test_data)-1)){
    meanM=as.data.frame(rbind(meanM, t.test(test_data[,i] ~ grp)$estimate[1]))
    meanY=as.data.frame(rbind(meanY, t.test(test_data[,i] ~ grp)$estimate[2]))
    p.value=as.data.frame(rbind(p.value, t.test(test_data[,i] ~ grp)$p.value))
   }

  cbind(meanM, meanY, p.value)


It works, but I am a beginner in R. So maybe there is a more efficient solution
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复