R: t test over multiple columns using t.test function

梦如初夏 2021-01-19 14:29

I tried to perform independent t-tests on many columns of a data frame. For example, I created a data frame:

set.seed(333)
a <- rnorm(20, 10, 1)
b <- rno         


        
4 answers
  • 2021-01-19 14:53

    Use select_if to select only the numeric columns, then use purrr::map_df to apply t.test against grp. Finally, use broom::tidy to get the results in tidy format.

    library(tidyverse)
    
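    res <- test_data %>% 
      select_if(is.numeric) %>%   # keep only the numeric columns
      # one t-test per column against the grouping column; .id = 'var' keeps the column name
      map_df(~ broom::tidy(t.test(.x ~ test_data$grp)), .id = 'var')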
    res
    #> # A tibble: 3 x 11
    #>   var   estimate estimate1 estimate2 statistic p.value parameter conf.low
    #>   <chr>    <dbl>     <dbl>     <dbl>     <dbl>   <dbl>     <dbl>    <dbl>
    #> 1 a       -0.259      9.78      10.0    -0.587   0.565      16.2    -1.19
    #> 2 b        0.154     15.0       14.8     0.169   0.868      15.4    -1.78
    #> 3 c       -0.359     20.4       20.7    -0.287   0.778      16.5    -3.00
    #> # ... with 3 more variables: conf.high <dbl>, method <chr>,
    #> #   alternative <chr>
    

    Created on 2019-03-15 by the reprex package (v0.2.1.9000)
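    If tidyverse and broom are not installed, the same idea (keep the numeric columns, test each one against grp, collect one row per column) can be sketched in base R. This is a minimal sketch, not the answerer's code: Filter() stands in for select_if(), and the result columns (estimate1, estimate2, statistic, parameter, p.value) are chosen here only to mirror the broom output above. The data are rebuilt with set.seed(333) as in the question so the sketch runs on its own. Note that t.test() defaults to var.equal = FALSE (Welch's test), which is why the degrees of freedom in parameter are not whole numbers.

```r
# Reproducible example data, built as in the question
set.seed(333)
test_data <- data.frame(
  a = rnorm(20, 10, 1),
  b = rnorm(20, 15, 2),
  c = rnorm(20, 20, 3),
  grp = rep(c("m", "y"), 10)
)

# Keep only the numeric columns (base-R analogue of select_if(is.numeric))
num_cols <- Filter(is.numeric, test_data)

# One Welch t-test per column against the grouping column; list names are the column names
tests <- lapply(num_cols, function(x) t.test(x ~ test_data$grp))

# Pull the pieces of each "htest" object into one data frame, one row per column
res <- data.frame(
  var       = names(tests),
  estimate1 = vapply(tests, function(tt) unname(tt$estimate[1]), numeric(1)),
  estimate2 = vapply(tests, function(tt) unname(tt$estimate[2]), numeric(1)),
  statistic = vapply(tests, function(tt) unname(tt$statistic), numeric(1)),
  parameter = vapply(tests, function(tt) unname(tt$parameter), numeric(1)),
  p.value   = vapply(tests, function(tt) tt$p.value, numeric(1)),
  row.names = NULL
)
res
```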

  • 2021-01-19 14:57

    Simply extract the estimates and the p-value from each t.test call while iterating over all the needed columns with sapply. Build the formulas from a character vector and transpose the result with t() for output:

    formulas <- paste(names(test_data)[1:(ncol(test_data)-1)], "~ grp")
    
    output <- t(sapply(formulas, function(f) {      
      res <- t.test(as.formula(f), data = test_data)
      c(res$estimate, p.value=res$p.value)      
    }))
    

    Input data (seeded for reproducibility)

    set.seed(333)
    a <- rnorm(20, 10, 1)
    b <- rnorm(20, 15, 2)
    c <- rnorm(20, 20, 3)
    grp <- rep(c('m', 'y'),10)
    test_data <- data.frame(a, b, c, grp)
    

    Output result

    #         mean in group m mean in group y   p.value
    # a ~ grp        9.775477        10.03419 0.5654353
    # b ~ grp       14.972888        14.81895 0.8678149
    # c ~ grp       20.383679        20.74238 0.7776188
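
    A variant of the same sapply() approach, sketched under the assumption that the grouping column is named grp: reformulate() from the stats package builds each formula from a column name without pasting strings, and setdiff() selects every column except grp instead of relying on grp being the last column. Passing data = test_data makes each formula look up its columns in the data frame rather than in the global environment.

```r
# Example data, as in the "Input data" block above
set.seed(333)
test_data <- data.frame(
  a = rnorm(20, 10, 1),
  b = rnorm(20, 15, 2),
  c = rnorm(20, 20, 3),
  grp = rep(c("m", "y"), 10)
)

# Response columns: everything except the grouping column
vars <- setdiff(names(test_data), "grp")

# reformulate("grp", response = v) builds the formula v ~ grp
output <- t(sapply(vars, function(v) {
  res <- t.test(reformulate("grp", response = v), data = test_data)
  c(res$estimate, p.value = res$p.value)
}))
output
```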
    
  • 2021-01-19 15:13

    Using lapply, this is rather easy.
    I tested the code with set.seed(7060) before creating the dataset, to make the results reproducible.

    tests_list <- lapply(letters[1:3], function(x) t.test(as.formula(paste0(x, "~ grp")), data = test_data))
    
    result <- do.call(rbind, lapply(tests_list, `[[`, "estimate"))
    pval <- sapply(tests_list, `[[`, "p.value")
    result <- cbind(result, p.value = pval)
    
    result
    #     mean in group m mean in group y   p.value
    #[1,]        9.909818        9.658813 0.6167742
    #[2,]       14.578926       14.168816 0.6462151
    #[3,]       20.682587       19.299133 0.2735725
    

    Note that a real-life application would use names(test_data)[1:3], not letters[1:3], in the first lapply call.
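
    Following that note, a sketch that uses the real column names and also labels the rows (the output above only shows [1,], [2,], [3,]). It assumes test_data is built the same way as in the earlier answers, seeded with set.seed(7060) as this answer describes. Naming tests_list is what lets do.call(rbind, ...) use the column names as row names.

```r
# Data built as in the earlier answers, seeded as this answer describes
set.seed(7060)
test_data <- data.frame(
  a = rnorm(20, 10, 1),
  b = rnorm(20, 15, 2),
  c = rnorm(20, 20, 3),
  grp = rep(c("m", "y"), 10)
)

# Use the real column names instead of letters[1:3]
vars <- names(test_data)[1:3]

tests_list <- lapply(vars, function(x) t.test(as.formula(paste0(x, " ~ grp")), data = test_data))
names(tests_list) <- vars  # these names become the row names below

result <- do.call(rbind, lapply(tests_list, `[[`, "estimate"))
result <- cbind(result, p.value = sapply(tests_list, `[[`, "p.value"))
result
```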

  • 2021-01-19 15:16

    As you asked for a for loop:

      a <- rnorm(20, 10, 1)
      b <- rnorm(20, 15, 2)
      c <- rnorm(20, 20, 3)
      grp <- rep(c('m', 'y'), 10)
      test_data <- data.frame(a, b, c, grp)
    
      meanM <- NULL
      meanY <- NULL
      p.value <- NULL
    
      for (i in 1:(ncol(test_data)-1)) {
        # run the test once per column and reuse the result
        res <- t.test(test_data[,i] ~ test_data$grp)
        meanM <- as.data.frame(rbind(meanM, res$estimate[1]))
        meanY <- as.data.frame(rbind(meanY, res$estimate[2]))
        p.value <- as.data.frame(rbind(p.value, res$p.value))
      }
    
      cbind(meanM, meanY, p.value)
    

    It works, but I am a beginner in R, so there may be a more efficient solution.
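
    One way to make the loop more efficient, as a sketch rather than a definitive version: preallocate the result vectors with numeric(n) instead of growing data frames with rbind(), and call t.test() only once per column instead of three times. The seed (333, taken from the question) and the vars helper are assumptions added only to make the sketch self-contained and reproducible.

```r
# Example data; the seed is taken from the question for reproducibility
set.seed(333)
test_data <- data.frame(
  a = rnorm(20, 10, 1),
  b = rnorm(20, 15, 2),
  c = rnorm(20, 20, 3),
  grp = rep(c("m", "y"), 10)
)

vars <- setdiff(names(test_data), "grp")  # hypothetical helper: all non-grouping columns
n <- length(vars)

# Preallocate result vectors instead of growing them with rbind()
meanM   <- numeric(n)
meanY   <- numeric(n)
p.value <- numeric(n)

for (i in seq_len(n)) {
  # Run the test once per column and reuse the result
  res <- t.test(test_data[[vars[i]]] ~ test_data$grp)
  meanM[i]   <- res$estimate[1]
  meanY[i]   <- res$estimate[2]
  p.value[i] <- res$p.value
}

out <- data.frame(var = vars, meanM, meanY, p.value)
out
```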
