R: how can I create a table with mean and sd according to experimental group alongside p-values?

心在旅途 2021-01-03 09:15

I know how I can do all that for individual variables but I need to report this information for a large number of variables and would like to know if there is an efficient w

4 Answers
  • 2021-01-03 09:34

    First let's make some example data. For each sample, we have a unique ID, its experimental group, and some variables for which we want to calculate the mean and SD.

    ## Make a data frame called "Data" with six columns.
    ## sample(1:2) has length 2 and is recycled by cbind(), so the two groups alternate.
    Data <- as.data.frame(cbind(1:100, sample(1:2), rnorm(100), rnorm(100), rnorm(100), rnorm(100)))
    names(Data) <- c("ID", "Group", "V1", "V2", "V3", "V4")
    
    ## Now we will take a peek at the top of our data frame
    > head(Data)
    
      ID Group         V1         V2         V3         V4
    1  1     2  0.3681539 -0.5008400  1.2060665 -0.7352376
    2  2     1 -0.1043180  2.2038190 -1.4367898  2.1961246
    3  3     2 -0.2720279 -0.5923554 -1.4628190 -1.8776453
    4  4     1 -2.3299662 -0.1216227  0.4200776  1.5504020
    5  5     2 -0.3670578 -1.5903221 -0.6287083 -1.0543262
    6  6     1  0.4840047 -0.3181554 -1.4596980 -0.4261827
    

    Now we can run a for loop through the variables, pull their means and SDs and p values, and dump them all in an object called "Results".

    ## Define object which will receive our results
    Results <- data.frame()
    
    ## Open for loop
    for (i in 3:6) {
    
    ## Run the t.test() (Welch's two-sample test by default) and save it in a temporary object
    temp <- t.test(Data[which(Data$Group == 1), i], Data[which(Data$Group == 2), i])
    
    ## Put the name of our variable in our results object
    Results[i-2,1] <- names(Data)[i]
    
    ## Group 1 mean and SD
    Results[i-2,2] <- temp$estimate[1]
    Results[i-2,3] <- sd(Data[which(Data$Group == 1), i])
    
    ## Group 2 mean and SD
    Results[i-2,4] <- temp$estimate[2]
    Results[i-2,5] <- sd(Data[which(Data$Group == 2), i])
    
    ## P value for difference
    Results[i-2,6] <- temp$p.value
    
    rm(temp)
    }
    

    Now we can make our results pretty and print them.

    ## Add column names
    names(Results) <- c("Variable", "Group.1.Mean", "Group.1.SD", "Group.2.Mean", "Group.2.SD", "P.Value")
    
    ## View our results
    > Results
    
      Variable Group.1.Mean Group.1.SD Group.2.Mean Group.2.SD   P.Value
    1       V1   0.21544390  0.9404104  -0.01426226  1.0570324 0.2537820
    2       V2   0.26287585  1.0048291   0.22992285  0.9709686 0.8679038
    3       V3  -0.06112963  0.9855287   0.17423440  1.0198694 0.2434507
    4       V4   0.33848678  0.9360016   0.07905932  0.9106595 0.1632705
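
    The same table can also be built without growing a data frame cell by cell. What follows is only a sketch under stated assumptions: `summarise_var` and `Results2` are hypothetical names that are not part of the answer above, the seed is arbitrary, and the example data is rebuilt with `data.frame()` so the block runs on its own.

    ```r
    set.seed(1)  # assumed seed, only so the sketch is reproducible
    Data <- data.frame(ID = 1:100, Group = rep(1:2, 50),
                       V1 = rnorm(100), V2 = rnorm(100),
                       V3 = rnorm(100), V4 = rnorm(100))

    ## Hypothetical helper: one row of summary statistics for the variable named `v`
    summarise_var <- function(v, data, group = "Group") {
      g1 <- data[data[[group]] == 1, v]
      g2 <- data[data[[group]] == 2, v]
      data.frame(Variable     = v,
                 Group.1.Mean = mean(g1), Group.1.SD = sd(g1),
                 Group.2.Mean = mean(g2), Group.2.SD = sd(g2),
                 P.Value      = t.test(g1, g2)$p.value)
    }

    ## Apply the helper to every column except ID and Group, then stack the rows
    vars <- setdiff(names(Data), c("ID", "Group"))
    Results2 <- do.call(rbind, lapply(vars, summarise_var, data = Data))
    Results2
    ```

    Selecting the variables by name rather than by position (3:6) means the code does not need to change when columns are added or reordered.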
    
  • 2021-01-03 09:40

    In a data object like that offered by Alexander:

     aggregate( . ~ Group, FUN=function(x) c(mn=mean(x), sd=sd(x)), data=Data[-1])
    # Output
      Group       V1.mn       V1.sd       V2.mn       V2.sd
    1     1  0.05336901  0.85468837  0.06833691  0.94459083
    2     2 -0.01658412  0.97583110 -0.02940477  1.11880398
           V3.mn      V3.sd       V4.mn       V4.sd
    1 -0.2096497  1.1732246  0.08850199  0.98906102
    2  0.0674267  0.8848818 -0.11485148  0.90554914
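
    One thing to be aware of with this output: although it prints as nine columns, aggregate() stores each variable as a single matrix column (V1, V2, ...) with sub-columns mn and sd. If ordinary columns are needed, for example before merging in p-values, the result can be flattened. A minimal sketch, assuming example data like the one above (the seed and the names `agg` and `flat` are illustrative):

    ```r
    set.seed(1)  # assumed seed, only so the sketch is reproducible
    Data <- data.frame(ID = 1:100, Group = rep(1:2, 50),
                       V1 = rnorm(100), V2 = rnorm(100),
                       V3 = rnorm(100), V4 = rnorm(100))

    agg <- aggregate(. ~ Group, data = Data[-1],
                     FUN = function(x) c(mn = mean(x), sd = sd(x)))

    ## Group plus four matrix columns, even though nine columns are printed
    ncol(agg)

    ## data.frame() splits each matrix column into separate V1.mn, V1.sd, ... columns
    flat <- do.call(data.frame, agg)
    names(flat)
    ```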
    

    The data argument omits the ID column because you only want the results on the data columns. The request for a collection of p-values can be accomplished with:

     sapply(names(Data)[-(1:2)], function(x) c( 
                       Mean.Grp1 = mean(Data[Data$Group==1,x]), 
                       Mean.Grp2 = mean(Data[Data$Group==2,x]), 
                       `p-value`= t.test(Data[Data$Group==1, x], 
                                         Data[Data$Group==2,x])$p.value )
              )
    #---------------------------
                       V1          V2         V3          V4
    Mean.Grp1  0.05336901  0.06833691 -0.2096497  0.08850199
    Mean.Grp2 -0.01658412 -0.02940477  0.0674267 -0.11485148
    p-value    0.70380932  0.63799544  0.1857743  0.28624585
    

    If you wanted to add the SDs to that output, extend the vector returned by the function in the same way. Note the back-quoting of the "p-value" name. A minus sign is syntactically "active": unquoted, p-value would be parsed as p minus value, which is not a valid argument name.
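
    As a sketch of that extension (assuming example data like the one above; the seed and the names `vars` and `stats` are illustrative), the SDs are simply two more named elements of the returned vector. Since sapply() returns one column per variable, t() can be applied if one row per variable is preferred:

    ```r
    set.seed(1)  # assumed seed, only so the sketch is reproducible
    Data <- data.frame(ID = 1:100, Group = rep(1:2, 50),
                       V1 = rnorm(100), V2 = rnorm(100),
                       V3 = rnorm(100), V4 = rnorm(100))

    vars <- names(Data)[-(1:2)]
    stats <- sapply(vars, function(x) {
      g1 <- Data[Data$Group == 1, x]
      g2 <- Data[Data$Group == 2, x]
      c(Mean.Grp1 = mean(g1), SD.Grp1 = sd(g1),
        Mean.Grp2 = mean(g2), SD.Grp2 = sd(g2),
        `p-value` = t.test(g1, g2)$p.value)
    })

    ## Statistics in rows, variables in columns
    stats

    ## Transposed: one row per variable
    t(stats)
    ```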

  • 2021-01-03 09:51

    Well, the proposed code does not work unless you transpose the table with the p-values.

  • 2021-01-03 09:53

    The tables package makes everything in this except the p-values easy, and the p-values are doable. Here is a quick example:

    > library(tables)
    > iris2 <- iris[ iris$Species != 'versicolor', ]
    > iris2$Species <- factor(iris2$Species)
    > tmp <- tabular( Petal.Width+Petal.Length + Sepal.Width+Sepal.Length ~ Species* (mean+sd), data=iris2 )
    > 
    > tmp.p <- sapply( names(iris2)[1:4], function(x) t.test( iris2[[x]] ~ iris2$Species )$p.value )
    > 
    > tmp
    
                  setosa        virginica       
                  mean   sd     mean      sd    
     Petal.Width  0.246  0.1054 2.026     0.2747
     Petal.Length 1.462  0.1737 5.552     0.5519
     Sepal.Width  3.428  0.3791 2.974     0.3225
     Sepal.Length 5.006  0.3525 6.588     0.6359
    
    > tmp2 <- cbind(tmp, tmp.p)
    > colnames(tmp2) <- c('Setosa Mean','Setosa SD', 'Virginica Mean','Virginica SD',
    + 'P-value')
    > tmp2
                 Setosa Mean Setosa SD Virginica Mean Virginica SD P-value     
    Sepal.Length 0.246       0.1053856 2.026          0.2746501    3.966867e-25
    Sepal.Width  1.462       0.173664  5.552          0.5518947    4.570771e-09
    Petal.Length 3.428       0.3790644 2.974          0.3224966    9.269628e-50
    Petal.Width  5.006       0.3524897 6.588          0.6358796    2.437136e-48
    

    #### Edit ####

    It looks like newer versions of tabular do more checks, which makes the cbind approach not work any more. This could be a good thing, since I am not sure that it was properly matching the values if the ordering was different: in the tmp2 output above, the row labels come from tmp.p (Sepal.Length first), while the means and SDs are in the table's order (Petal.Width first). I did not find a simple way to still do this using cbind (though you could convert to a matrix, pad the rows for the headers, then cbind).

    Here is another approach that works. It is still a bit of a kludge, since it hardcodes the species variable in the function (so the function would have to be updated for each table it is used in):

    library(tables)
    iris2 <- iris[ iris$Species != 'versicolor', ]
    iris2$Species <- factor(iris2$Species)
    P.value <- function(x) t.test(x ~ iris2$Species)$p.value
    tmp <- tabular( Petal.Width+Petal.Length + Sepal.Width+Sepal.Length ~ Species* (mean+sd) + P.value, data=iris2 )
    tmp
    