Spearman correlation by group in R

心在旅途 2020-11-30 04:57

How do you calculate Spearman correlation by group in R? I found the following link talking about Pearson correlation by group. But when I tried to replace the type with s

4 answers
  • 2020-11-30 05:42

    How about this for a base R solution:

    df <- data.frame(group = rep(c("G1", "G2"), each = 10),
                     var1 = rnorm(20),
                     var2 = rnorm(20))
    
    r <- by(df, df$group, FUN = function(X) cor(X$var1, X$var2, method = "spearman"))
    r   # print the per-group result (values vary from run to run, since no seed is set)
    # df$group: G1
    # [1] 0.4060606
    # ------------------------------------------------------------ 
    # df$group: G2
    # [1] 0.1272727
    

    And then, if you want the results in the form of a data.frame:

    data.frame(group = dimnames(r)[[1]], corr = as.vector(r))
    #   group      corr
    # 1    G1 0.4060606
    # 2    G2 0.1272727
    

    EDIT: If you prefer a plyr-based solution, here is one:

    library(plyr)
    ddply(df, .(group), summarise, "corr" = cor(var1, var2, method = "spearman"))
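
    If you want to sanity-check any of these per-group results, recall that Spearman's rho is the Pearson correlation of the ranks. A minimal sketch, assuming the same group/var1/var2 layout as above (the seed and the names direct and via_ranks are just for illustration):

    ```r
    # Simulated data with the same layout as above (group, var1, var2).
    set.seed(1)
    df <- data.frame(group = rep(c("G1", "G2"), each = 10),
                     var1 = rnorm(20),
                     var2 = rnorm(20))

    # Spearman correlation per group, computed directly.
    direct <- sapply(split(df, df$group),
                     function(d) cor(d$var1, d$var2, method = "spearman"))

    # Same quantity as the Pearson correlation of the within-group ranks.
    via_ranks <- sapply(split(df, df$group),
                        function(d) cor(rank(d$var1), rank(d$var2)))

    # The two named vectors should agree up to floating-point tolerance.
    all.equal(direct, via_ranks)
    ```

    The ranking has to happen inside each group; ranking the whole column first and then splitting would give a different number.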
    
  • 2020-11-30 05:44

    This is a very old question, but a tidyverse and broom solution is straightforward, so I'll share it:

    set.seed(123)
    df <- data.frame(group = rep(c("G1", "G2"), each = 10),
                     var1 = rnorm(20),
                     var2 = rnorm(20))
    
    library(tidyverse)
    library(broom)
    
    df  %>% 
      group_by(group) %>%
      summarize(correlation = cor(var1, var2, method = "spearman"))
    # # A tibble: 2 x 2
    #   group correlation
    #   <fct>       <dbl>
    # 1 G1        -0.200 
    # 2 G2         0.0545
    
    # with p-values and further stats (older tidyr syntax)
    df %>% 
      nest(-group) %>% 
      mutate(cor = map(data, ~ cor.test(.x$var1, .x$var2, method = "spearman"))) %>%
      mutate(tidied = map(cor, tidy)) %>% 
      unnest(tidied, .drop = TRUE)
    # # A tibble: 2 x 6
    #   group estimate statistic p.value method                          alternative
    #   <fct>    <dbl>     <dbl>   <dbl> <chr>                           <chr>      
    # 1 G1     -0.200        198   0.584 Spearman's rank correlation rho two.sided  
    # 2 G2      0.0545       156   0.892 Spearman's rank correlation rho two.sided 
    

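    With newer package versions (as far as I can tell, tidyr 1.0.0 and later, where nest() expects a named argument and the .drop argument of unnest() is deprecated), you need to write it like this to get the same result without errors or warnings: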

    df %>% 
      nest(data = -group) %>%
      mutate(cor=map(data,~cor.test(.x$var1, .x$var2, method = "sp"))) %>%
      mutate(tidied = map(cor, tidy)) %>% 
      unnest(tidied) %>% 
      select(-data, -cor)
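
    If you would rather not depend on a particular tidyr version, the same table can be built in base R. This is only a sketch under the same data layout; the names per_group and res are made up for the example, and it keeps just three fields of the cor.test() result:

    ```r
    set.seed(123)
    df <- data.frame(group = rep(c("G1", "G2"), each = 10),
                     var1 = rnorm(20),
                     var2 = rnorm(20))

    # Run cor.test() once per group and keep the pieces of interest.
    per_group <- lapply(split(df, df$group), function(d) {
      ct <- cor.test(d$var1, d$var2, method = "spearman")
      data.frame(group     = as.character(d$group[1]),
                 estimate  = unname(ct$estimate),   # Spearman's rho
                 statistic = unname(ct$statistic),  # the S statistic
                 p.value   = ct$p.value)
    })

    # Stack the one-row data frames into a single result table.
    res <- do.call(rbind, per_group)
    rownames(res) <- NULL
    res
    ```

    Note that cor.test() warns that it cannot compute exact p-values when a group contains ties; that does not happen with continuous simulated data like this, but it is common with real data.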
    
  • 2020-11-30 05:51

    Here's another way to do it:

    # split the data by group then apply spearman correlation
    # to each element of that list
    j <- lapply(split(df, df$group), function(x){cor(x[,2], x[,3], method = "spearman")})
    
    # Bring it together
    data.frame(group = names(j), corr = unlist(j), row.names = NULL)
    

    Comparing my method, Josh's method, and the plyr solution using rbenchmark:

    Dason <- function(){
        # split the data by group then apply spearman correlation
        # to each element of that list
        j <- lapply(split(df, df$group), function(x){cor(x[,2], x[,3], method = "spearman")})
    
        # Bring it together
        data.frame(group = names(j), corr = unlist(j), row.names = NULL)
    }
    
    Josh <- function(){
        r <- by(df, df$group, FUN = function(X) cor(X$var1, X$var2, method = "spearman"))
        data.frame(group = attributes(r)$dimnames[[1]], corr = as.vector(r))
    }
    
    plyr <- function(){
        ddply(df, .(group), summarise, "corr" = cor(var1, var2, method = "spearman"))
    }
    
    
    library(plyr)        # provides ddply(), used inside plyr()
    library(rbenchmark)
    benchmark(Dason(), Josh(), plyr())
    

    Which gives the output

    > benchmark(Dason(), Josh(), plyr())
         test replications elapsed relative user.self sys.self user.child sys.child
    1 Dason()          100    0.19 1.000000      0.19        0         NA        NA
    2  Josh()          100    0.24 1.263158      0.22        0         NA        NA
    3  plyr()          100    0.51 2.684211      0.52        0         NA        NA
    

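    So my method is slightly faster, but not by much: Josh's by() version took about 1.26 times as long over 100 replications. I think Josh's method is a little more intuitive. The plyr solution is the easiest to code up and the most convenient, but it is the slowest here, at roughly 2.7 times the elapsed time of the split()/lapply() version.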
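
    The benchmark above uses only two groups of ten rows, so the ranking may not carry over to bigger data. A rough sketch for re-timing the two base R approaches with more groups, using only system.time() (the data frame big, the group count, and the function names are assumptions for this example):

    ```r
    set.seed(42)
    n_groups <- 500
    big <- data.frame(group = rep(sprintf("G%03d", seq_len(n_groups)), each = 20),
                      var1 = rnorm(n_groups * 20),
                      var2 = rnorm(n_groups * 20))

    # split()/lapply() version, parameterised on the data frame
    split_lapply <- function(d) {
      j <- lapply(split(d, d$group),
                  function(x) cor(x$var1, x$var2, method = "spearman"))
      data.frame(group = names(j), corr = unlist(j), row.names = NULL)
    }

    # by() version, parameterised on the data frame
    by_version <- function(d) {
      r <- by(d, d$group,
              FUN = function(X) cor(X$var1, X$var2, method = "spearman"))
      data.frame(group = dimnames(r)[[1]], corr = as.vector(r))
    }

    t_split <- system.time(res_split <- split_lapply(big))
    t_by    <- system.time(res_by <- by_version(big))

    # Both versions should return the same correlations in the same order.
    all.equal(res_split$corr, res_by$corr)

    # Elapsed seconds for a single run of each (machine-dependent).
    c(split_lapply = t_split[["elapsed"]], by = t_by[["elapsed"]])
    ```

    A single system.time() call is noisy, so for a real comparison it is worth repeating the runs, as rbenchmark does.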

  • 2020-11-30 05:57

    If you want an efficient solution for a large number of groups, data.table is the way to go.

    library(data.table)
    DT <- as.data.table(df)
    setkey(DT, group)
    DT[,list(corr = cor(var1,var2,method = 'spearman')), by = group]
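
    The same by-group pattern extends to p-values. A minimal sketch, assuming the same group/var1/var2 columns (the seed and the name res are only for illustration); the braces in j let one cor.test() call feed several output columns:

    ```r
    library(data.table)

    set.seed(123)
    DT <- data.table(group = rep(c("G1", "G2"), each = 10),
                     var1 = rnorm(20),
                     var2 = rnorm(20))

    # One cor.test() per group; the last expression in the braces
    # is the list of columns returned for that group.
    res <- DT[, {
      ct <- cor.test(var1, var2, method = "spearman")
      list(estimate = unname(ct$estimate), p.value = ct$p.value)
    }, by = group]
    res
    ```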
    