correlation between columns by group

暗喜 2021-01-27 17:06

How do I calculate correlations between one column and all other columns in a data frame in R without using column names? I tried to use ddply and it works if I use just two col

3 answers
  •  情话喂你
    2021-01-27 17:18

    As of

    packageVersion("dplyr")
    [1] ‘1.0.2’
    

    the code suggested in one of the answers returns a rowwise tibble that holds each group's correlation matrix in a list column:

    iris %>%
         group_by(Species) %>%
         do(cormat = cor(select(., -matches("Species"))))
    # A tibble: 3 x 2
    # Rowwise: 
      Species    cormat           
      <fct>      <list>           
    1 setosa     <dbl[,4] [4 x 4]>
    2 versicolor <dbl[,4] [4 x 4]>
    3 virginica  <dbl[,4] [4 x 4]>
    

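    To get the data into a rectangular (long) shape, you can pull out the list column and melt it (melt() here appears to be the one from reshape2, judging by the L1 column it produces):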

    library(dplyr)
    library(reshape2)  # assumed source of melt(); its list method adds the L1 column

    iris_cor <- iris %>%
         group_by(Species) %>%
         do(cormat = cor(select(., -matches("Species")))) %>%
         pull(cormat) %>%
         melt()
    

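    The levels of Species end up encoded as integers in the L1 column. Because pull() returns an unnamed list, L1 is just the position of each group (1 = setosa, 2 = versicolor, 3 = virginica).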

               Var1         Var2     value L1
    1  Sepal.Length Sepal.Length 1.0000000  1
    2   Sepal.Width Sepal.Length 0.7425467  1
    3  Petal.Length Sepal.Length 0.2671758  1
    4   Petal.Width Sepal.Length 0.2780984  1
    ...
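
    If you would rather see the species names than 1, 2, 3 in L1, one option is to name the list before melting. This is only a sketch: it assumes melt() is reshape2::melt(), and res, named_mats and iris_cor_named are names made up for the example.

```r
library(dplyr)
library(reshape2)  # assumed source of melt(); its list method fills L1

# Same grouped correlation as above, kept as a rowwise tibble with a list column
res <- iris %>%
  group_by(Species) %>%
  do(cormat = cor(select(., -matches("Species"))))

# Name each matrix after its group so melt() writes the names, not 1..3, into L1
named_mats <- setNames(res$cormat, as.character(res$Species))
iris_cor_named <- melt(named_mats)

head(iris_cor_named, 3)
```

    With a named list, L1 should come back as a character column holding "setosa", "versicolor" and "virginica".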
    

    I am sure there is a cleaner way of doing this with unnest() and its friends, but I could not figure it out yet. Hopefully someone notices this and posts a better solution.
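
    For what it's worth, here is one possible way to skip the list column altogether. It is a sketch that uses group_modify() rather than unnest(), so it is not necessarily the cleaner route hinted at above; iris_cor_long is a name made up for the example.

```r
library(dplyr)

iris_cor_long <- iris %>%
  group_by(Species) %>%
  # .x is one group's rows without the grouping column, so no column names are needed
  group_modify(~ as.data.frame(as.table(cor(.x)), responseName = "value")) %>%
  ungroup()

head(iris_cor_long, 4)
```

    The result keeps Species as a regular column next to Var1, Var2 and value, so there is nothing to decode afterwards. It does assume every non-grouping column is numeric, as in iris; otherwise cor() will fail.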
