How do I calculate correlations between one column and all other columns in a data frame in R without using column names? I tried to use ddply and it works if I use just two col
As of
packageVersion("dplyr")
[1] ‘1.0.2’
The result of the code suggested in one of the answers returns a tibble
iris %>%
group_by(Species) %>%
do(cormat = cor(select(., -matches("Species"))))
# A tibble: 3 x 2
# Rowwise:
Species cormat
<fct> <list>
1 setosa <dbl[,4] [4 × 4]>
2 versicolor <dbl[,4] [4 × 4]>
3 virginica <dbl[,4] [4 × 4]>
To get the data into a rectangular shape, you can
iris_cor <- iris %>%
group_by(Species) %>%
do(cormat = cor(select(., -matches("Species")))) %>%
pull(cormat) %>% melt
You will have the levels of Species codified on L1
variable.
Var1 Var2 value L1
1 Sepal.Length Sepal.Length 1.0000000 1
2 Sepal.Width Sepal.Length 0.7425467 1
3 Petal.Length Sepal.Length 0.2671758 1
4 Petal.Width Sepal.Length 0.2780984 1
...
I am sure there's a cleaner way of doing this with unnest()
and its friends, but couldn't figure out yet. Hoping this gets noticed
and posts a better solution
You can do this with dplyr using
library(dplyr)
cormat_res <- iris %>%
group_by(Species) %>%
do(cormat = cor(select(., -matches("Species"))))
> cormat_res[[2]]
[[1]]
Sepal.Length Sepal.Width Petal.Length Petal.Width
Sepal.Length 1.0000000 0.7425467 0.2671758 0.2780984
Sepal.Width 0.7425467 1.0000000 0.1777000 0.2327520
Petal.Length 0.2671758 0.1777000 1.0000000 0.3316300
Petal.Width 0.2780984 0.2327520 0.3316300 1.0000000
[[2]]
Sepal.Length Sepal.Width Petal.Length Petal.Width
Sepal.Length 1.0000000 0.5259107 0.7540490 0.5464611
Sepal.Width 0.5259107 1.0000000 0.5605221 0.6639987
Petal.Length 0.7540490 0.5605221 1.0000000 0.7866681
Petal.Width 0.5464611 0.6639987 0.7866681 1.0000000
[[3]]
Sepal.Length Sepal.Width Petal.Length Petal.Width
Sepal.Length 1.0000000 0.4572278 0.8642247 0.2811077
Sepal.Width 0.4572278 1.0000000 0.4010446 0.5377280
Petal.Length 0.8642247 0.4010446 1.0000000 0.3221082
Petal.Width 0.2811077 0.5377280 0.3221082 1.0000000
Maybe like this? It produces a correlation matrix for each species.
by(iris[,1:4], iris$Species, cor)