Show correlations as an ordered list, not as a large matrix

I have a data frame with 100+ columns. cor() returns remarkably quickly, but tells me far too much, especially as most columns are not correlated. I'd like it to just tell m

5 Answers

野的像风

2020-12-02 13:49

I always use this, where z is the correlation matrix returned by cor():

zdf <- as.data.frame(as.table(z))
zdf
#    Var1 Var2     Freq
# 1     a    a  1.00000
# 2     b    a -0.99669
# 3     c    a -0.14063
# 4     d    a -0.28061
# 5     e    a  0.80519

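Then use subset(zdf, abs(Freq) > 0.5) to keep only the strong correlations (here, those with an absolute value above 0.5; this is a cutoff on strength, not a test of statistical significance).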
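To get the ordered list the question title asks for, the filtered result can also be sorted. The following is only a minimal sketch: it assumes the built-in mtcars dataset as a stand-in for the asker's data frame, and the name strong is made up for illustration.

```r
# Assumption: mtcars (built into R) stands in for the asker's data frame.
z <- cor(mtcars)

# Long format: one row per (Var1, Var2) pair, correlation in Freq.
zdf <- as.data.frame(as.table(z))

# Keep pairs whose absolute correlation exceeds the 0.5 cutoff.
strong <- subset(zdf, abs(Freq) > 0.5)

# Order by absolute correlation, strongest first.
strong <- strong[order(-abs(strong$Freq)), ]
head(strong)
```

Note that this still contains the diagonal (each column with itself) and both orderings of every pair.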

萌比男神i

2020-12-02 13:58

Building on @Marek's answer, this eliminates the diagonal and the duplicate (mirrored) pairs:

data = as.data.frame( as.table( z ) )
combinations = combn( colnames( z ) , 2 , FUN = function( x ) { paste( x , collapse = "_" ) } )
data = data[ data$Var1 != data$Var2 , ]
data = data[ paste( data$Var1 , data$Var2 , sep = "_" ) %in% combinations , ]
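A different way to reach the same result, not taken from the answer above, is to blank out the diagonal and the lower triangle of the matrix before converting it. This is a rough sketch that again assumes mtcars as example data; the name upper is hypothetical.

```r
# Assumption: mtcars (built into R) stands in for the asker's data frame.
z <- cor(mtcars)

# Set the diagonal and the lower triangle to NA so each pair appears once.
z[!upper.tri(z)] <- NA

# Convert to long format and drop the NA rows.
upper <- na.omit(as.data.frame(as.table(z)))

# Order by absolute correlation, strongest first.
upper <- upper[order(-abs(upper$Freq)), ]
head(upper)
```

With p columns this leaves p * (p - 1) / 2 rows, one per unordered pair of columns.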

礼貌的吻别

2020-12-02 14:03

Starting from Marek's answer, I added a few lines of common cleanup using Tidyverse pipes (load dplyr first with library(dplyr)):

  df_cor %>%                               # start from the correlation matrix
  as.table() %>% as.data.frame() %>%       # Marek's answer in TidyVerse format
  subset(Var1 != Var2 & abs(Freq)>0.5) %>% # omit diagonal and keep strong correlations (optional...)
  filter(!duplicated(paste0(pmax(as.character(Var1), as.character(Var2)), pmin(as.character(Var1), as.character(Var2))))) %>%
                                           # keep only unique occurrences, as.character because Var1 and Var2 are factors
  arrange(desc(Freq))                      # sort by Freq

For more on line 4 (the filter that drops duplicated pairs), see: How do I select all unique combinations of two columns in an R data frame?

北恋

2020-12-02 14:07

library(reshape)

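z[z == 1] <- NA           # drop perfect correlations (this includes the diagonal)
z[abs(z) < 0.5] <- NA     # drop correlations weaker than abs(0.5)
z <- na.omit(melt(z))     # melt to long format and discard the NA rows
z[order(-abs(z$value)),]  # sort by absolute correlation, strongest first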

独厮守ぢ

2020-12-02 14:10

There are several ways to visualize correlation matrices so that one can get a quick picture of the data set. Here is a link to an approach which looks pretty good.
