I've a data frame with 100+ columns. cor() returns remarkably quickly, but tells me far too much, especially as most columns are not correlated. I'd like it to just tell m
I always use
zdf <- as.data.frame(as.table(z))
zdf
# Var1 Var2 Freq
# 1 a a 1.00000
# 2 b a -0.99669
# 3 c a -0.14063
# 4 d a -0.28061
# 5 e a 0.80519
Then use subset(zdf, abs(Freq) > 0.5)
to keep only the strong correlations (here, absolute value above 0.5).
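For a self-contained run of the steps above, here is a minimal sketch. The toy data frame `df` and its column names are made up for illustration, and `z` is assumed to be the matrix returned by cor().

```r
# Toy data (hypothetical): b is almost exactly -a, c is an alternating pattern
a <- 1:10
noise <- rep(c(1, -1), 5)
df <- data.frame(a = a, b = -a + 0.1 * noise, c = noise)

z <- cor(df)                             # full correlation matrix
zdf <- as.data.frame(as.table(z))        # long format: Var1, Var2, Freq
strong <- subset(zdf, abs(Freq) > 0.5)   # keep only the strong correlations
strong
```

Note that the result still contains the diagonal (each column against itself) and both orderings of every pair, so a-b and b-a each get a row.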
Building on @Marek's answer, this also eliminates the diagonal and the duplicate (mirrored) pairs:
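data = as.data.frame( as.table( z ) ) # long format: Var1, Var2, Freq
# every unordered pair of column names, pasted together as "x_y"
combinations = combn( colnames( z ) , 2 , FUN = function( x ) { paste( x , collapse = "_" ) } )
data = data[ data$Var1 != data$Var2 , ] # drop the diagonal
# keep only one row per pair
data = data[ paste( data$Var1 , data$Var2 , sep = "_" ) %in% combinations , ]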
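An alternative sketch that avoids matching on pasted names: set the diagonal and the lower triangle to NA before reshaping, so each pair appears exactly once. This swaps in base R's upper.tri() rather than the combn() approach above, and the toy matrix `z` and the name `pair_df` are made up for illustration.

```r
# Toy correlation matrix (hypothetical data)
a <- 1:10
noise <- rep(c(1, -1), 5)
z <- cor(data.frame(a = a, b = -a + 0.1 * noise, c = noise))

z[!upper.tri(z)] <- NA                          # blank the diagonal and lower triangle
pair_df <- na.omit(as.data.frame(as.table(z)))  # one row per unordered pair
pair_df
```

One reason to consider this: matching on names pasted with "_" can misfire when the column names themselves contain an underscore.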
Starting from Marek's answer, I added a few lines of common cleanup using tidyverse pipes (dplyr must be loaded for %>%, filter() and arrange()):
df_cor %>% # start from the correlation matrix
as.table() %>% as.data.frame() %>% # Marek's answer in tidyverse form
subset(Var1 != Var2 & abs(Freq) > 0.5) %>% # omit the diagonal and keep strong correlations (optional)
filter(!duplicated(paste0(pmax(as.character(Var1), as.character(Var2)), pmin(as.character(Var1), as.character(Var2))))) %>%
# keep one row per pair; as.character() because Var1 and Var2 are factors
arrange(desc(Freq)) # sort by Freq
More on line 4 (the filter() call) here: "How do I select all unique combinations of two columns in an R data frame?"
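library(reshape)
z[z == 1] <- NA # drop perfect correlations (this includes the diagonal)
z[abs(z) < 0.5] <- NA # drop anything weaker than 0.5 in absolute value
z <- na.omit(melt(z)) # melt to long format and drop the NA rows
z[order(-abs(z$value)),] # sort by absolute correlation, strongest first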
There are several ways to visualize correlation matrices so that one can get a quick picture of the data set. Here is a link to an approach which looks pretty good.
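As one possibility (not necessarily the linked approach, which is not named here), base R's heatmap() gives a quick visual overview of a correlation matrix. A minimal sketch with made-up toy data:

```r
# Toy correlation matrix (hypothetical data)
a <- 1:10
noise <- rep(c(1, -1), 5)
z <- cor(data.frame(a = a, b = -a + 0.1 * noise, c = noise))

# symm = TRUE treats z as a symmetric matrix;
# scale = "none" plots the raw correlations instead of rescaling each row
hm <- heatmap(z, symm = TRUE, scale = "none")
```

heatmap() also reorders rows and columns by hierarchical clustering, which tends to place correlated columns next to each other.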