Show correlations as an ordered list, not as a large matrix

灰色年华 2020-12-02 13:22

I've a data frame with 100+ columns. cor() returns remarkably quickly, but tells me far too much, especially as most columns are not correlated. I'd like it to just tell m

5 answers
  • 2020-12-02 13:49

    I always use

    zdf <- as.data.frame(as.table(z))
    zdf
    #    Var1 Var2     Freq
    # 1     a    a  1.00000
    # 2     b    a -0.99669
    # 3     c    a -0.14063
    # 4     d    a -0.28061
    # 5     e    a  0.80519
    

    Then use subset(zdf, abs(Freq) > 0.5) to keep only the strong correlations (here, those with an absolute value above 0.5).
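
    For a self-contained illustration, the sketch below builds z from the built-in mtcars data set (example data chosen here, not taken from the question) and also sorts the result, since the question asks for an ordered list. Note that the diagonal and both orderings of each pair are still present at this point.

    ```r
    # Example data (assumption): correlation matrix of the built-in mtcars data set
    z <- cor(mtcars)

    # Matrix -> long data frame with columns Var1, Var2, Freq
    zdf <- as.data.frame(as.table(z))

    # Keep strong correlations, then order by absolute value, strongest first
    strong <- subset(zdf, abs(Freq) > 0.5)
    strong <- strong[order(-abs(strong$Freq)), ]
    head(strong)
    ```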

  • 2020-12-02 13:58

    Building on @Marek's answer, this eliminates the diagonal and the duplicate pairs:

    data = as.data.frame( as.table( z ) )
    combinations = combn( colnames( z ) , 2 , FUN = function( x ) { paste( x , collapse = "_" ) } )
    data = data[ data$Var1 != data$Var2 , ]
    data = data[ paste( data$Var1 , data$Var2 , sep = "_" ) %in% combinations , ]
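
    A possible alternative that avoids pasting names together is to blank out the diagonal and the lower triangle before converting, so each pair survives exactly once. This is a minimal sketch on the built-in mtcars data set (example data assumed here); pair_df is a name introduced only for illustration.

    ```r
    # Example data (assumption): correlation matrix of the built-in mtcars data set
    z <- cor(mtcars)

    # Blank out the diagonal and the lower triangle so each pair appears once
    z[lower.tri(z, diag = TRUE)] <- NA

    # Long format; na.omit() drops the blanked cells
    pair_df <- na.omit(as.data.frame(as.table(z)))

    # Order by absolute correlation, strongest first
    pair_df <- pair_df[order(-abs(pair_df$Freq)), ]
    head(pair_df)
    ```

    Unlike the name-pasting approach, this does not depend on the column names being free of the separator character.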
    
  • 2020-12-02 14:03

    Starting from the answer by Marek, I added a few lines for common cleaning using Tidyverse pipes:

      df_cor %>%                               # start from the correlation matrix
      as.table() %>% as.data.frame() %>%       # Marek's answer in TidyVerse format
      subset(Var1 != Var2 & abs(Freq)>0.5) %>% # omit diagonal and keep significant correlations (optional...)
      filter(!duplicated(paste0(pmax(as.character(Var1), as.character(Var2)), pmin(as.character(Var1), as.character(Var2))))) %>%
                                               # keep only unique occurrences, as.character because Var1 and Var2 are factors
      arrange(desc(Freq))                      # sort by Freq
    

    More on line 4 (the filter() step) in the question "How do I select all unique combinations of two columns in an R data frame?"

  • 2020-12-02 14:07
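
    library(reshape)
    
    z[z == 1] <- NA            # drop perfect correlations, including the diagonal
    z[abs(z) < 0.5] <- NA      # drop correlations weaker than 0.5 in absolute value
    z <- na.omit(melt(z))      # melt to long format (X1, X2, value) and drop the NAs
    z[order(-abs(z$value)),]   # sort by absolute correlation, strongest first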
    
  • 2020-12-02 14:10

    There are several ways to visualize correlation matrices so that one can get a quick picture of the data set. Here is a link to an approach which looks pretty good.
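
    One base-R possibility, not necessarily the linked approach, is a clustered heat map of the correlation matrix. The sketch below assumes the built-in mtcars data set as example data; heatmap() reorders rows and columns so that similar variables end up next to each other.

    ```r
    # Example data (assumption): correlation matrix of the built-in mtcars data set
    z <- cor(mtcars)

    # Clustered heat map; symm = TRUE treats z as a symmetric matrix and
    # scale = "none" keeps the raw correlation values instead of rescaling rows
    hm <- heatmap(z, symm = TRUE, scale = "none")
    ```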
