Calculate correlation by aggregating columns of data frame

时光说笑 2021-01-15 11:19

I have the following data frame:

y <- data.frame(group = letters[1:5], a = rnorm(5) , b = rnorm(5), c = rnorm(5), d = rnorm(5) )

How to

3 Answers
  • 2021-01-15 12:07

    You're almost there: you just need to use apply instead of sapply, and remove unnecessary columns.

    apply(y[-1], 1, function(x) cor(x[1:2], x[3:4]))
    

    Of course, the correlation between two length-2 vectors isn't very informative: with only two points it is always +1 or -1.

  • 2021-01-15 12:09

    You can use apply to apply a function to each row (or column) of a matrix, array or data.frame.

    apply(
      y[,-1], # Remove the first column, to ensure that u remains numeric
      1,      # Apply the function on each row
      function(u) cor( u[1:2], u[3:4] )
    )
    

    (With just 2 observations, the correlation can only be +1 or -1.)
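
To see why, here is a minimal sketch with made-up numbers (`p` and `q` are hypothetical stand-ins for one row's `a, b` and `c, d` values). Two points always lie exactly on a line, so the correlation reduces to the sign of the product of the two differences. It also shows the one exception: if either pair is constant, its standard deviation is zero and `cor` returns `NA` with a warning.

```r
# Hypothetical values standing in for one row of the data frame
p <- c(0.3, -1.2)   # plays the role of (a, b)
q <- c(2.5, 0.7)    # plays the role of (c, d)

# Correlation of two length-2 vectors
r <- cor(p, q)

# With two points, the correlation is just the sign of the product
# of the two differences (here both decrease, so the sign is positive)
expected <- sign(diff(p)) * sign(diff(q))
print(c(r = r, expected = expected))

# Edge case: a constant pair has zero standard deviation,
# so cor() returns NA (and emits a warning, suppressed here)
na_case <- suppressWarnings(cor(c(1, 1), q))
print(na_case)
```

So the result carries only one bit of information per row: whether the two pairs move in the same direction.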

  • 2021-01-15 12:14

    You could use apply:

    > apply(y[,-1],1,function(x) cor(x[1:2],x[3:4]))
    [1] -1 -1  1 -1  1
    

    Or ddply from the plyr package (although this might be overkill; note that if two rows share the same group, their values are pooled, so the correlation is computed between all of that group's a and b values and all of its c and d values):

    > library(plyr)
    > ddply(y,.(group),function(x) cor(c(x$a,x$b),c(x$c,x$d)))
      group V1
    1     a -1
    2     b -1
    3     c  1
    4     d -1
    5     e  1
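
The pooling behaviour with repeated groups can be sketched in base R, without plyr. This is only an illustration under assumptions: `y2` is a hypothetical data frame in which every group appears twice, and the seed is arbitrary. Each group then contributes four (x, y) pairs instead of two, so the correlation is no longer forced to be +1 or -1.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# Hypothetical data: each group occurs twice
y2 <- data.frame(group = rep(letters[1:3], each = 2),
                 a = rnorm(6), b = rnorm(6), c = rnorm(6), d = rnorm(6))

# Split the rows by group, then correlate the pooled (a, b) values
# with the pooled (c, d) values within each group
pooled <- sapply(split(y2, y2$group),
                 function(x) cor(c(x$a, x$b), c(x$c, x$d)))
print(pooled)
```

The result is a named numeric vector with one correlation per group, which is the same quantity the ddply call computes for each group.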
    