Question
I have two data frames with 5 columns and 100 rows each.
id price1 price2 price3 price4 price5
1 11.22 25.33 66.47 53.76 77.42
2 33.56 33.77 44.77 34.55 57.42
...
I would like to get the correlation of the corresponding rows, basically
for(i in 1:100){
cor(df1[i, 1:5], df2[i, 1:5])
}
but without using a for-loop. I'm assuming there's some way to use plyr
to do it, but I can't seem to get it right. Any suggestions?
Answer 1:
Depending on whether you want a cool or fast solution you can use either
diag(cor(t(df1), t(df2)))
which is cool but wasteful (it computes the correlations between all pairs of rows, most of which you don't need and which are then discarded), or
A <- as.matrix(df1)
B <- as.matrix(df2)
sapply(seq_len(nrow(A)), function(i) cor(A[i,], B[i,]))
which does only what you want but is a bit more to type.
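As a quick sanity check, the sketch below builds two small made-up data frames (the random values and the 10-by-5 shape are assumptions, not the asker's data) and confirms that both approaches give the same numbers. It also shows a third, fully vectorised variant, row_cor, which is a hypothetical helper and not part of either approach above: it applies the Pearson formula directly with rowMeans and rowSums.

```r
set.seed(1)  # made-up data, for illustration only
df1 <- as.data.frame(matrix(rnorm(50), nrow = 10))
df2 <- as.data.frame(matrix(rnorm(50), nrow = 10))

# Approach 1: full cross-correlation matrix, keep only the diagonal
r_diag <- unname(diag(cor(t(df1), t(df2))))

# Approach 2: one cor() call per pair of corresponding rows
A <- as.matrix(df1)
B <- as.matrix(df2)
r_loop <- sapply(seq_len(nrow(A)), function(i) cor(A[i, ], B[i, ]))

# Hypothetical helper: Pearson correlation of corresponding rows
row_cor <- function(A, B) {
  Ac <- A - rowMeans(A)  # centre each row
  Bc <- B - rowMeans(B)
  rowSums(Ac * Bc) / sqrt(rowSums(Ac^2) * rowSums(Bc^2))
}
r_vec <- unname(row_cor(A, B))

all.equal(r_diag, r_loop)
all.equal(r_loop, r_vec)
```

Both all.equal calls should return TRUE. The vectorised variant avoids both the discarded off-diagonal entries and the per-row function calls, at the cost of spelling out the formula by hand.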
Answer 2:
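I found that as.matrix is not required, provided each row is flattened to a plain vector with unlist. (df1[i,] on its own is still a one-row data frame, which cor treats as several variables with a single observation each, so it returns a matrix of NA.)
Correlations of corresponding rows of the data frames df1 and df2:
sapply(1:nrow(df1), function(i) cor(unlist(df1[i,]), unlist(df2[i,])))
and of corresponding columns: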
sapply(1:ncol(df1), function(i) cor(df1[,i], df2[,i]))
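One detail worth checking on toy data is what a single row of a data frame actually is. The sketch below uses two made-up 3-by-4 data frames (an assumption for illustration) to show that df1[1, ] stays a data frame, so cor sees several variables with one observation each, whereas unlist turns the row into a plain numeric vector. It also shows mapply(cor, df1, df2) as one possible shorthand for the column-wise case.

```r
# Made-up data, for illustration only
df1 <- data.frame(a = c(1, 2, 3), b = c(2, 4, 1), c = c(3, 1, 2), d = c(4, 3, 5))
df2 <- data.frame(a = c(2, 1, 3), b = c(1, 3, 2), c = c(4, 2, 1), d = c(3, 5, 4))

is.data.frame(df1[1, ])   # a single row is still a data frame
is.data.frame(df1[, 1])   # a single column is dropped to a vector

# One-row data frames: cor() sees 4 variables with 1 observation each
m <- suppressWarnings(cor(df1[1, ], df2[1, ]))
dim(m)
all(is.na(m))

# Flattening each row gives a single correlation per row
by_row <- sapply(seq_len(nrow(df1)),
                 function(i) cor(unlist(df1[i, ]), unlist(df2[i, ])))

# Column-wise: mapply walks the columns of both data frames in parallel
by_col <- mapply(cor, df1, df2)
```

Here by_row has one value per row and by_col is a vector named after the columns.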
Source: https://stackoverflow.com/questions/9136116/correlation-between-two-dataframes-by-row