Dealing with missing values when calculating correlations

长发绾君心 2021-02-01 02:37

I have a huge matrix with a lot of missing values. I want to get the correlations between the variables.

1. Is the solution

cor(na.omit(matr         


        
3 answers
  •  栀梦 (original poster)
    2021-02-01 03:07

    I think the second option makes more sense.

    You might consider using the rcorr function in the Hmisc package.

    It is very fast and uses only pairwise complete observations. The returned object is a list of three matrices:

    1. r: the correlation coefficients
    2. n: the number of observations used for each correlation value
    3. P: the p-value for each correlation

    This means that you can ignore correlation values based on a small number of observations (whatever that threshold is for you) or based on the p-value.

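    library(Hmisc)
    # 10 x 10 matrix of uniform random values
    x <- matrix(runif(100), nrow = 10, ncol = 10)
    x[x > 0.5] <- NA  # turn roughly half of the entries into missing values
    result <- rcorr(x)  # correlations from pairwise complete observations
    result$r[result$n < 5] <- 0  # ignore correlations based on fewer than five observations
    result$r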
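
The same idea can be applied to the p-value instead of, or together with, the observation count. The following is only a minimal sketch: the thresholds `min_n` and `alpha` are hypothetical values chosen for illustration, and the matrix is random data. It sets dropped entries to NA rather than 0, so that "not enough evidence" is not confused with "no correlation".

```r
library(Hmisc)

set.seed(1)                                    # reproducible illustration data
x <- matrix(runif(200), nrow = 20, ncol = 10)
x[x > 0.7] <- NA                               # introduce missing values

result <- rcorr(x)                             # list with components r, n and P

min_n <- 5     # hypothetical: minimum number of paired observations
alpha <- 0.05  # hypothetical: significance level

# TRUE where a correlation rests on enough pairs and has a small p-value.
# The diagonal of P is NA, so the diagonal is excluded here.
keep <- result$n >= min_n & !is.na(result$P) & result$P < alpha

filtered <- result$r
filtered[!keep] <- NA   # mark dropped correlations as missing
diag(filtered) <- 1     # restore the self-correlations
filtered
```

Whether to combine both criteria or use only one depends on the data. With many variables, the p-values would usually also need a multiple-testing correction, which this sketch does not attempt.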
    
