Remove outliers from correlation coefficient calculation

有刺的猬 2021-01-31 22:57

Assume we have two numeric vectors x and y. The Pearson correlation coefficient between x and y is given by

5 Answers
  •  伪装坚强ぢ
    2021-01-31 23:27

    Using method = "spearman" in cor is robust to this kind of contamination, because the Spearman coefficient depends only on the ranks of the values. It is also easy to implement: just replace cor(x, y) with cor(x, y, method = "spearman").

    Repeating Prasad's analysis with Spearman correlations instead, we find that the Spearman correlation is indeed robust to the contamination here and recovers the underlying near-zero correlation:

    set.seed(1)
    
    # x and y are uncorrelated
    x <- rnorm(1000)
    y <- rnorm(1000)
    cor(x,y)
    ## [1] 0.006401211
    
    # add contamination -- now cor says they are highly correlated
    x <- c(x, 500)
    y <- c(y, 500)
    cor(x, y)
    ## [1] 0.995741
    
    # but with method = "spearman" the contaminating point has almost no influence & they are shown to be uncorrelated
    cor(x, y, method = "spearman")
    ## [1] -0.007270813
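
To see why the rank-based coefficient barely moves, it may help to compute it by hand. The following is a minimal sketch, assuming the same simulated data as above; the names rho_builtin and rho_by_hand are illustrative only. It relies on the fact that Spearman's coefficient is Pearson's coefficient applied to the ranks, so the point (500, 500) contributes only the largest rank, however extreme its value.

```r
set.seed(1)

# Same setup as above: 1000 uncorrelated pairs plus one extreme point
x <- c(rnorm(1000), 500)
y <- c(rnorm(1000), 500)

# Spearman's coefficient is Pearson's coefficient computed on the ranks
rho_builtin <- cor(x, y, method = "spearman")
rho_by_hand <- cor(rank(x), rank(y))
all.equal(rho_builtin, rho_by_hand)

# The contaminating point only receives the largest rank,
# no matter how far it lies from the rest of the data
rank(x)[1001]
rank(y)[1001]
```

The two coefficients should agree, and the contaminating point should be ranked 1001 in both vectors. Replacing 500 with any other value above the rest of the data would leave the ranks, and therefore the Spearman correlation, unchanged.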
    
