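Assume we have two numeric vectors $x = (x_1, \dots, x_n)$ and $y = (y_1, \dots, y_n)$. The Pearson correlation coefficient between $x$ and $y$ is given by

$$ r_{xy} = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^n (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^n (y_i - \bar{y})^2}}, $$

where $\bar{x}$ and $\bar{y}$ are the sample means. Because every term is built from deviations from the mean, a single extreme observation can dominate both the numerator and the denominator, so the Pearson correlation is not robust to contamination.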
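The Spearman correlation coefficient is the Pearson correlation computed on the ranks of `x` and `y` rather than on their raw values, so an extreme observation can do no more than take the largest rank. Using `method = "spearman"` in `cor` is therefore much more robust to contamination, and it is easy to implement since it only involves replacing `cor(x, y)` with `cor(x, y, method = "spearman")`.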
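As a minimal sketch of both definitions in R, the Pearson coefficient can be written out from its definition (sum of cross-products of deviations over the product of the root sums of squares), and the Spearman coefficient obtained by applying that same computation to ranks. The function names `pearson_by_formula` and `spearman_by_ranks` are hypothetical, chosen only for this illustration, and the toy data (a perfectly monotone relationship with one extreme value) are an assumption.

```r
# Hypothetical helper: Pearson correlation written out term by term
pearson_by_formula <- function(x, y) {
  dx <- x - mean(x)
  dy <- y - mean(y)
  sum(dx * dy) / (sqrt(sum(dx^2)) * sqrt(sum(dy^2)))
}

# Hypothetical helper: Spearman correlation as Pearson on the ranks
spearman_by_ranks <- function(x, y) {
  pearson_by_formula(rank(x), rank(y))
}

# Toy data (assumed): y increases with x, but its last value is extreme
x <- 1:5
y <- c(1, 2, 3, 4, 100)

pearson_by_formula(x, y)  # about 0.725, matching cor(x, y)
spearman_by_ranks(x, y)   # 1, matching cor(x, y, method = "spearman")
```

Because the ranks of `x` and `y` agree exactly, the Spearman coefficient is 1, while the single extreme value in `y` pulls the Pearson coefficient down to about 0.725.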
Repeating Prasad's analysis with Spearman correlations instead, we find that the Spearman correlation is indeed robust to the contamination here, recovering the underlying zero correlation:
```r
set.seed(1)
# x and y are drawn independently, so their true correlation is zero
x <- rnorm(1000)
y <- rnorm(1000)
cor(x, y)
## [1] 0.006401211
# add a single contaminating point: Pearson now reports a near-perfect correlation
x <- c(x, 500)
y <- c(y, 500)
cor(x, y)
## [1] 0.995741
# with method = "spearman" the outlier only contributes its rank,
# so the correlation is again close to zero
cor(x, y, method = "spearman")
## [1] -0.007270813
```
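To see why the two coefficients react so differently, a small sketch can split each coefficient into per-observation terms that add up to the coefficient itself, and then look at the term belonging to the contaminating point. The helper `point_contributions` is hypothetical, written only for this illustration; the data are regenerated with the same `set.seed(1)` draws as in the example above.

```r
# Hypothetical helper: per-observation terms of a Pearson-type correlation.
# Each term is (x_i - mean(x)) * (y_i - mean(y)) divided by the shared
# denominator, so the terms sum to the Pearson coefficient of x and y.
point_contributions <- function(x, y) {
  dx <- x - mean(x)
  dy <- y - mean(y)
  dx * dy / sqrt(sum(dx^2) * sum(dy^2))
}

# Recreate the contaminated data (same RNG draws as the example above)
set.seed(1)
x <- c(rnorm(1000), 500)
y <- c(rnorm(1000), 500)

pearson_terms  <- point_contributions(x, y)              # raw values
spearman_terms <- point_contributions(rank(x), rank(y))  # ranks

# The terms reproduce the coefficients reported by cor()
all.equal(sum(pearson_terms), cor(x, y))
all.equal(sum(spearman_terms), cor(x, y, method = "spearman"))

# Term belonging to the contaminating point (the last observation)
tail(pearson_terms, 1)   # close to 1: it accounts for almost all of cor(x, y)
tail(spearman_terms, 1)  # about 0.003 = 500^2 / (n * (n^2 - 1) / 12), n = 1001
```

With untied ranks, no single observation can contribute more than 3(n - 1)/(n(n + 1)), roughly 3/n, to the Spearman coefficient, which is why one contaminating point cannot push it far from zero. The Pearson term has no such cap: it grows with how far the point lies from the mean.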