Remove outliers from correlation coefficient calculation

有刺的猬 2021-01-31 22:57

Assume we have two numeric vectors x and y. The Pearson correlation coefficient between x and y is given by

5 answers

伪装坚强ぢ (original poster)

2021-01-31 23:27
Passing method = "spearman" to cor gives a correlation that is robust to contamination, and it is easy to implement: simply replace cor(x, y) with cor(x, y, method = "spearman").
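As a minimal sketch of what that argument does: Spearman's coefficient is the Pearson coefficient computed on the ranks of the data rather than on the raw values. The snippet below checks this equivalence on contaminated data of the kind discussed in this thread; the names rho_builtin and rho_manual are illustrative only.

```r
set.seed(1)

# two uncorrelated vectors plus one contaminating point at (500, 500)
x <- c(rnorm(1000), 500)
y <- c(rnorm(1000), 500)

# built-in Spearman correlation
rho_builtin <- cor(x, y, method = "spearman")

# Pearson correlation of the ranks (rank() averages ties by default)
rho_manual <- cor(rank(x), rank(y))

# the two values should agree up to floating-point tolerance
isTRUE(all.equal(rho_builtin, rho_manual))
```

If the last line returns TRUE, the robustness comes entirely from the rank transformation, not from any special handling of outliers inside cor.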

Repeating Prasad's analysis with Spearman correlations instead, we find that the Spearman correlation is indeed robust to the contamination here and recovers the underlying zero correlation:
```
set.seed(1)

# x and y are uncorrelated
x <- rnorm(1000)
y <- rnorm(1000)
cor(x,y)
## [1] 0.006401211

# add contamination -- now cor says they are highly correlated
x <- c(x, 500)
y <- c(y, 500)
cor(x, y)
## [1] 0.995741

# but with method = "spearman" the contamination has almost no effect & they are shown to be uncorrelated
cor(x, y, method = "spearman")
## [1] -0.007270813
```
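To see why the single point at 500 has so little influence here, it may help to compare its distance from the rest of the data on the raw scale and on the rank scale. This is only an illustrative sketch; ranks, gap_raw and gap_rank are names made up for the example.

```r
set.seed(1)

# same construction as above: 1000 standard normal draws plus one extreme value
x <- c(rnorm(1000), 500)

ranks <- rank(x)

# on the raw scale the contaminating value is hundreds of units from the rest
gap_raw <- x[1001] - max(x[1:1000])

# on the rank scale it is just the largest observation, one step above the next
gap_rank <- ranks[1001] - max(ranks[1:1000])

gap_raw
gap_rank
```

On the rank scale the extreme point is just one more observation at the top of the ordering, which is why it cannot dominate the correlation the way it does for Pearson.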