Removing outliers easily in R

Submitted by こ雲淡風輕ζ on 2019-12-05 05:05:06

I think you want something like this:

 by(dat,dat$x, function(z) z$y[z$y < 2*sd(z$y)])
dat$x: 3
[1] 4 1 6 5 7 3 2
--------------------------------------------------------------------------------------------------------------- 
dat$x: 8
[1] 4 2 2 2 3
--------------------------------------------------------------------------------------------------------------- 
dat$x: 13
[1] 3 2

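EDIT after a comment: the first version compared y itself with 2*sd(y). This one keeps only the values that lie within 2 standard deviations of their group mean: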

 by(dat, dat$x,
    function(z) z$y[abs(z$y - mean(z$y)) < 2 * sd(z$y)])
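The criterion inside the corrected call can be pulled out into a small helper and checked on a single vector. This is a minimal sketch; the function name `within_k_sd` and the toy vector are hypothetical and not part of the original answer:

```r
# Hypothetical helper: TRUE for values within k standard deviations of the mean
within_k_sd <- function(v, k = 2) {
  abs(v - mean(v)) < k * sd(v)
}

v <- c(rep(10, 10), 50)   # toy data: ten 10s and one extreme value
keep <- within_k_sd(v)
v[keep]                   # the 50 is dropped, the ten 10s remain
```

Changing `k` moves the cutoff; the answers here all use 2.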

EDIT: I changed the by function slightly so that it returns whole rows (both x and y), then bind the groups back together by calling rbind through do.call:

 do.call(rbind, by(dat, dat$x, function(z) {
   # keep the rows whose y is within 2 sd of the group mean
   idx <- abs(z$y - mean(z$y)) < 2 * sd(z$y)
   z[idx, ]
 }))
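To check the do.call(rbind, by(...)) version end to end, here is a self-contained run on a small made-up data frame. The values in `dat` are hypothetical, since the question's data is not shown:

```r
# Hypothetical data: two groups of x, each with one extreme y value
dat <- data.frame(
  x = rep(c(3, 8), each = 11),
  y = c(rep(4, 10), 40,    # group 3: one outlier (40)
        rep(2, 10), 30)    # group 8: one outlier (30)
)

# Keep rows whose y lies within 2 sd of its group mean, then stack the groups
res <- do.call(rbind, by(dat, dat$x, function(z) {
  idx <- abs(z$y - mean(z$y)) < 2 * sd(z$y)
  z[idx, ]
}))

nrow(res)   # 20: one row dropped per group
```

Because whole rows are returned, the result keeps both the x and y columns.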

Or, using plyr, in a single call:

 library(plyr)
 ddply(dat, .(x), function(z) {
   idx <- abs(z$y - mean(z$y)) < 2 * sd(z$y)
   z[idx, ]
 })

Something like this?

newdata <- cbind(x, y)[!(y > 2 * sd(y)), , drop = FALSE]
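A common pitfall with the `m[-which(cond), ]` pattern is that `which()` returns `integer(0)` when nothing matches, and negating an empty index selects no rows at all. A minimal illustration using the same criterion, with hypothetical `x` and `y` chosen so that no value exceeds the cutoff:

```r
# Hypothetical toy data: no y value is above 2 * sd(y)
x <- 1:5
y <- c(-1, 0, 1, 0, -1)

cond <- y > 2 * sd(y)     # all FALSE here
which(cond)               # integer(0)

m <- cbind(x, y)
nrow(m[-which(cond), , drop = FALSE])  # 0: every row is dropped
nrow(m[!cond, , drop = FALSE])         # 5: logical indexing keeps all rows
```

Indexing with the negated logical vector avoids the empty-index case entirely.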

Or do you mean something like this?

Data <- cbind(x,y)
Data[!(sd(y) > rowMeans(Data)), , drop = FALSE]

You can use tapply for this, but you lose the original row ordering: the kept values come back as a list grouped by x.

tapply(y,x,function(z) z[abs(z-mean(z))<2*sd(z)])
$`3`
[1] 4 1 6 5 7 3 2

$`8`
 [1] 5 6 4 2 8 2 7 2 3 5

$`13`
[1] 4 7 6 6 3 2 7
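If the original row order matters, one base-R option not used in the answers above is `ave()`, which applies a function within groups and returns a result aligned with the input rows. This is a minimal sketch on hypothetical data. Since `ave()` returns the same type as its first argument, the logical result comes back as 1/0 and is converted with `as.logical()`:

```r
# Hypothetical data with the groups interleaved, so the ordering is visible
dat <- data.frame(
  x = rep(c(3, 8), times = 11),
  y = c(rep(c(4, 2), times = 10), 40, 30)   # last two rows are outliers
)

# For each row, TRUE if y is within 2 sd of its own group's mean
keep <- as.logical(ave(dat$y, dat$x,
                       FUN = function(v) abs(v - mean(v)) < 2 * sd(v)))

res <- dat[keep, ]   # rows stay in their original order
```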