Removing outliers easily in R
问题 I have data with discrete x-values, such as x = c(3,8,13,8,13,3,3,8,13,8,3,8,8,13,8,13,8,3,3,8,13,8,13,3,3) y = c(4,5,4,6,7,20,1,4,6,2,6,8,2,6,7,3,2,5,7,3,2,5,7,3,2); How can I generate a new dataset of x and y values where I eliminate pairs of values where the y-value is 2 standard deviations above the mean for that bin. For example, in the x=3 bin, 20 is more than 2 SDs above the mean, so that data point should be removed. 回答1: for me you want something like : by(dat,dat$x, function(z) z$y