Question
I'm a beginner with R and can't work out how to replace outliers with NA in ALL columns of a dataset. I managed to do it for one column at a time with
dataset$column[dataset$column %in% boxplot.stats(dataset$column)$out] <- NA
But I have 21 columns in which I need to replace the outliers with NA.
How would you do that?
How would you do it for a range of columns? For specific columns?
Answer 1:
You can use apply over the columns (MARGIN = 2). Example:
set.seed(1)
x = matrix(rnorm(20), ncol = 2)
x[2, 1] = 100
x[4, 2] = 200
apply(x, 2, function(column){column[column %in% boxplot(column, plot = FALSE)$out] = NA; column})
            [,1]        [,2]
 [1,] -0.6264538  1.51178117
 [2,]         NA  0.38984324
 [3,] -0.8356286 -0.62124058
 [4,]  1.5952808          NA
 [5,]  0.3295078  1.12493092
 [6,] -0.8204684 -0.04493361
 [7,]  0.4874291 -0.01619026
 [8,]  0.7383247  0.94383621
 [9,]  0.5757814  0.82122120
[10,] -0.3053884  0.59390132
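The example above works on a matrix, while the question is about a data frame and about a range of columns or specific columns. A minimal sketch of one way to handle that, assuming the selected columns are numeric: the helper name na_outliers, the data frame contents, and the column names id, x1, x2 and x3 are all made up for illustration. It uses lapply rather than apply, because apply converts a data frame to a matrix first, whereas lapply leaves each column's type alone.

```r
# Hypothetical helper (the name is an assumption): set the boxplot
# outliers of a single vector to NA, same rule as in the question.
na_outliers <- function(v) {
  v[v %in% boxplot.stats(v)$out] <- NA
  v
}

# Made-up example data; id, x1, x2, x3 are illustrative column names.
set.seed(1)
dataset <- data.frame(id = 1:10, x1 = rnorm(10), x2 = rnorm(10), x3 = rnorm(10))
dataset$x1[2] <- 100
dataset$x2[4] <- 200

# A range of columns, selected by position.
by_range <- dataset
by_range[2:4] <- lapply(by_range[2:4], na_outliers)

# Specific columns, selected by name.
by_name <- dataset
by_name[c("x1", "x3")] <- lapply(by_name[c("x1", "x3")], na_outliers)

# All columns at once (assumes every column is numeric).
all_cols <- dataset
all_cols[] <- lapply(all_cols, na_outliers)
```

Assigning into dataset[cols] (or dataset[] for every column) keeps the object a data frame, so the same pattern should carry over to the 21 columns in the question by changing the index vector.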
Source: https://stackoverflow.com/questions/23019387/changing-outliers-for-na-in-all-columns-in-a-dataset-in-r