Question
I am trying to remove outliers from my data. In my case, the outliers are the values that lie far from the rest of the data when plotted on a boxplot. After removing them, I will save the data to a new file and run a prediction model to see how the results differ from those on the original data.
I adapted a tutorial to remove outliers from my data. The tutorial uses a boxplot to identify the outliers.
It works fine when I run it on a column that has outliers, but it raises an error when I run it on a column that doesn't have any. How can I fix this error?
Here is the code:
outlier_rem <- Data_combined #data-frame with 25 var, few have outliers
#removing outliers from the column
outliers <- boxplot(outlier_rem$var1, plot=FALSE)$out
#print(outliers)
ol <- outlier_rem[-which(outlier_rem$var1 %in% outliers),]
dim(ol)
# [1] 0 25
boxplot(ol)
This produces the error:
no non-missing arguments to min; returning Inf
no non-missing arguments to max; returning -Inf
Error in plot.window(xlim = xlim, ylim = ylim, log = log, yaxs = pars$yaxs) :
need finite 'ylim' values
Answer 1:
The following works:
# Sample data based on mtcars and one additional row
df <- rbind(mtcars[, 1:3], c(100, 6, 300))
# Identify outliers
outliers <- boxplot(df$mpg, plot = FALSE)$out
#[1] 33.9 100.0
# Remove outliers
df[!(df$mpg %in% outliers), ]
Your method fails because, if there are no outliers, which(mtcars$mpg %in% numeric(0)) returns integer(0). Negating an empty index still gives integer(0), which selects no rows, so you end up with a zero-row data.frame. That is exactly what you see from dim.
outliers <- boxplot(mtcars$mpg, plot = FALSE)$out
outliers
#numeric(0)
Compare
which(mtcars$mpg %in% outliers)
#integer(0)
with
df$mpg %in% outliers
# [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#[13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#[25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
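To make the difference concrete, here is a minimal sketch that counts the rows each indexing style keeps when there are no outliers. The variable names (idx, neg_idx, n_which, n_mask) are only for illustration and do not come from the original post.

```r
# mtcars$mpg has no boxplot outliers, so `outliers` is numeric(0)
outliers <- boxplot(mtcars$mpg, plot = FALSE)$out

idx <- which(mtcars$mpg %in% outliers)  # integer(0)
neg_idx <- -idx                          # still integer(0): nothing to negate

# An empty index selects no rows, so every row is lost
n_which <- nrow(mtcars[neg_idx, ])

# A logical mask has one element per row, so every row is kept
n_mask <- nrow(mtcars[!(mtcars$mpg %in% outliers), ])

print(c(which = n_which, mask = n_mask))
```

The negative-index version only behaves as intended when which() finds at least one match, which is why the question's code works on columns that do contain outliers.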
There exists a nice post here on SO that elaborates on this point.
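If the same filtering is needed for several columns, the logical-mask approach can be wrapped in a small function. This is only a sketch: remove_outliers is a hypothetical helper name, not part of base R or of the original answer, and it assumes the column is numeric.

```r
# Hypothetical helper (not from the original answer): drop the rows whose value
# in `column` is flagged as an outlier by boxplot()
remove_outliers <- function(data, column) {
  outliers <- boxplot(data[[column]], plot = FALSE)$out
  # Logical indexing keeps all rows when `outliers` is empty
  data[!(data[[column]] %in% outliers), , drop = FALSE]
}

# Same sample data as above: mtcars plus one extreme row
df <- rbind(mtcars[, 1:3], c(100, 6, 300))

with_outliers <- remove_outliers(df, "mpg")         # drops the rows with 33.9 and 100
without_outliers <- remove_outliers(mtcars, "mpg")  # nothing to drop, all rows kept

dim(with_outliers)
dim(without_outliers)
```

Because the mask is built with %in%, rows with a missing value in the column are kept rather than dropped, which may or may not be what you want before fitting a model.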
Source: https://stackoverflow.com/questions/54782522/remove-outliers-from-data-frame-in-r