Question
I am using the lmrob function from the robustbase package in R for robust regression. I call it as rob_reg <- lmrob(y ~ 0 + ., dat, method = "MM", control = a1). To see the results I use summary(rob_reg), and one thing robust regression does is identify outliers in the data. Part of the summary output reads:
6508 observations c(49,55,58,77,104,105,106,107,128,134,147,153,...)
are outliers with |weight| <= 1.4e-06 ( < 1.6e-06);
which lists all the outliers, in this case 6508 (I removed the majority and replaced them with ...). I need to extract these outliers and remove them from my data. What I did before was to use summary(rob_reg)$rweights to get the weights of all observations and then remove the observations whose weight falls below a certain value; in the example above that value would be 1.6e-06. Is there a way to get a list of only the outliers without first getting the weights of all the observations?
Answer 1:
This is an old post but I recently had a need for this so I thought I'd share my solution.
library(robustbase)
# fit the model
fit = lmrob(y ~ x, data)
# create a model summary
fit.summary = summary(fit)
# extract the outlier threshold weight from the summary
out.thresh = fit.summary$control$eps.outlier
# returns the weights corresponding to the outliers
# names(out.liers) corresponds to the index of the observation
out.liers = fit.summary$rweights[which(fit.summary$rweights <= out.thresh)]
# add a True/False variable for outlier to the original data by matching
# row.names of the original data to names of the list of outliers
data$outlier = rep(NA, nrow(data))
for(i in 1:nrow(data)){
  data$outlier[i] = ifelse(row.names(data)[i] %in% names(out.liers), "True", "False")
}
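The same idea can be written without a loop, and it also covers the removal step the question asks about. The following is only a minimal sketch, not the package's prescribed method: the data frame dat, its columns x and y, and the planted outliers are made up for illustration, and it assumes no rows were dropped for missing values, so the weights line up with the rows of dat. It uses the strict comparison "<" to mirror the "( < 1.6e-06)" shown in the printed summary.

```r
library(robustbase)  # provides lmrob()

# Hypothetical example data: a linear trend with five gross outliers planted.
set.seed(1)
dat <- data.frame(x = 1:100)
dat$y <- 2 * dat$x + rnorm(100)
planted <- c(10, 20, 30, 40, 50)
dat$y[planted] <- dat$y[planted] + 1000

fit <- lmrob(y ~ x, data = dat)

# Robustness weights, one per observation used in the fit.
w <- weights(fit, type = "robustness")

# eps.outlier in the control list may be stored as a number or as a function
# of the number of observations, so handle both cases.
eps <- fit$control$eps.outlier
if (is.function(eps)) eps <- eps(length(w))

# Logical flag per row, then the row positions of the flagged observations.
dat$outlier <- abs(w) < eps
outlier_rows <- which(dat$outlier)

# Data with the flagged rows removed.
dat_clean <- dat[!dat$outlier, ]
```

The weights of all observations are still computed here, since they are part of the fitted object anyway; the sketch only avoids the explicit loop and the matching on row names.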
Source: https://stackoverflow.com/questions/24460061/outliers-with-robust-regression-in-r