Question
I have a data frame and I want to get the indices of the outliers in each column.
Here is part of my data frame:
mediamarkt[,48]
[1] 7126 4012 3711 3237 3432 2671 2861 7065 3158 4023 4770 3861
[13] 4108 7408 9071 3596 3889 4093 4446 6059 8345 10291 5546 5129
[25] 4683 4670 5694 8619 11047 5743 5775 5216 5283 4854 7871 9944
[37] 3797 3821 3834 3999 4577 8898 11396 4508 5459 3668 3885 4021
[49] 7491 8831 3513 3606 3332 3189 3656 6859 9167 3306 3305 3379
[61] 3507 3912 6562 8245 3420 3445 3530 3404 3847 7187 9128 3623
[73] 3581 3401 2784 3024 6342 7835 2766 2718 2578 2591 2737 5479
[85] 7064 2528 2550 2287 1893 1846
First, I tried to get the outlier values with this code:
boxplot(mediamarkt[,48])$out
and I get 2 outliers:
[1] 11047 11396
Everything is fine so far, but when I try to get the indices of the outliers with the code below:
which(mediamarkt[,48] %in% boxplot_mediamarkt$out)
[1] 5 18 29 43 59
I get more than 2 indices, which does not match the result above.
What is wrong with my code?
Could anyone help me solve this problem?
Answer 1:
@G5W has asked a question that remains open. This code shows an easy way to enter your data, and it suggests that your boxplot_mediamarkt is not the output of boxplot or boxplot.stats from your data.
> dat <- scan()
1: 7126 4012 3711 3237 3432 2671 2861 7065 3158 4023 4770 3861
13: 4108 7408 9071 3596 3889 4093 4446 6059 8345 10291 5546 5129
25: 4683 4670 5694 8619 11047 5743 5775 5216 5283 4854 7871 9944
37: 3797 3821 3834 3999 4577 8898 11396 4508 5459 3668 3885 4021
49: 7491 8831 3513 3606 3332 3189 3656 6859 9167 3306 3305 3379
61: 3507 3912 6562 8245 3420 3445 3530 3404 3847 7187 9128 3623
73: 3581 3401 2784 3024 6342 7835 2766 2718 2578 2591 2737 5479
85: 7064 2528 2550 2287 1893 1846
91:
Read 90 items
> boxplot(dat)$out
[1] 11047 11396
> which(dat %in% boxplot(dat)$out)
[1] 29 43
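Since the question asks for the outlier indices of each column, the check above can be wrapped in a small function and applied column by column. This is only a sketch under a few assumptions: outlier_idx is a hypothetical helper name (not from the original post), df is a stand-in data frame built from the posted values (in practice it would be mediamarkt), and boxplot.stats is used in place of boxplot so that no plot is drawn. One possible explanation for the five indices in the question, which cannot be confirmed from the post, is that boxplot_mediamarkt was created from more than one column: its $out would then pool the outlier values of all columns, and %in% would match any of them.

```r
# Values of column 48 as posted in the question.
dat <- c(7126, 4012, 3711, 3237, 3432, 2671, 2861, 7065, 3158, 4023, 4770, 3861,
         4108, 7408, 9071, 3596, 3889, 4093, 4446, 6059, 8345, 10291, 5546, 5129,
         4683, 4670, 5694, 8619, 11047, 5743, 5775, 5216, 5283, 4854, 7871, 9944,
         3797, 3821, 3834, 3999, 4577, 8898, 11396, 4508, 5459, 3668, 3885, 4021,
         7491, 8831, 3513, 3606, 3332, 3189, 3656, 6859, 9167, 3306, 3305, 3379,
         3507, 3912, 6562, 8245, 3420, 3445, 3530, 3404, 3847, 7187, 9128, 3623,
         3581, 3401, 2784, 3024, 6342, 7835, 2766, 2718, 2578, 2591, 2737, 5479,
         7064, 2528, 2550, 2287, 1893, 1846)

# Hypothetical helper: positions of the boxplot outliers of one vector.
# The outlier values and the positions are computed from the same x,
# so the two can never get out of sync.
outlier_idx <- function(x, coef = 1.5) {
  out <- boxplot.stats(x, coef = coef)$out
  which(x %in% out)
}

outlier_idx(dat)

# Stand-in data frame (assumption); replace df with mediamarkt.
df <- data.frame(a = dat, b = rev(dat))

# Apply the helper to every numeric column; the result is a named list
# with one integer vector of row indices per column.
num_cols <- vapply(df, is.numeric, logical(1))
idx_by_col <- lapply(df[num_cols], outlier_idx)
idx_by_col
```

With the real data, idx_by_col[[48]] should then give the row indices for column 48, provided all columns are numeric; otherwise, index the list by column name.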
Source: https://stackoverflow.com/questions/44392527/how-to-get-indices-of-outliers-in-a-dataframe-boxplot