Finding outliers in a data set

前端未结

关注

 4  1403

说谎 2021-02-07 07:38

I have a python script that creates a list of lists of server uptime and performance data, where each sub-list (or \'row\') contains a particular cluster\'s stats. For example,

4条回答

情书的邮戳 (楼主)

2021-02-07 07:51

One good way of identifying outliers visually is to make a boxplot (or box-and-whiskers plot), which will show the median, and a couple of quartiles above and below the median, and the points that lie "far" from this box (see Wikipedia entry http://en.wikipedia.org/wiki/Box_plot). In R, there's a boxplot function to do just that.

One way to discard/identify outliers programmatically is to use the MAD, or Median Absolute Deviation. The MAD is not sensitive to outliers, unlike the standard deviation. I sometimes use a rule of thumb to consider all points that are more than 5*MAD away from the median, to be outliers.

0 讨论(0)

查看其它4个回答
发布评论:

提交评论
- 加载中...