Finding outliers in a data set

说谎 2021-02-07 07:38

I have a Python script that creates a list of lists of server uptime and performance data, where each sub-list (or 'row') contains a particular cluster's stats. For example,

4 Answers

离开以前 (original poster)

2021-02-07 07:48
You need to calculate the mean (average) and standard deviation for the column. Standard deviation can be a bit confusing, but the important fact is that, for roughly normally distributed data, about 2/3 (68%) of the values lie within

Mean +/- StandardDeviation

Generally, anything outside Mean +/- 2 * StandardDeviation can be treated as an outlier, but you can tweak the multiplier.
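
As a rough sketch of that rule using only the standard library: the function name `find_outliers`, the sample `rows`, and the choice of the population standard deviation (`statistics.pstdev`) are assumptions for illustration, not something from the question.

```python
import statistics

def find_outliers(rows, column, multiplier=2.0):
    """Return rows whose value in `column` lies more than
    `multiplier` standard deviations away from the column mean."""
    values = [row[column] for row in rows]
    mean = statistics.mean(values)
    stddev = statistics.pstdev(values)  # population standard deviation (assumed choice)
    if stddev == 0:
        return []  # all values identical, so nothing stands out
    return [row for row in rows if abs(row[column] - mean) > multiplier * stddev]

# Hypothetical data: [cluster name, uptime percent]
rows = [
    ["cluster-a", 99.0],
    ["cluster-b", 98.0],
    ["cluster-c", 99.0],
    ["cluster-d", 98.0],
    ["cluster-e", 99.0],
    ["cluster-f", 98.0],
    ["cluster-g", 60.0],
]
print(find_outliers(rows, column=1))
```

With very few rows a single extreme value inflates the standard deviation, so a lower `multiplier` may be needed before anything is flagged.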

http://en.wikipedia.org/wiki/Standard_deviation

So to be clear, you want to convert the data to standard deviations from the mean.

i.e.
```
def getdeviations(x, mean, stddev):
    # Distance of x from the mean, measured in standard deviations.
    return abs(x - mean) / stddev
```
NumPy has functions for this, such as `numpy.mean` and `numpy.std`.
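
A minimal sketch of the NumPy route, assuming the column has already been pulled out into a flat sequence of numbers. The helper name `zscores` and the `uptimes` values are hypothetical. Note that `ndarray.std()` computes the population standard deviation by default (`ddof=0`).

```python
import numpy as np

def zscores(values):
    """Return each value's absolute distance from the mean,
    expressed in standard deviations."""
    arr = np.asarray(values, dtype=float)
    std = arr.std()  # ddof=0 by default: population standard deviation
    if std == 0:
        return np.zeros_like(arr)  # identical values: no deviation at all
    return np.abs(arr - arr.mean()) / std

# Hypothetical uptime percentages for seven clusters
uptimes = np.array([99.0, 98.0, 99.0, 98.0, 99.0, 98.0, 60.0])
scores = zscores(uptimes)
outlier_mask = scores > 2.0  # the multiplier is tunable
print(uptimes[outlier_mask])
```

The boolean mask can be used to index any array of the same length, for example an array of cluster names.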