Finding outliers in a data set

说谎 2021-02-07 07:38

I have a Python script that creates a list of lists of server uptime and performance data, where each sub-list (or 'row') contains a particular cluster's stats. For example,

4 Answers
  •  离开以前
    2021-02-07 07:48

    You need to calculate the mean (average) and standard deviation for the column. Standard deviation is a bit confusing, but the important fact is that, for roughly normally distributed data, about 2/3 (68%) of the values lie within

    Mean +/- StandardDeviation

    Generally, anything outside Mean +/- 2 * StandardDeviation is treated as an outlier (for normally distributed data, only about 5% of values fall outside that range), but you can tweak the multiplier.

    http://en.wikipedia.org/wiki/Standard_deviation

    So, to be clear, you want to express each value as a number of standard deviations from the mean.

    i.e.

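    def getdeviations(x, mean, stddev):
        # How many standard deviations x lies from the mean.
        # Uses the built-in abs(); the math module has no abs function.
        return abs(x - mean) / stddev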
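
To show how the rule above might be applied to one column, here is a minimal sketch using only the standard library. The function name `find_outliers`, the `threshold` parameter, and the sample uptime numbers are all made up for illustration; they are not from the question. It uses the population standard deviation (`statistics.pstdev`), which is an assumption; `statistics.stdev` would be the sample version.

```python
import statistics

def find_outliers(values, threshold=2.0):
    """Return (index, value) pairs lying more than `threshold`
    standard deviations from the mean of `values`."""
    mean = statistics.mean(values)
    stddev = statistics.pstdev(values)  # population standard deviation (assumed choice)
    if stddev == 0:
        return []  # all values identical, so nothing stands out
    return [(i, v) for i, v in enumerate(values)
            if abs(v - mean) / stddev > threshold]

# Hypothetical uptime percentages, one per cluster (illustration data only)
uptimes = [99.9, 99.8, 99.9, 99.7, 99.9, 99.8, 99.9, 99.8, 99.9, 42.0]
outliers = find_outliers(uptimes)
print(outliers)
```

Note that with very small samples a single extreme value inflates the standard deviation itself, so a lower multiplier may be needed before anything gets flagged.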
    

    NumPy has functions for this (numpy.mean and numpy.std).
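
For the list-of-lists layout described in the question, a NumPy version could look roughly like the sketch below. The row layout (name, uptime, response time), the cluster names, the numbers, and the helper `outlier_rows` are assumptions for illustration, since the question's actual data is not shown. `ndarray.mean()` and `ndarray.std()` are the NumPy calls used; `std()` defaults to the population standard deviation (`ddof=0`).

```python
import numpy as np

# Hypothetical rows: [cluster name, uptime %, average response time in ms]
rows = [
    ["cluster-a", 99.9, 120.0],
    ["cluster-b", 99.8, 118.0],
    ["cluster-c", 99.9, 125.0],
    ["cluster-d", 99.7, 122.0],
    ["cluster-e", 99.9, 119.0],
    ["cluster-f", 99.8, 121.0],
    ["cluster-g", 97.5, 480.0],
    ["cluster-h", 99.9, 123.0],
]

def outlier_rows(rows, column, threshold=2.0):
    """Return the rows whose value in `column` lies more than
    `threshold` standard deviations from that column's mean."""
    values = np.array([row[column] for row in rows], dtype=float)
    std = values.std()  # population standard deviation (ddof=0)
    if std == 0:
        return []  # constant column, nothing to flag
    z = np.abs(values - values.mean()) / std
    return [row for row, flagged in zip(rows, z > threshold) if flagged]

slow = outlier_rows(rows, column=2)
print(slow)
```

Passing a different `column` index checks another stat, and `threshold` is the multiplier mentioned above.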
