I have a Python script that creates a list of lists of server uptime and performance data, where each sub-list (or 'row') contains a particular cluster's stats. For example,
One good way to identify outliers visually is to make a boxplot (or box-and-whisker plot), which shows the median, the lower and upper quartiles on either side of it, and the points that lie "far" from this box (see the Wikipedia entry: http://en.wikipedia.org/wiki/Box_plot). In R, there's a boxplot function that does just that.
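If you'd rather stay in Python, the same "far from the box" idea can be sketched without plotting. This is only a rough sketch: it assumes the common convention of flagging points more than 1.5 times the interquartile range beyond the quartiles, and the function name, the `whisker` parameter and the sample numbers are made up for illustration. Quartile definitions differ slightly between tools, so borderline points may not match what a given plotting library draws.

```python
import statistics


def boxplot_outliers(values, whisker=1.5):
    """Return the values lying outside the boxplot fences.

    Hypothetical helper: fences are placed `whisker` * IQR beyond the
    lower and upper quartiles (1.5 is the usual convention).
    """
    # Quartiles with linear interpolation; other tools may use
    # slightly different quartile definitions.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low = q1 - whisker * iqr
    high = q3 + whisker * iqr
    return [v for v in values if v < low or v > high]


# Made-up sample: one column of stats with an obvious outlier.
sample = [10, 12, 11, 13, 12, 11, 95]
print(boxplot_outliers(sample))  # prints [95]
```

You would call this once per column of your list of lists, e.g. on `[row[i] for row in data]` for whichever index `i` holds the stat you care about.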
One way to identify or discard outliers programmatically is to use the MAD, or Median Absolute Deviation, which is the median of the absolute deviations from the median. Unlike the standard deviation, the MAD is robust to outliers. As a rule of thumb, I sometimes treat all points that are more than 5*MAD away from the median as outliers.
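A minimal Python sketch of that rule of thumb, using only the standard library. The function name, the `threshold` parameter and the sample numbers are made up for illustration, and the MAD here is the raw, unscaled one (some libraries multiply it by a constant to make it comparable to a standard deviation, which would change what a factor of 5 means).

```python
import statistics


def mad_outliers(values, threshold=5.0):
    """Return the values more than `threshold` * MAD from the median.

    Hypothetical helper; MAD is the raw median absolute deviation.
    """
    med = statistics.median(values)
    # Median of the absolute deviations from the median.
    mad = statistics.median([abs(v - med) for v in values])
    # Note: if more than half the values are identical, MAD is 0 and
    # every value that differs from the median gets flagged.
    return [v for v in values if abs(v - med) > threshold * mad]


# Made-up sample: one column of stats with an obvious outlier.
sample = [10, 12, 11, 13, 12, 11, 95]
print(mad_outliers(sample))  # prints [95]
```

Here the median is 12 and the MAD is 1, so anything more than 5 away from 12 is flagged. The factor of 5 is just my rule of thumb; tighten or loosen it to suit your data.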