I have a Python script that creates a list of lists of server uptime and performance data, where each sub-list (or 'row') contains a particular cluster's stats. For example,
Your stated goal of "finding badness" implies that it is not the outliers that you are looking for, but observations that fall above or below some threshold, and I would presume that the threshold would remain the same over time.
As an example, if all of your servers were at 98 ± 0.1% availability, a server at 100% availability would be an outlier, as would a server at 97.6% availability. Both, however, may be well within your desired limits.
On the other hand, there may be good a priori reasons to want to be notified of any server at less than 95% availability, whether one or many servers fall below this threshold.
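To make the distinction concrete, here is a minimal sketch that runs both checks on the same data. The server names, the availability figures, and the "more than 3 MADs from the median" outlier rule are all assumptions chosen for illustration; only the 95% cutoff comes from the example above.

```python
from statistics import median

def flag_outliers(availability, k=3.0):
    """Servers more than k median-absolute-deviations from the fleet median.

    The k=3 MAD rule is an illustrative choice, not a recommendation.
    """
    values = list(availability.values())
    center = median(values)
    mad = median([abs(v - center) for v in values])
    return sorted(name for name, v in availability.items()
                  if abs(v - center) > k * mad)

def flag_below_threshold(availability, threshold=95.0):
    """Servers whose availability is below a fixed requirement."""
    return sorted(name for name, v in availability.items() if v < threshold)

# Hypothetical fleet clustered around 98%, with two unusual servers.
fleet = {
    "srv01": 98.0, "srv02": 98.1, "srv03": 97.9, "srv04": 98.0,
    "srv05": 98.1, "srv06": 97.9, "srv07": 98.0,
    "srv08": 100.0, "srv09": 97.6,
}

# Hypothetical fleet where every server is uniformly bad.
degraded = {"srv01": 94.0, "srv02": 94.1, "srv03": 93.9}

print(flag_outliers(fleet))            # unusual servers, none of them "bad"
print(flag_below_threshold(fleet))     # nothing violates the requirement
print(flag_outliers(degraded))         # nothing looks unusual
print(flag_below_threshold(degraded))  # yet every server violates the requirement
```

In the first fleet the outlier check flags two servers that are perfectly acceptable, while in the second it flags nothing even though every server is below 95%. Only the threshold check tracks "badness" in both cases.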
For this reason, a search for outliers may not provide the information that you are interested in. The thresholds could be determined statistically from historical data, e.g. by modeling error counts as Poisson variables or percent availability as beta-distributed variables. In an applied setting, these thresholds could probably be determined from performance requirements instead.
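If you do go the statistical route, one possible sketch for the Poisson case is below. It assumes you have historical error counts per fixed time window, estimates the Poisson rate as their mean, and takes an upper quantile as the alert threshold. The history values, the 0.99 quantile, and the function name are hypothetical; whether a Poisson model fits your data at all would need checking.

```python
import math
from statistics import mean

def poisson_upper_threshold(history, quantile=0.99):
    """Smallest count k such that P(X <= k) >= quantile for X ~ Poisson(lam),
    where lam is estimated as the mean of the historical counts."""
    if not 0.0 < quantile < 1.0:
        raise ValueError("quantile must be strictly between 0 and 1")
    lam = mean(history)
    k = 0
    pmf = math.exp(-lam)   # P(X = 0)
    cdf = pmf
    while cdf < quantile:
        k += 1
        pmf *= lam / k     # P(X = k) from P(X = k - 1)
        cdf += pmf
    return k

# Hypothetical error counts per hour from past monitoring data.
history = [2, 3, 1, 4, 2, 3, 2, 3]

threshold = poisson_upper_threshold(history, quantile=0.99)
print(threshold)  # alert when an hourly error count exceeds this value
```

The threshold is computed once from history and then held fixed, which matches the idea that the definition of "bad" should not drift with the current state of the fleet.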