How to detect outliers in an ArrayList

旧时难觅i 2021-01-14 02:54

I'm trying to think of some code that will allow me to search through my ArrayList and detect any values outside the common range of "good values."

Example: 100 1

8 answers
  • 2021-01-14 03:33

    As Joni already pointed out, you can eliminate outliers with the help of the standard deviation and the mean. Here is my code, which you can adapt for your purposes.

    import java.util.ArrayList;
    import java.util.List;

    // Wrapper class added so the snippet compiles; the name is arbitrary
    public class OutlierFilter {

        public static void main(String[] args) {
            List<Integer> values = new ArrayList<>();
            values.add(100);
            values.add(105);
            values.add(102);
            values.add(13);
            values.add(104);
            values.add(22);
            values.add(101);

            System.out.println("Before: " + values);
            System.out.println("After: " + eliminateOutliers(values, 1.5f));
        }

        protected static double getMean(List<Integer> values) {
            // Accumulate in a double so the division is not truncated to an int
            double sum = 0;
            for (int value : values) {
                sum += value;
            }

            return sum / values.size();
        }

        // Sample variance (divides by n - 1)
        public static double getVariance(List<Integer> values) {
            double mean = getMean(values);
            double temp = 0;

            for (int a : values) {
                temp += (a - mean) * (a - mean);
            }

            return temp / (values.size() - 1);
        }

        public static double getStdDev(List<Integer> values) {
            return Math.sqrt(getVariance(values));
        }

        public static List<Integer> eliminateOutliers(List<Integer> values, float scaleOfElimination) {
            double mean = getMean(values);
            double stdDev = getStdDev(values);

            final List<Integer> newList = new ArrayList<>();

            // Keep only the values within scaleOfElimination standard deviations of the mean
            for (int value : values) {
                boolean isLessThanLowerBound = value < mean - stdDev * scaleOfElimination;
                boolean isGreaterThanUpperBound = value > mean + stdDev * scaleOfElimination;
                boolean isOutOfBounds = isLessThanLowerBound || isGreaterThanUpperBound;

                if (!isOutOfBounds) {
                    newList.add(value);
                }
            }

            // Stop once a pass removes nothing
            int countOfOutliers = values.size() - newList.size();
            if (countOfOutliers == 0) {
                return values;
            }

            // Otherwise repeat with the filtered list and its new mean and standard deviation
            return eliminateOutliers(newList, scaleOfElimination);
        }
    }
    
    • The eliminateOutliers() method does all the work
    • It is recursive: each call filters the list again, using a freshly computed mean and standard deviation, until no more values are removed
    • The scaleOfElimination argument defines how aggressively outliers are removed. I normally use 1.5f to 2f; the larger the value, the fewer outliers are removed

    The output of the code:

    Before: [100, 105, 102, 13, 104, 22, 101]

    After: [100, 105, 102, 104, 101]

  • 2021-01-14 03:37
    package test;
    
    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;
    
    public class Main {
        public static void main(String[] args) {
            List<Double> data = new ArrayList<Double>();
            data.add((double) 20);
            data.add((double) 65);
            data.add((double) 72);
            data.add((double) 75);
            data.add((double) 77);
            data.add((double) 78);
            data.add((double) 80);
            data.add((double) 81);
            data.add((double) 82);
            data.add((double) 83);
            Collections.sort(data);
            System.out.println(getOutliers(data));
        }
    
        // Expects the input to be sorted in ascending order (main() sorts it before the call).
        public static List<Double> getOutliers(List<Double> input) {
            List<Double> output = new ArrayList<Double>();
            // Split into lower and upper halves; for an odd size the middle element is in neither half.
            int half = input.size() / 2;
            List<Double> data1 = input.subList(0, half);
            List<Double> data2 = input.subList(input.size() % 2 == 0 ? half : half + 1, input.size());
            double q1 = getMedian(data1);
            double q3 = getMedian(data2);
            double iqr = q3 - q1;
            double lowerFence = q1 - 1.5 * iqr;
            double upperFence = q3 + 1.5 * iqr;
            for (int i = 0; i < input.size(); i++) {
                if (input.get(i) < lowerFence || input.get(i) > upperFence)
                    output.add(input.get(i));
            }
            return output;
        }
    
        private static double getMedian(List<Double> data) {
            if (data.size() % 2 == 0)
                return (data.get(data.size() / 2) + data.get(data.size() / 2 - 1)) / 2;
            else
                return data.get(data.size() / 2);
        }
    }
    

    Output: [20.0]

    Explanation:

    • Sort the list of values from low to high
    • Split the sorted list at the middle into two halves ("data1" is the left half, "data2" the right half); if the size is odd, the middle element belongs to neither half
    • Find the middle number (median) of each half
    • Q1 is the median of the left half, and Q3 is the median of the right half
    • Apply the formulas:
    • IQR = Q3 - Q1
    • LowerFence = Q1 - 1.5*IQR
    • UpperFence = Q3 + 1.5*IQR
    • More info about these formulas: http://www.mathwords.com/o/outlier.htm
    • Loop through all of the original elements; any element below the lower fence or above the upper fence is added to the "output" ArrayList
    • The "output" ArrayList then contains the outliers
  • 2021-01-14 03:45
    • Find the mean value of your list
    • Create a Map that maps each number to its distance from the mean
    • Sort the values by their distance from the mean
    • Single out the last n numbers (those farthest from the mean), making sure that numbers at the same distance are treated the same way
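
    The steps above could be sketched in Java roughly as follows. The class name, the method name and the choice of n are illustrative assumptions, not part of the answer. Instead of a Map, which would collapse duplicate values into one key, this sketch sorts a copy of the list by distance from the mean.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class DistanceFromMean {

    // Hypothetical helper: returns the n values that lie farthest from the mean.
    public static List<Integer> farthestFromMean(List<Integer> values, int n) {
        double mean = values.stream().mapToInt(Integer::intValue).average().orElse(0.0);

        // Sort a copy so the caller's list is left untouched.
        List<Integer> byDistance = new ArrayList<>(values);
        byDistance.sort(Comparator.comparingDouble((Integer v) -> Math.abs(v - mean)));

        // The last n entries are the candidates farthest from the mean.
        int from = Math.max(0, byDistance.size() - n);
        return new ArrayList<>(byDistance.subList(from, byDistance.size()));
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(100, 105, 102, 13, 104, 22, 101);
        System.out.println(farthestFromMean(values, 2)); // prints [22, 13]
    }
}
```

    Note that this only ranks the values; deciding how many of them really are outliers (the value of n) still needs one of the statistical criteria discussed in the other answers.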
  • 2021-01-14 03:46

    There are several criteria for detecting outliers. The simplest ones, like Chauvenet's criterion, use the mean and standard deviation calculated from the sample to determine a "normal" range for values. Any value outside of this range is deemed an outlier.

    Other criteria include Grubbs' test and Dixon's Q test, which may give better results than Chauvenet's, for example if the sample comes from a skewed distribution.
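
    As an illustration of one of these criteria, Dixon's Q test might be sketched in Java as follows. It looks only at the most extreme value: Q is the gap between that value and its nearest neighbour divided by the full range, and the value is rejected when Q exceeds a tabulated critical value. The table below holds the commonly published 90% confidence values for sample sizes 3 to 9; the class and method names are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class DixonQSketch {

    // Critical values for sample sizes 3..9 at 90% confidence (assumed table).
    static final double[] Q90 = {0.941, 0.765, 0.642, 0.560, 0.507, 0.468, 0.437};

    // Returns the most extreme value if the Q test rejects it, otherwise null.
    public static Double findOutlier(List<Double> input) {
        int n = input.size();
        if (n < 3 || n > 9) {
            throw new IllegalArgumentException("this sketch only covers 3 to 9 values");
        }
        List<Double> sorted = new ArrayList<>(input);
        Collections.sort(sorted);

        double range = sorted.get(n - 1) - sorted.get(0);
        if (range == 0) {
            return null; // all values are identical
        }
        // Gap to the nearest neighbour, relative to the range, at each end.
        double qLow = (sorted.get(1) - sorted.get(0)) / range;
        double qHigh = (sorted.get(n - 1) - sorted.get(n - 2)) / range;

        double critical = Q90[n - 3];
        if (qLow >= qHigh) {
            return qLow > critical ? sorted.get(0) : null;
        }
        return qHigh > critical ? sorted.get(n - 1) : null;
    }

    public static void main(String[] args) {
        List<Double> data = Arrays.asList(100.0, 105.0, 102.0, 13.0, 104.0, 101.0);
        System.out.println(findOutlier(data)); // prints 13.0
    }
}
```

    The test checks a single suspect value at a time; two extreme values close to each other (such as 13 and 22 in one sample) shrink the gap and can hide one another.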

  • 2021-01-14 03:48

    Many thanks to Valiyev; his solution helped me a lot. I want to share my small SRP-style refactoring of his work.

    Please note that I use List.of() to store Dixon's critical values, so Java 9 or later is required.

    import java.util.List;
    import java.util.stream.Collectors;

    public class DixonTest {
        // Dixon's critical values; index 0 corresponds to a sample size of 3
        protected List<Double> criticalValues =
            List.of(0.941, 0.765, 0.642, 0.56, 0.507, 0.468, 0.437);
        private double scaleOfElimination;
        private double mean;
        private double stdDev;

        private double getMean(final List<Double> input) {
            double sum = input.stream()
                    .mapToDouble(value -> value)
                    .sum();
            return (sum / input.size());
        }

        // Sample variance (divides by n - 1)
        private double getVariance(List<Double> input) {
            double mean = getMean(input);
            double temp = input.stream()
                    .mapToDouble(a -> a)
                    .map(a -> (a - mean) * (a - mean))
                    .sum();
            return temp / (input.size() - 1);
        }

        private double getStdDev(List<Double> input) {
            return Math.sqrt(getVariance(input));
        }

        protected List<Double> eliminateOutliers(List<Double> input) {
            int n = input.size() - 3;
            if (n < 0 || n >= criticalValues.size()) {
                throw new IllegalArgumentException("input must contain 3 to 9 values");
            }
            // The critical value is used here as the standard deviation multiplier
            scaleOfElimination = criticalValues.get(n);
            mean = getMean(input);
            stdDev = getStdDev(input);

            return input.stream()
                    .filter(this::isWithinBounds)
                    .collect(Collectors.toList());
        }

        // True for values inside [mean - k * stdDev, mean + k * stdDev], i.e. the values to keep
        private boolean isWithinBounds(Double value) {
            return !(isLessThanLowerBound(value)
                    || isGreaterThanUpperBound(value));
        }

        private boolean isGreaterThanUpperBound(Double value) {
            return value > mean + stdDev * scaleOfElimination;
        }

        private boolean isLessThanLowerBound(Double value) {
            return value < mean - stdDev * scaleOfElimination;
        }
    }
    

    I hope it will help someone else.

    Best regards

  • 2021-01-14 03:50

    Use this algorithm (the code is C#, not Java). It uses the average and the standard deviation; the multiplier 2 in 2 * standardDeviation is a threshold you can adjust.

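    // C#; requires: using System; using System.Collections.Generic; using System.Linq;
    public static List<int> StatisticalOutLierAnalysis(List<int> allNumbers)
    {
        if (allNumbers.Count == 0)
            return null;

        List<int> normalNumbers = new List<int>();
        List<int> outLierNumbers = new List<int>();
        double avg = allNumbers.Average();
        // Population standard deviation (divides by N, not N - 1)
        double standardDeviation = Math.Sqrt(allNumbers.Average(v => Math.Pow(v - avg, 2)));
        foreach (int number in allNumbers)
        {
            // Values more than 2 standard deviations from the average count as outliers
            if ((Math.Abs(number - avg)) > (2 * standardDeviation))
                outLierNumbers.Add(number);
            else
                normalNumbers.Add(number);
        }

        // Only the non-outliers are returned; outLierNumbers is collected but not used
        return normalNumbers;
    }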
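
    Since the question is about a Java ArrayList, the same idea might look like this in Java. This is a sketch rather than a line-for-line port: the class and method names are illustrative assumptions, the threshold is passed in as a parameter, and an empty list is returned instead of null for empty input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TwoSigmaFilter {

    // Keeps the values within `threshold` population standard deviations of the mean.
    public static List<Integer> withoutOutliers(List<Integer> all, double threshold) {
        List<Integer> normal = new ArrayList<>();
        if (all.isEmpty()) {
            return normal;
        }
        double avg = all.stream().mapToInt(Integer::intValue).average().orElse(0.0);
        // Population variance: the mean of the squared deviations (divides by N).
        double variance = all.stream()
                .mapToDouble(v -> (v - avg) * (v - avg))
                .average()
                .orElse(0.0);
        double stdDev = Math.sqrt(variance);

        for (int number : all) {
            if (Math.abs(number - avg) <= threshold * stdDev) {
                normal.add(number);
            }
        }
        return normal;
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(10, 12, 11, 13, 12, 11, 10, 12, 11, 100);
        System.out.println(withoutOutliers(values, 2.0)); // prints [10, 12, 11, 13, 12, 11, 10, 12, 11]
    }
}
```

    A single pass with a threshold of 2 can miss outliers when several of them inflate the standard deviation. For the sample [100, 105, 102, 13, 104, 22, 101] used in the first answer, this sketch removes nothing, which is why that answer uses a smaller multiplier and repeats the filtering.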
    