Remove outliers from an array?

-上瘾入骨i 2021-01-02 03:10
values = [8160,8160,6160,22684,0,0,60720,1380,1380,57128]

How can I remove outliers like 0, 57128, 60720 and 22684?

Is there a library whic

5 answers
  • 2021-01-02 03:40

    This method actually fails if your data set contains duplicated values, e.g. 1, 2, 2, 2, 2, 2, 3, 10.

    I struggled with it for a while, but then I discovered something called Grubbs' test. So far it seems reliable, at least in my case.

    Here's a link to demo (and source): http://xcatliu.com/grubbs/
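
    For readers who want to see the idea without opening the demo, here is a minimal sketch of the statistic behind Grubbs' test. It is not the code from the linked page. The names `grubbsStatistic` and `removeByGrubbs` are made up for illustration, and the critical value is left to the caller (it normally comes from a table or a t-distribution routine), so the constant in the usage line is only a placeholder. The test is formally meant for a single outlier; repeating it, as the loop does, is an informal extension.

```javascript
// Grubbs' statistic: the largest absolute deviation from the mean,
// measured in sample standard deviations.
function grubbsStatistic(values) {
  const n = values.length;
  const mean = values.reduce((sum, x) => sum + x, 0) / n;
  const variance = values.reduce((sum, x) => sum + (x - mean) ** 2, 0) / (n - 1);
  const sd = Math.sqrt(variance);

  let index = 0;
  let g = 0;
  values.forEach((x, i) => {
    const deviation = Math.abs(x - mean) / sd;
    if (deviation > g) {
      g = deviation;
      index = i;
    }
  });
  return { g, index };
}

// Drops the most extreme value for as long as its statistic exceeds the
// critical value. criticalValue(n) is a hypothetical callback supplied by
// the caller; this sketch does not compute it.
function removeByGrubbs(values, criticalValue) {
  const remaining = values.slice();
  while (remaining.length > 2) {
    const { g, index } = grubbsStatistic(remaining);
    if (!(g > criticalValue(remaining.length))) break;
    remaining.splice(index, 1);
  }
  return remaining;
}

// Placeholder threshold, for illustration only.
console.log(removeByGrubbs([1, 2, 2, 2, 2, 2, 3, 10], () => 2.3));
```

    With the duplicated-values example above, only 10 stands out: its statistic is about 2.43, while after removing it the largest remaining statistic is about 1.73.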

  • 2021-01-02 03:45

    This is an improved version of @james-peterson's solution that updates the syntax to the current JavaScript standard and adds a more robust way of finding the two quartiles (implemented according to the formulas at https://de.wikipedia.org/wiki/Interquartilsabstand_(Deskriptive_Statistik) ). It uses a faster way of copying the array (see http://jsben.ch/wQ9RU for a performance comparison) and still works when q1 = q3.

    function filterOutliers(someArray) {
    
      if(someArray.length < 4)
        return someArray;
    
      let values, q1, q3, iqr, maxValue, minValue;
    
      values = someArray.slice().sort( (a, b) => a - b);//copy array fast and sort
    
      // find quartiles; the formulas count positions from 1, so every index
      // is shifted down by one for the 0-based array
      if ((values.length / 4) % 1 === 0) {
        q1 = 1/2 * (values[(values.length / 4) - 1] + values[(values.length / 4)]);
        q3 = 1/2 * (values[(values.length * (3 / 4)) - 1] + values[(values.length * (3 / 4))]);
      } else {
        q1 = values[Math.floor(values.length / 4)];
        q3 = values[Math.floor(values.length * (3 / 4))];
      }
    
      iqr = q3 - q1;
      maxValue = q3 + iqr * 1.5;
      minValue = q1 - iqr * 1.5;
    
      return values.filter((x) => (x >= minValue) && (x <= maxValue));
    }
    

    See this gist: https://gist.github.com/rmeissn/f5b42fb3e1386a46f60304a57b6d215a
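
    For reference, this is the quantile definition that, as far as I understand it, the linked article uses. It assumes the data are sorted as x_1 ≤ … ≤ x_n and 0 < p < 1; the notation is mine.

```latex
x_p =
\begin{cases}
\tfrac{1}{2}\left(x_{np} + x_{np+1}\right) & \text{if } np \text{ is an integer},\\[4pt]
x_{\lfloor np + 1 \rfloor} & \text{otherwise},
\end{cases}
\qquad Q_1 = x_{0.25}, \quad Q_3 = x_{0.75}, \quad \mathrm{IQR} = Q_3 - Q_1 .
```

    The positions here count from 1, so in a 0-based JavaScript array x_k corresponds to values[k - 1].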

  • 2021-01-02 03:53

    I had some problems with the other two solutions, such as NaN values for q1 and q3 caused by wrong indexes. Because arrays are 0-indexed, the quantile position must be computed from the array length minus 1. The code then checks whether that position is a whole number; if it is not, the value is interpolated between the two neighbouring elements.

    function filterOutliers (someArray) {
        if (someArray.length < 4) {
            return someArray;
        }
    
        let values = someArray.slice().sort((a, b) => a - b); // copy array fast and sort
    
        let q1 = getQuantile(values, 25);
        let q3 = getQuantile(values, 75);
    
        let iqr, maxValue, minValue;
        iqr = q3 - q1;
        maxValue = q3 + iqr * 1.5;
        minValue = q1 - iqr * 1.5;
    
        return values.filter((x) => (x >= minValue) && (x <= maxValue));
    }
    
    function getQuantile (array, quantile) {
        // Get the index the quantile is at.
        let index = quantile / 100.0 * (array.length - 1);
    
        // Check if it has decimal places.
        if (index % 1 === 0) {
            return array[index];
        } else {
            // Get the lower index.
            let lowerIndex = Math.floor(index);
            // Get the remaining.
            let remainder = index - lowerIndex;
            // Add the remaining to the lowerindex value.
            return array[lowerIndex] + remainder * (array[lowerIndex + 1] - array[lowerIndex]);
        }
    }
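
    As a quick check, here is a condensed sketch of the same interpolation applied to the array from the question. The helper name `quantile` is only illustrative; it takes a fraction between 0 and 1 instead of a percentage.

```javascript
// Linear interpolation between the two neighbouring sorted elements.
function quantile(sorted, fraction) {
  const position = fraction * (sorted.length - 1);
  const lower = Math.floor(position);
  const upper = Math.ceil(position);
  return sorted[lower] + (position - lower) * (sorted[upper] - sorted[lower]);
}

// Data taken from the question.
const values = [8160, 8160, 6160, 22684, 0, 0, 60720, 1380, 1380, 57128];
const sorted = values.slice().sort((a, b) => a - b);

const q1 = quantile(sorted, 0.25); // 1380
const q3 = quantile(sorted, 0.75); // 19053
const iqr = q3 - q1;               // 17673

const kept = sorted.filter((x) => x >= q1 - 1.5 * iqr && x <= q3 + 1.5 * iqr);
console.log(kept); // [0, 0, 1380, 1380, 6160, 8160, 8160, 22684]
```

    So with this quartile definition the rule drops 57128 and 60720, but keeps 0 and 22684, which the question also wanted removed.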
    
  • 2021-01-02 03:58

    This all depends on your interpretation of what an "outlier" is. A common approach:

    • High outliers are anything beyond the 3rd quartile + 1.5 * the inter-quartile range (IQR)
    • Low outliers are anything beneath the 1st quartile - 1.5 * IQR

    This is also the approach described by Wolfram MathWorld.

    This is easily wrapped up in a function :) I've tried to write the code below clearly; obvious refactoring opportunities exist. Note that under this common approach your sample contains no outliers.

    function filterOutliers(someArray) {  
    
        // Copy the values, rather than operating on references to existing values
        var values = someArray.concat();
    
        // Then sort
        values.sort(function(a, b) {
            return a - b;
        });

        /* Then find a generous IQR. This is generous because if (values.length / 4)
         * is not an int, then really you should average the two elements on either
         * side to find q1.
         */
        var q1 = values[Math.floor(values.length / 4)];
        // Likewise for q3. The index is clamped so that arrays with fewer than
        // four elements do not read past the end.
        var q3 = values[Math.min(values.length - 1, Math.ceil(values.length * (3 / 4)))];
        var iqr = q3 - q1;
    
        // Then find min and max values
        var maxValue = q3 + iqr*1.5;
        var minValue = q1 - iqr*1.5;
    
        // Then filter anything beyond or beneath these values.
        var filteredValues = values.filter(function(x) {
            return (x <= maxValue) && (x >= minValue);
        });
    
        // Then return
        return filteredValues;
    }
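
    The remark that the sample contains no outlying values can be checked by printing the quartiles and fences. This is a small sketch using the same floor/ceil index choice; `tukeyFences` is a name I made up for it.

```javascript
// Returns the quartiles and the 1.5 * IQR fences for an array of numbers.
function tukeyFences(values) {
  const sorted = values.slice().sort((a, b) => a - b);
  const q1 = sorted[Math.floor(sorted.length / 4)];
  // Clamp the index so short arrays do not read past the last element.
  const q3 = sorted[Math.min(sorted.length - 1, Math.ceil(sorted.length * 3 / 4))];
  const iqr = q3 - q1;
  return { q1, q3, lower: q1 - 1.5 * iqr, upper: q3 + 1.5 * iqr };
}

console.log(tukeyFences([8160, 8160, 6160, 22684, 0, 0, 60720, 1380, 1380, 57128]));
// { q1: 1380, q3: 57128, lower: -82242, upper: 140750 }
```

    Every value in the sample lies between -82242 and 140750, so nothing is filtered. The "generous" q3 lands on 57128, one of the values the question wanted removed, which is why the choice of quartile definition matters so much for small arrays.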
    
  • 2021-01-02 04:04

    Here is an implementation that picks out the upper outliers of a given collection: it returns the values at or above the upper fence rather than the cleaned data. The approach is similar to the answers above.

    The if branch checks whether the length of the collection has the form 4n or 4n + 1. In that case, each quartile is the average of two adjacent elements.

    Otherwise, for lengths of the form 4n + 2 and 4n + 3, the upper and lower quartiles can be read directly from a single element.

    
    const outlierDetector = collection => {
        const size = collection.length;
    
        let q1, q3;
    
        if (size < 2) {
            return collection;
        }
    
        const sortedCollection = collection.slice().sort((a, b) => a - b);
    
        if ((size - 1) / 4 % 1 === 0 || size / 4 % 1 === 0) {
            q1 = 1 / 2 * (sortedCollection[Math.floor(size / 4) - 1] + sortedCollection[Math.floor(size / 4)]);
            q3 = 1 / 2 * (sortedCollection[Math.ceil(size * 3 / 4) - 1] + sortedCollection[Math.ceil(size * 3 / 4)]);
        } else {
            q1 = sortedCollection[Math.floor(size / 4)];
            q3 = sortedCollection[Math.floor(size * 3 / 4)];
        }
    
        const iqr = q3 - q1;
        const maxValue = q3 + iqr * 1.5;
    
        return sortedCollection.filter(value => value >= maxValue);
    };
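
    To make the branching easier to follow, this short sketch lists which rule applies for a few collection sizes. The helper name `averagesTwoElements` is only for illustration; the condition is the one used in the function.

```javascript
// True when the length has the form 4n or 4n + 1.
const averagesTwoElements = (size) =>
  (size - 1) / 4 % 1 === 0 || size / 4 % 1 === 0;

for (let size = 4; size <= 11; size++) {
  const rule = averagesTwoElements(size) ? "average of two elements" : "single element";
  console.log(size, rule);
}
```

    Sizes 4, 5, 8 and 9 take the averaging branch; sizes 6, 7, 10 and 11 read a single element.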
    
    