How to get median and quartiles/percentiles of an array in JavaScript (or PHP)?

我寻月下人不归 2021-02-07 06:24

This question has been turned into a Q&A because I struggled to find the answer and think it can be useful for others.

I have a JavaScript

3 answers
  •  一整个雨季
    2021-02-07 06:46

    TL;DR

    The other answers appear to have solid implementations of the "R-7" version of computing quantiles. Below is some context and another JavaScript implementation borrowed from D3 using the same R-7 method, with the bonus that this one probably covers a few more edge cases.


    Background

    After a little sleuthing on some math and stats StackExchange sites (1, 2), I found that there are "common-sense" ways of calculating each quantile, but that those don't typically agree with the results of the nine generally recognized ways to calculate them.

    The answer at that second link from stats.stackexchange says of the common-sense method that...

    Your textbook is confused. Very few people or software define quartiles this way. (It tends to make the first quartile too small and the third quartile too large.)

    The quantile function in R implements nine different ways to compute quantiles!

    I thought that last bit was interesting, and here's what I dug up on those nine methods...

    • Wikipedia's description of those nine methods here, nicely grouped in a table
    • An article from the Journal of Statistics Education titled "Quartiles in Elementary Statistics"
    • A blog post at SAS.com called "Sample quantiles: A comparison of 9 definitions"

    That tells me I probably shouldn't try to code something based on my understanding of what quartiles represent and should borrow someone else's solution.


    Existing solution from D3

    One example is from D3. Its d3-array package has a quantile function that's essentially BSD licensed:

    https://github.com/d3/d3-array/blob/master/src/quantile.js

    I've quickly created a fairly direct port of d3's version to vanilla JavaScript. It requires the array of elements to have already been sorted. Here it is. I've tested it against d3's own results enough to feel it's a valid port, but your experience might differ (let me know in the comments if it does, though!):

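      // Credit D3: https://github.com/d3/d3-array/blob/master/LICENSE
      // Returns the p-quantile (0 <= p <= 1) of an already-sorted array using
      // the R-7 method, or undefined if the array is empty or p is not a number.
      function quantileSorted(values, p, fnValueFrom) {
        var n = values.length;
        if (!n) {
          return;
        }

        // Fall back to the identity function when no accessor is supplied.
        fnValueFrom =
          Object.prototype.toString.call(fnValueFrom) === "[object Function]"
            ? fnValueFrom
            : function (x) {
                return x;
              };

        p = +p;
        // Without this guard a NaN p would index values[NaN] below.
        if (isNaN(p)) {
          return;
        }

        // At or below 0, or with a single element, the answer is the first value.
        if (p <= 0 || n < 2) {
          return +fnValueFrom(values[0], 0, values);
        }

        // At or above 1, the answer is the last value.
        if (p >= 1) {
          return +fnValueFrom(values[n - 1], n - 1, values);
        }

        // i is the fractional 0-based position; interpolate between its neighbours.
        var i = (n - 1) * p,
          i0 = Math.floor(i),
          value0 = +fnValueFrom(values[i0], i0, values),
          value1 = +fnValueFrom(values[i0 + 1], i0 + 1, values);

        return value0 + (value1 - value0) * (i - i0);
      }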
    

    Note that fnValueFrom is a way to process a complex object into a value. You can see how that might work in a list of d3 usage examples here -- search down where .quantile is used.

    The quick version is that if the values are tortoises and you're sorting by tortoise.age in every case, your fnValueFrom might be x => x.age. More complicated versions, including ones that might require accessing the index (parameter 2) and the entire collection (parameter 3) during the value calculation, are left up to the reader.

    I've added a quick check here so that if nothing is given for fnValueFrom, or if what's given isn't a function, the logic assumes the elements in values are the actual sorted values themselves.
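
    As a rough usage sketch, here is how the port might be called for the median and quartiles, with and without an accessor. The function is repeated in condensed form only so the snippet runs on its own; the ages array and the tortoise objects are made-up illustration data, not something from d3 or the question.

```javascript
// Condensed copy of the quantileSorted port above, so this snippet is self-contained.
function quantileSorted(values, p, fnValueFrom) {
  var n = values.length;
  if (!n) return;
  if (typeof fnValueFrom !== "function") {
    fnValueFrom = function (x) { return x; };
  }
  p = +p;
  if (p <= 0 || n < 2) return +fnValueFrom(values[0], 0, values);
  if (p >= 1) return +fnValueFrom(values[n - 1], n - 1, values);
  var i = (n - 1) * p,
    i0 = Math.floor(i),
    value0 = +fnValueFrom(values[i0], i0, values),
    value1 = +fnValueFrom(values[i0 + 1], i0 + 1, values);
  return value0 + (value1 - value0) * (i - i0);
}

// Hypothetical data. Array.prototype.sort compares as strings by default,
// so pass a numeric comparator; slice() keeps the input array untouched.
var ages = [40, 3, 15, 7, 36, 39, 41, 6];
var sortedAges = ages.slice().sort(function (a, b) { return a - b; });
// sortedAges is [3, 6, 7, 15, 36, 39, 40, 41]

var q1 = quantileSorted(sortedAges, 0.25);    // 6.75
var median = quantileSorted(sortedAges, 0.5); // 25.5
var q3 = quantileSorted(sortedAges, 0.75);    // 39.25

// Same numbers wrapped in objects, read back through an accessor.
var tortoises = sortedAges.map(function (age, k) {
  return { name: "tortoise-" + k, age: age };
});
var medianAge = quantileSorted(tortoises, 0.5, function (t) { return t.age; }); // 25.5

console.log(q1, median, q3, medianAge);
```

    The objects have to be sorted by the same field the accessor reads, otherwise the result is meaningless.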


    Logical comparison to existing answers

    I'm reasonably sure this reduces to the same version in the other two answers (see below), but if you needed to justify why you're using this to a product manager or whatever maybe the above will help.

    Quick comparison:

    function Quartile(data, q) {
      data=Array_Sort_Numbers(data);        // we're assuming it's already sorted, above, vs. the function use here. same difference.
      var pos = ((data.length) - 1) * q;    // i = (n - 1) * p
      var base = Math.floor(pos);           // i0 = Math.floor(i)
      var rest = pos - base;                // (i - i0);
      if( (data[base+1]!==undefined) ) {
        //      value0    + (i - i0)   * (value1 which is values[i0+1] - value0 which is values[i0])
        return data[base] + rest       * (data[base+1]                 - data[base]);
      } else {
        // I think this is covered by if (p <= 0 || n < 2)
        return data[base];
      }
    }
    

    So that's logically close/appears to be exactly the same. I think d3's version that I ported covers a few more edge/invalid conditions, which could be useful.
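
    The Quartile function above calls an Array_Sort_Numbers helper that isn't shown here. A minimal sketch of what it could look like, assuming it only needs to return a numerically sorted copy (the body is my guess, not the other answer's code):

```javascript
// Hypothetical stand-in for the Array_Sort_Numbers helper that Quartile calls.
function Array_Sort_Numbers(inputArray) {
  // slice() copies first so the caller's array keeps its original order.
  // The comparator matters: without it, sort() compares elements as strings.
  return inputArray.slice().sort(function (a, b) {
    return a - b;
  });
}

var raw = [10, 9, 1, 100, 25];
var defaultSorted = raw.slice().sort();      // [1, 10, 100, 25, 9] (string order)
var numericSorted = Array_Sort_Numbers(raw); // [1, 9, 10, 25, 100]

console.log(defaultSorted, numericSorted);
```

    Feeding a string-ordered array into either quantile function would silently give wrong results, so this step is worth getting right.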


    By the way, the answers here, according to d3-array's readme, all use the "R-7 method":

    This particular implementation uses the R-7 method, which is the default for the R programming language and Excel.

    For a little further reading, the differences between d3's use of R-7 to determine quantiles and the common-sense approach are demonstrated nicely in this question and described a bit in a post that's linked to from philippe's original source for the PHP version over here (in German). Here's a bit from Google Translate:

    In our example, this value is at position (n + 1) / 4 = 5.25, i.e. between the 5th value (= 5) and the 6th value (= 7). The fraction (0.25) indicates that, in addition to the value of 5, ¼ of the distance between the 5th and 6th values is added. Q1 is therefore 5 + 0.25 * 2 = 5.5.
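
    To make the difference concrete, here is a small sketch comparing the (n + 1) * p rule from the quoted passage with R-7. Both function names are mine, both assume a non-empty sorted array and 0 <= p <= 1, and the 20 values are invented so that the 5th value is 5 and the 6th is 7 as in the quote; they are not the dataset from the German post.

```javascript
// R-7: fractional 0-based position (n - 1) * p, then linear interpolation.
function quantileR7(sorted, p) {
  var n = sorted.length;
  var pos = (n - 1) * p;
  var base = Math.floor(pos);
  var next = Math.min(base + 1, n - 1); // stay inside the array when p is 1
  return sorted[base] + (pos - base) * (sorted[next] - sorted[base]);
}

// (n + 1) * p rule: fractional 1-based position, clamped to the range [1, n].
function quantileNPlus1(sorted, p) {
  var n = sorted.length;
  var pos = Math.min(Math.max((n + 1) * p, 1), n);
  var base = Math.floor(pos);
  var next = Math.min(base + 1, n);
  // Subtract 1 to convert the 1-based positions to array indexes.
  return sorted[base - 1] + (pos - base) * (sorted[next - 1] - sorted[base - 1]);
}

// Invented data: 20 sorted values whose 5th value is 5 and 6th value is 7.
var data = [1, 2, 3, 4, 5, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21];

console.log(quantileNPlus1(data, 0.25)); // 5.5, position 5.25 as in the quote
console.log(quantileR7(data, 0.25));     // 6.5, 0-based position 4.75
console.log(quantileNPlus1(data, 0.75)); // 16.75
console.log(quantileR7(data, 0.75));     // 16.25
```

    On this data the (n + 1) * p rule puts Q1 lower and Q3 higher than R-7 does, so the two approaches disagree even on a tidy 20-element input.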
