Is there a better way to calculate the median (not average)

我在风中等你 2021-02-02 15:38

Suppose I have the following table definition:

CREATE TABLE x (i serial primary key, value integer not null);

I want to calculate the MEDIAN o

7 answers
  • 2021-02-02 15:58

    Simple SQL using only native PostgreSQL functions:

    select 
        case count(*)%2
            when 1 then (array_agg(num order by num))[count(*)/2+1]
            else ((array_agg(num order by num))[count(*)/2]::double precision + (array_agg(num order by num))[count(*)/2+1])/2
        end as median
    from unnest(array[5,17,83,27,28]) num;
    

    You can add coalesce() or a NULL filter if you need to handle NULLs.
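
    One way to handle NULLs, as a minimal sketch, is to filter them out before aggregating so that both count(*) and array_agg() see only real values. The sample array with NULLs is made up for illustration:

    ```sql
    -- Same query as above, with NULLs removed by a WHERE clause first.
    select
        case count(*)%2
            when 1 then (array_agg(num order by num))[count(*)/2+1]
            else ((array_agg(num order by num))[count(*)/2]::double precision
                  + (array_agg(num order by num))[count(*)/2+1])/2
        end as median
    from unnest(array[5,null,17,83,null,27,28]) num
    where num is not null;
    -- The non-NULL values are 5, 17, 27, 28, 83, so the median is 27.
    ```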

  • 2021-02-02 15:59
    -- "values" is a reserved word in PostgreSQL, so the column name must be double-quoted.
    CREATE TABLE array_table (id integer, "values" integer[]);
    
    INSERT INTO array_table VALUES (1, '{1,2,3}');
    INSERT INTO array_table VALUES (2, '{4,5,6,7}');
    
    -- Assumes each array is already sorted in ascending order.
    select id, "values", cardinality("values") as array_length,
    (case when cardinality("values")%2=0 and cardinality("values")>1 then ("values"[(cardinality("values")/2)] + "values"[((cardinality("values")/2)+1)])/2::float 
     else "values"[(cardinality("values")+1)/2]::float end) as median  
     from array_table;
    

    Or you can create a function and use it anywhere in your later queries.

    CREATE OR REPLACE FUNCTION median (a integer[]) 
    RETURNS float AS    $median$ 
    Declare     
        abc float; 
    BEGIN    
        SELECT (case when cardinality(a)%2=0 and cardinality(a)>1 then 
               (a[(cardinality(a)/2)] + a[((cardinality(a)/2)+1)])/2::float   
               else a[(cardinality(a)+1)/2]::float end) into abc;    
        RETURN abc; 
    END;    
    $median$ 
    LANGUAGE plpgsql;
    
    select id, "values", median("values") from array_table;
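
    The expressions above index into the array by position, so they only give the median when each array is already sorted. If that cannot be guaranteed, a sketch like the following sorts each array first. The CTE names t and sorted, and the columns vals and s, are made up for illustration:

    ```sql
    -- Sort each array, then take the middle element(s).
    with t(id, vals) as (
        values (1, array[3,1,2]), (2, array[7,4,6,5])
    ),
    sorted as (
        select id,
               array(select v from unnest(vals) as v order by v) as s
        from t
    )
    select id, s,
           case when cardinality(s) % 2 = 0
                then (s[cardinality(s)/2] + s[cardinality(s)/2 + 1]) / 2::float
                else s[(cardinality(s) + 1)/2]::float
           end as median
    from sorted
    order by id;
    -- id 1: {1,2,3}   -> median 2
    -- id 2: {4,5,6,7} -> median 5.5
    ```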
    
  • 2021-02-02 16:07

    Use the function below to find the nth percentile:

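    CREATE OR REPLACE FUNCTION nth_percentil(anyarray, int)
        RETURNS 
            anyelement AS 
        $$
            -- Nearest-rank method: take the element at position ceil(p/100 * n), but at least 1.
            -- (A bare "p/100.0 * n + 1" subscript rounds to the wrong element, e.g. the 4th of 5 for p = 50.)
            SELECT $1[greatest(1, ceil($2/100.0 * array_upper($1,1)))::int];
        $$ 
    LANGUAGE SQL IMMUTABLE STRICT;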
    

    In your case, that is the 50th percentile.

    Use the query below to get the median:

    SELECT nth_percentil(ARRAY (SELECT Field_name FROM table_name ORDER BY 1),50)
    

    This returns the 50th percentile, which is essentially the median (for an even number of rows it returns one of the two middle values rather than their average).

    Hope this is helpful.

  • 2021-02-02 16:12

    Indeed there IS an easier way. In Postgres you can define your own aggregate functions. I posted functions to do median as well as mode and range to the PostgreSQL snippets library a while back.

    http://wiki.postgresql.org/wiki/Aggregate_Median
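
    As a rough sketch of what such a user-defined aggregate can look like (this is not necessarily the code on the wiki page, and the names my_median and my_median_final are hypothetical): the state function collects all input values into an array, and the final function picks the middle value or averages the two middle values.

    ```sql
    -- Final function: median of the collected values, ignoring NULLs.
    CREATE OR REPLACE FUNCTION my_median_final(vals numeric[])
    RETURNS numeric AS $$
        WITH s AS (
            SELECT v,
                   row_number() OVER (ORDER BY v) AS rn,
                   count(*)     OVER ()           AS ct
            FROM   unnest(vals) AS v
            WHERE  v IS NOT NULL
        )
        -- For odd ct both positions coincide; for even ct they are the two middle rows.
        SELECT avg(v) FROM s WHERE rn IN ((ct + 1) / 2, (ct + 2) / 2);
    $$ LANGUAGE sql IMMUTABLE;

    -- Aggregate: append every input value to an array, then call the final function.
    CREATE AGGREGATE my_median(numeric) (
        SFUNC     = array_append,
        STYPE     = numeric[],
        FINALFUNC = my_median_final,
        INITCOND  = '{}'
    );

    -- Usage: the median of 1, 2, 100 is 2.
    SELECT my_median(value) FROM (VALUES (1), (2), (100)) AS t(value);
    ```

    Collecting every value into an array keeps the definition short, but it holds the whole group in memory, so it is best suited to moderately sized groups.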

  • 2021-02-02 16:20

    A simpler query for that:

    WITH y AS (
       SELECT value, row_number() OVER (ORDER BY value) AS rn
       FROM   x
       WHERE  value IS NOT NULL
       )
    , c AS (SELECT count(*) AS ct FROM y) 
    SELECT CASE WHEN c.ct%2 = 0 THEN
              round((SELECT avg(value) FROM y WHERE y.rn IN (c.ct/2, c.ct/2+1)), 3)
           ELSE
                    (SELECT     value  FROM y WHERE y.rn = (c.ct+1)/2)
           END AS median
    FROM   c;
    

    Major points

    • Ignores NULL values.
    • The core feature is the row_number() window function, which has been available since PostgreSQL 8.4.
    • The final SELECT returns a single row's value for an odd number of rows and the avg() of the two middle rows for an even number. The result is numeric; the average is rounded to 3 decimal places.

    A test shows that the new version is 4x faster than the query in the question and, unlike that query, yields correct results:

    CREATE TEMP TABLE x (value int);
    INSERT INTO x SELECT generate_series(1,10000);
    INSERT INTO x VALUES (NULL),(NULL),(NULL),(3);
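
    For a self-contained check that does not need the temp table, the same test data can be built inline in a CTE and fed to the query from this answer. This is only a sketch for verifying the result, not a timing test:

    ```sql
    -- Test data inline: 1..10000, plus three NULLs and a duplicate 3.
    WITH x(value) AS (
        SELECT generate_series(1, 10000)
        UNION ALL
        SELECT v FROM (VALUES (NULL), (NULL), (NULL), (3)) AS t(v)
    )
    , y AS (
       SELECT value, row_number() OVER (ORDER BY value) AS rn
       FROM   x
       WHERE  value IS NOT NULL
       )
    , c AS (SELECT count(*) AS ct FROM y)
    SELECT CASE WHEN c.ct%2 = 0 THEN
              round((SELECT avg(value) FROM y WHERE y.rn IN (c.ct/2, c.ct/2+1)), 3)
           ELSE
              (SELECT value FROM y WHERE y.rn = (c.ct+1)/2)
           END AS median
    FROM   c;
    -- 10001 non-NULL rows; the 5001st value in sorted order is 5000,
    -- because the duplicate 3 shifts every later value by one position.
    ```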
    
  • 2021-02-02 16:23

    Yes, with PostgreSQL 9.4, you can use the newly introduced inverse distribution function PERCENTILE_CONT(), an ordered-set aggregate function that is specified in the SQL standard as well.

    WITH t(value) AS (
      SELECT 1   UNION ALL
      SELECT 2   UNION ALL
      SELECT 100 
    )
    SELECT
      percentile_cont(0.5) WITHIN GROUP (ORDER BY value)
    FROM
      t;
    

    This emulation of MEDIAN() via PERCENTILE_CONT() is also documented here.
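
    To illustrate how this behaves per group and with an even number of rows, here is a small sketch that also shows percentile_disc(), the discrete counterpart that always returns an actual input value. The grp column and the sample rows are made up for illustration:

    ```sql
    WITH t(grp, value) AS (
      VALUES ('a', 1), ('a', 2), ('a', 100),
             ('b', 10), ('b', 20)
    )
    SELECT
      grp,
      percentile_cont(0.5) WITHIN GROUP (ORDER BY value) AS median_cont,
      percentile_disc(0.5) WITHIN GROUP (ORDER BY value) AS median_disc
    FROM
      t
    GROUP BY
      grp
    ORDER BY
      grp;
    -- grp a (odd count):  median_cont = 2,  median_disc = 2
    -- grp b (even count): median_cont = 15 (interpolated), median_disc = 10
    ```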
