Is there a better way to calculate the median (not average)

Suppose I have the following table definition:

CREATE TABLE x (i serial primary key, value integer not null);

I want to calculate the MEDIAN o

7 answers

误落风尘

2021-02-02 15:58

Simple SQL using only native PostgreSQL functions:

select 
    case count(*)%2
        when 1 then (array_agg(num order by num))[count(*)/2+1]
        else ((array_agg(num order by num))[count(*)/2]::double precision + (array_agg(num order by num))[count(*)/2+1])/2
    end as median
from unnest(array[5,17,83,27,28]) num;

You can of course add coalesce() or a filter if you need to handle NULLs.
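
As a minimal sketch of the NULL handling mentioned above (an illustration, not part of the original answer): array_agg() keeps NULL elements, so one straightforward option is to filter them out before aggregating. The sample data with a NULL in it is made up for this example.

```sql
-- Same CASE expression as in the query above. The WHERE clause drops NULLs
-- first, so count(*) and the array subscripts only see real values.
select
    case count(*)%2
        when 1 then (array_agg(num order by num))[count(*)/2+1]
        else ((array_agg(num order by num))[count(*)/2]::double precision + (array_agg(num order by num))[count(*)/2+1])/2
    end as median
from unnest(array[5,17,null,83,27,28]) num
where num is not null;
```

With the NULL removed, five values remain (5, 17, 27, 28, 83), so the odd branch picks the third one, 27.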

离开以前

2021-02-02 15:59

CREATE TABLE array_table (id integer, values integer[]) ;

INSERT INTO array_table VALUES ( 1,'{1,2,3}');
INSERT INTO array_table VALUES ( 2,'{4,5,6,7}');

select id, values, cardinality(values) as array_length,
(case when cardinality(values)%2=0 and cardinality(values)>1 then (values[(cardinality(values)/2)]+ values[((cardinality(values)/2)+1)])/2::float 
 else values[(cardinality(values)+1)/2]::float end) as median  
 from array_table

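Alternatively, create a function and reuse it anywhere in later queries. Note that both the query above and this function index into the array by position, so they assume each array is already sorted in ascending order.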

CREATE OR REPLACE FUNCTION median (a integer[]) 
RETURNS float AS    $median$ 
Declare     
    abc float; 
BEGIN    
    SELECT (case when cardinality(a)%2=0 and cardinality(a)>1 then 
           (a[(cardinality(a)/2)] + a[((cardinality(a)/2)+1)])/2::float   
           else a[(cardinality(a)+1)/2]::float end) into abc;    
    RETURN abc; 
END;    
$median$ 
LANGUAGE plpgsql;

select id,values,median(values) from array_table
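
Because the function picks elements by position, an unsorted array gives a wrong answer. A small sketch of one way to sort before calling it, assuming the median(integer[]) function defined above has been created; the literal array is made up for illustration.

```sql
-- Assumes the median(integer[]) function defined above already exists.
-- ARRAY(SELECT ... ORDER BY ...) rebuilds the array in ascending order.
SELECT median(ARRAY(SELECT v FROM unnest(ARRAY[7, 1, 5]) AS v ORDER BY v)) AS median_sorted,
       median(ARRAY[7, 1, 5]) AS median_unsorted;
```

Here median_sorted is 5, the true median, while median_unsorted is 1, simply the element that happens to sit in the middle position.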

抹茶落季

2021-02-02 16:07
Use the function below to find the nth percentile:
```
CREATE or REPLACE FUNCTION nth_percentil(anyarray, int)
    RETURNS 
        anyelement as 
    $$
        SELECT $1[$2/100.0 * array_upper($1,1) + 1] ;
    $$ 
LANGUAGE SQL IMMUTABLE STRICT;
```
In your case, that is the 50th percentile.

Use the query below to get the median:
```
SELECT nth_percentil(ARRAY (SELECT Field_name FROM table_name ORDER BY 1),50)
```
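This returns the 50th percentile, which approximates the median: the function picks a single array element rather than averaging the two middle values when the number of rows is even.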

Hope this is helpful.
被撕碎了的回忆

2021-02-02 16:12

Indeed there IS an easier way. In Postgres you can define your own aggregate functions. I posted functions to do median as well as mode and range to the PostgreSQL snippets library a while back.

http://wiki.postgresql.org/wiki/Aggregate_Median
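
The wiki code is not reproduced here. As a rough sketch of the general idea only, a user-defined aggregate can collect its inputs into an array and compute the median in a final function. The names median_sketch and median_sketch_final are hypothetical and chosen for this illustration; they are not the names used on the wiki page.

```sql
-- Final function: takes the collected values, ignores NULLs, and averages
-- the middle row (odd count) or the two middle rows (even count).
CREATE OR REPLACE FUNCTION median_sketch_final(numeric[])
RETURNS numeric AS
$$
    WITH s AS (
        SELECT val,
               row_number() OVER (ORDER BY val) AS rn,
               count(*) OVER () AS ct
        FROM unnest($1) AS val
        WHERE val IS NOT NULL
    )
    SELECT avg(val) FROM s WHERE rn IN ((ct + 1) / 2, (ct + 2) / 2);
$$
LANGUAGE sql IMMUTABLE;

-- Aggregate: array_append accumulates each input value into a numeric[] state.
CREATE AGGREGATE median_sketch(numeric) (
    SFUNC = array_append,
    STYPE = numeric[],
    FINALFUNC = median_sketch_final,
    INITCOND = '{}'
);

-- Usage against the table from the question.
SELECT median_sketch(value) FROM x;
```

Collecting every value into an array keeps the definition short but holds the whole group in memory, so this is better suited to moderate group sizes.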

Happy的楠姐

2021-02-02 16:20

A simpler query for that:

WITH y AS (
   SELECT value, row_number() OVER (ORDER BY value) AS rn
   FROM   x
   WHERE  value IS NOT NULL
   )
, c AS (SELECT count(*) AS ct FROM y) 
SELECT CASE WHEN c.ct%2 = 0 THEN
          round((SELECT avg(value) FROM y WHERE y.rn IN (c.ct/2, c.ct/2+1)), 3)
       ELSE
                (SELECT     value  FROM y WHERE y.rn = (c.ct+1)/2)
       END AS median
FROM   c;

Major points

Ignores NULL values.
The core feature is the row_number() window function, available since PostgreSQL 8.4.
The final SELECT returns one row for an odd number of rows and the avg() of two rows for an even number. The result is numeric, rounded to 3 decimal places in the even case.

A test shows that this version is 4x faster than the query in the question and, unlike that query, yields correct results:

CREATE TEMP TABLE x (value int);
INSERT INTO x SELECT generate_series(1,10000);
INSERT INTO x VALUES (NULL),(NULL),(NULL),(3);

余生分开走

2021-02-02 16:23
Yes. As of PostgreSQL 9.4, you can use the inverse distribution function PERCENTILE_CONT(), an ordered-set aggregate function that is also specified in the SQL standard.
```
WITH t(value) AS (
  SELECT 1   UNION ALL
  SELECT 2   UNION ALL
  SELECT 100 
)
SELECT
  percentile_cont(0.5) WITHIN GROUP (ORDER BY value)
FROM
  t;
```
This emulation of MEDIAN() via PERCENTILE_CONT() is also documented here.
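
Since percentile_cont() is an ordinary aggregate, it also works per group. The sketch below uses made-up sample data and adds percentile_disc() for comparison: percentile_cont() interpolates between the two middle values when the count is even, while percentile_disc() returns a value that actually occurs in the input.

```sql
WITH t(grp, value) AS (
  VALUES ('a', 1), ('a', 2), ('a', 100),
         ('b', 10), ('b', 20)
)
SELECT
  grp,
  percentile_cont(0.5) WITHIN GROUP (ORDER BY value) AS median_cont,
  percentile_disc(0.5) WITHIN GROUP (ORDER BY value) AS median_disc
FROM
  t
GROUP BY
  grp
ORDER BY
  grp;
```

For group a, both columns are 2. For group b, median_cont is 15 (the interpolated midpoint) and median_disc is 10.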