I've been surprisingly unable to find an nth percentile function for PostgreSQL.
I am using this via the Mondrian OLAP tool, so I just need an aggregate function which r
The ntile function is very useful here. I have a table test_temp:
select * from test_temp;

 score (integer)
-----------------
               3
               5
               2
              10
               4
               8
               7
              12
select score, ntile(4) over (order by score) as quartile from test_temp;

 score (integer) | quartile (integer)
-----------------+--------------------
               2 |                  1
               3 |                  1
               4 |                  2
               5 |                  2
               7 |                  3
               8 |                  3
              10 |                  4
              12 |                  4
ntile(4) over (order by score) orders the rows by score, splits them into four groups (equally sized if the row count divides evenly) and assigns each row its group number based on that order.
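When the row count does not divide evenly, the groups differ in size by at most one row, and the larger groups come first. A minimal sketch, assuming a throwaway series of 10 values from generate_series instead of the test_temp table (the alias s(n) is only for illustration):

```sql
-- 10 rows into 4 buckets: the first two buckets get 3 rows, the last two get 2
select n, ntile(4) over (order by n) as quartile
from generate_series(1, 10) as s(n);
-- quartile for n = 1..10: 1 1 1 2 2 2 3 3 4 4
```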
Since I have 8 numbers here, they represent the 0th, 12.5th, 25th, 37.5th, 50th, 62.5th, 75th and 87.5th percentiles. So if I only take the results where the quartile is 2, I'll have the 25th and 37.5th percentiles.
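with ranked_test as (
    -- assign each score to one of four ordered groups
    select score, ntile(4) over (order by score) as quartile from test_temp
)
-- the lowest score in group 2 sits at the 25th percentile
select min(score) from ranked_test
where quartile = 2
group by quartile;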
returns 4, the third lowest number in the list of 8.
If you had a larger table and used ntile(100), the column you filter on would be the percentile, and you could use the same query as above.
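As a rough sketch of that ntile(100) variant, assuming the integers 1 to 1000 from generate_series as stand-in data (the names ranked, percentile and p90_lower_bound are made up for this example):

```sql
-- Hypothetical data: the integers 1..1000 stand in for a larger score table
with ranked as (
    select n as score, ntile(100) over (order by n) as percentile
    from generate_series(1, 1000) as s(n)
)
select min(score) as p90_lower_bound
from ranked
where percentile = 90;
-- bucket 90 holds rows 891..900, so this returns 891
```

With 1000 rows each of the 100 buckets holds exactly 10 rows; with a row count that is not a multiple of 100 the bucket boundaries shift slightly, so treat the result as an approximation.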
With PostgreSQL 9.4 there is native support for percentiles now, implemented in Ordered-Set Aggregate Functions:
percentile_cont(fraction) WITHIN GROUP (ORDER BY sort_expression)
continuous percentile: returns a value corresponding to the specified fraction in the ordering, interpolating between adjacent input items if needed
percentile_cont(fractions) WITHIN GROUP (ORDER BY sort_expression)
multiple continuous percentile: returns an array of results matching the shape of the fractions parameter, with each non-null element replaced by the value corresponding to that percentile
See the documentation for more details: http://www.postgresql.org/docs/current/static/functions-aggregate.html
and see here for some examples: https://github.com/michaelpq/michaelpq.github.io/blob/master/_posts/2014-02-27-postgres-9-4-feature-highlight-within-group.markdown
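For comparison with the ntile approach, here is a small sketch of percentile_cont applied to the same eight scores. It assumes PostgreSQL 9.4 or later, and the values are inlined through a CTE named test_temp so the query runs without creating a table:

```sql
-- The eight scores from the test_temp example, inlined so the query stands alone
with test_temp(score) as (
    values (3), (5), (2), (10), (4), (8), (7), (12)
)
select
    percentile_cont(0.25) within group (order by score) as q1,      -- 3.75
    percentile_cont(0.5)  within group (order by score) as median,  -- 6
    percentile_cont(array[0.25, 0.5, 0.75])
        within group (order by score) as quartiles                  -- {3.75,6,8.5}
from test_temp;
```

Because percentile_cont interpolates between adjacent rows, the 25th percentile comes out as 3.75 here, not a value that actually occurs in the table. If you need an existing row value, percentile_disc takes the same arguments and returns one.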
CREATE TABLE aa AS SELECT generate_series(1,20) AS a;
--SELECT 20
WITH subset AS (
SELECT a AS val,
ntile(4) OVER (ORDER BY a) AS tile
FROM aa
)
SELECT tile, max(val)
FROM subset GROUP BY tile ORDER BY tile;
 tile | max
------+-----
    1 |   5
    2 |  10
    3 |  15
    4 |  20
(4 rows)