Best performance in sampling repeated value from a grouped column

一向 2021-02-13 02:02

This question is about the functionality of first_value(), using another function or workaround.

It is also about "little gain in performance" in big tables. To use eg

2 answers

后悔当初 (OP)

2021-02-13 02:32
If you really don't care which member of the set is picked, and if you don't need to compute additional aggregates (like count), there is a fast and simple alternative with DISTINCT ON (x) without ORDER BY:
```
SELECT DISTINCT ON (x) x, y, z FROM t;
```
x, y and z are from the same row, but the row is an arbitrary pick from each set of rows with the same x.
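
To make the "arbitrary pick" concrete, here is a minimal sketch. The table and column names t, x, y, z follow the answer; the column types and sample rows are made up for illustration. The second query adds an ORDER BY for when you do care which row is picked.
```sql
-- Hypothetical sample data; only the names t, x, y, z come from the answer.
CREATE TEMP TABLE t (x int, y text, z text);
INSERT INTO t VALUES
  (1, 'a', 'k'), (1, 'b', 'l'),
  (2, 'c', 'm'), (2, 'd', 'n'), (2, 'e', 'o');

-- One row per x; which row of each group is returned is not defined.
SELECT DISTINCT ON (x) x, y, z FROM t;

-- Deterministic variant: the row with the smallest y per x.
-- The leading ORDER BY expressions must match the DISTINCT ON expressions.
SELECT DISTINCT ON (x) x, y, z FROM t ORDER BY x, y;
```
With this sample data the second query returns (1, a, k) and (2, c, m). The first may return any row of each group, and the result can change between runs or plans.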

If you need a count anyway, your options with regard to performance are limited since the whole table has to be read in either case. Still, you can combine it with window functions in the same SELECT:
```
SELECT DISTINCT ON (x) x, y, z, count(*) OVER (PARTITION BY x) AS x_count FROM t;
```
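Consider the sequence of events in a SELECT query (window functions are evaluated before DISTINCT ON is applied, so x_count covers every row with the same x, not just the one that is kept):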
- Best way to get result count before LIMIT was applied
Depending on requirements, there may be faster ways to get counts:
- Fast way to discover the row count of a table in PostgreSQL
In combination with GROUP BY the only realistic option I see to gain some performance is the first_last_agg extension. But don't expect much.
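
A sketch of what that could look like, assuming the first_last_agg extension is installed on the server and exposes a first() aggregate as its documentation describes (check the version you install). The table t with columns x, y, z is the one from the queries above.
```sql
-- Assumption: first_last_agg is available to CREATE EXTENSION on this server.
CREATE EXTENSION IF NOT EXISTS first_last_agg;

-- One output row per x: the first y and z the aggregate sees, plus the group count.
-- Without an ORDER BY inside the aggregate call, "first" means an arbitrary row.
SELECT x
     , first(y) AS y
     , first(z) AS z
     , count(*) AS x_count
FROM   t
GROUP  BY x;
```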

For other use cases without a count (including the simple case at the top), there are faster solutions, depending on your exact requirements. In particular, to get the "first" or "last" value of each set, you can emulate a loose index scan (like @Mihai commented):
- Optimize GROUP BY query to retrieve latest record per user
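
One common way to emulate a loose index scan in PostgreSQL is a recursive CTE that jumps from each distinct x to the next larger one. This is a minimal sketch, not necessarily the exact query from the linked answer; it assumes the table t(x, y, z) from above and an index on x (the index name t_x_idx is made up).
```sql
-- Hypothetical index; the technique only pays off when x is indexed.
CREATE INDEX IF NOT EXISTS t_x_idx ON t (x);

WITH RECURSIVE cte AS (
   (  -- anchor: the row with the smallest x
   SELECT x, y, z
   FROM   t
   ORDER  BY x
   LIMIT  1
   )
   UNION ALL
   SELECT l.x, l.y, l.z
   FROM   cte c
   CROSS  JOIN LATERAL (
      -- step: one row with the next larger x, found via the index
      SELECT t.x, t.y, t.z
      FROM   t
      WHERE  t.x > c.x
      ORDER  BY t.x
      LIMIT  1
      ) l
   )
SELECT x, y, z FROM cte;
```
The recursion stops when no larger x exists. Rows where x is NULL are never matched by `t.x > c.x`, so they are skipped. This tends to help only when there are few distinct values of x and many rows per value; with many small groups, DISTINCT ON is usually the better choice.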