Slow LEFT JOIN on CTE with time intervals

∥☆過路亽.° 提交于 2019-12-02 01:07:31

Correctness first: I suspect a bug in your query:

 LEFT JOIN historical_ohlcv ohlcv ON ohlcv.time_open >= g.start_time
                                 AND ohlcv.time_close < g.end_time

Unlike my referenced answer, you join on a time interval: (time_open, time_close]. The way you do it excludes rows in the table where the interval crosses bucket borders. Only intervals fully contained in a single bucket count. I don't think that's intended?

A simple fix would be to decide bucket membership based on time_open (or time_close) alone. If you want to keep working with both, you have to define exactly how to deal with intervals overlapping with multiple buckets.

Also, you are looking for max(high) per bucket, which is different in nature from count(*) in my referenced answer.

And your buckets are simple intervals per hour?

Then we can radically simplify. Working with just time_open:

SELECT date_trunc('hour', time_open) AS hour, max(high) AS max_high
FROM   historical_ohlcv
WHERE  exchange_symbol = 'BINANCE'
AND    symbol_id = 'ETHBTC'
AND    time_open >= now() - interval '5 months'  -- frame_start
AND    time_open <  now()                        -- frame_end
GROUP  BY 1
ORDER  BY 1;

Related:

It's hard to talk about further performance optimization while basics are unclear. And we'd need more information.

Are WHERE conditions variable?
How many distinct values in exchange_symbol and symbol_id?
Avg. row size? What do you get for:

SELECT avg(pg_column_size(t)) FROM historical_ohlcv t TABLESAMPLE SYSTEM (0.1);

Is the table read-only?

Assuming you always filter on exchange_symbol and symbol_id and values are variable, your table is read-only or autovacuum can keep up with the write load so we can hope for index-only scans, you would best have a multicolumn index on (exchange_symbol, symbol_id, time_open, high DESC) to support this query. Index columns in this order. Related:

Depending on data distribution and other details a LEFT JOIN LATERAL solution might be another option. Related:

Aside from all that, you EXPLAIN plan exhibits some very bad estimates:

Are you using a current version of Postgres? You may have to work on your server configuration - or at least set higher statistics targets on relevant columns and more aggressive autovacuum settings for the big table. Related:

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!