Slow LEFT JOIN on CTE with time intervals

I am trying to debug a query in PostgreSQL that I've built to bucket market data in time buckets in arbitrary time intervals. Here is my table definition:

CREATE TABLE historical_ohlcv (
  exchange_symbol TEXT                     NOT NULL,
  symbol_id       TEXT                     NOT NULL,
  kafka_key       TEXT                     NOT NULL,
  open            NUMERIC,
  high            NUMERIC,
  low             NUMERIC,
  close           NUMERIC,
  volume          NUMERIC,
  time_open       TIMESTAMP WITH TIME ZONE NOT NULL,
  time_close      TIMESTAMP WITH TIME ZONE,
  CONSTRAINT historical_ohlcv_pkey
  PRIMARY KEY (exchange_symbol, symbol_id, time_open)
);

CREATE INDEX symbol_id_idx
  ON historical_ohlcv (symbol_id);

CREATE INDEX open_close_symbol_id
  ON historical_ohlcv (time_open, time_close, exchange_symbol, symbol_id);

CREATE INDEX time_open_idx
  ON historical_ohlcv (time_open);

CREATE INDEX time_close_idx
  ON historical_ohlcv (time_close);

The table has ~25m rows currently. My query as an example for 1 hour, but could be 5 mins, 10 mins, 2 days, etc.

EXPLAIN ANALYZE WITH vals AS (
    SELECT
      NOW() - '5 months' :: INTERVAL AS frame_start,
      NOW() AS frame_end,
      INTERVAL '1 hour'        AS t_interval
)
  , grid AS (
      SELECT
        start_time,
        lead(start_time, 1)
        OVER (
          ORDER BY start_time ) AS end_time
      FROM (
             SELECT
               generate_series(frame_start, frame_end,
                               t_interval) AS start_time,
               frame_end
             FROM vals
           ) AS x
  )
SELECT max(high)
FROM grid g
  LEFT JOIN historical_ohlcv ohlcv ON ohlcv.time_open >= g.start_time
WHERE exchange_symbol = 'BINANCE'
AND symbol_id = 'ETHBTC'
GROUP BY start_time;

The WHERE clause could be any valid value in the table.

This technique was inspired by:

Best way to count records by arbitrary time intervals in Rails+Postgres.

The idea is to make a common table and left join your data with that to indicate which bucket stuff is in. This query is really slow! It's currently taking 15s. Based on the query planner, we have a really expensive nested loop:

QUERY PLAN
HashAggregate  (cost=2758432.05..2758434.05 rows=200 width=40) (actual time=16023.713..16023.817 rows=542 loops=1)
  Group Key: g.start_time
  CTE vals
    ->  Result  (cost=0.00..0.02 rows=1 width=32) (actual time=0.005..0.005 rows=1 loops=1)
  CTE grid
    ->  WindowAgg  (cost=64.86..82.36 rows=1000 width=16) (actual time=2.986..9.594 rows=3625 loops=1)
          ->  Sort  (cost=64.86..67.36 rows=1000 width=8) (actual time=2.981..4.014 rows=3625 loops=1)
                Sort Key: x.start_time
                Sort Method: quicksort  Memory: 266kB
                ->  Subquery Scan on x  (cost=0.00..15.03 rows=1000 width=8) (actual time=0.014..1.991 rows=3625 loops=1)
                      ->  ProjectSet  (cost=0.00..5.03 rows=1000 width=16) (actual time=0.013..1.048 rows=3625 loops=1)
                            ->  CTE Scan on vals  (cost=0.00..0.02 rows=1 width=32) (actual time=0.008..0.009 rows=1 loops=1)
  ->  Nested Loop  (cost=0.56..2694021.34 rows=12865667 width=14) (actual time=7051.730..16015.873 rows=31978 loops=1)
        ->  CTE Scan on grid g  (cost=0.00..20.00 rows=1000 width=16) (actual time=2.988..11.635 rows=3625 loops=1)
        ->  Index Scan using historical_ohlcv_pkey on historical_ohlcv ohlcv  (cost=0.56..2565.34 rows=12866 width=22) (actual time=3.712..4.413 rows=9 loops=3625)
              Index Cond: ((exchange_symbol = 'BINANCE'::text) AND (symbol_id = 'ETHBTC'::text) AND (time_open >= g.start_time))
              Filter: (time_close < g.end_time)
              Rows Removed by Filter: 15502
Planning time: 0.568 ms
Execution time: 16023.979 ms

My guess is this line is doing a lot:

LEFT JOIN historical_ohlcv ohlcv ON ohlcv.time_open >= g.start_time
                                AND ohlcv.time_close < g.end_time

But I'm not sure how to accomplish this in another way.

P.S. apologies if this belongs to dba.SE. I read the FAQ and this seemed too basic for that site, so I posted here.

Edits as requested:

SELECT avg(pg_column_size(t)) FROM historical_ohlcv t TABLESAMPLE SYSTEM (0.1); returns 107.632

For exchange_symbol, there are 3 unique values, for symbol_id there are ~400

PostgreSQL version: PostgreSQL 10.3 (Ubuntu 10.3-1.pgdg16.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609, 64-bit.

The table will be growing about ~1m records a day, so not exactly read-only. All this stuff is done locally and I will try to move to RDS or to help manage hardware issues.

Related: if I wanted to add other aggregates, specifically 'first in the bucket', 'last in the bucket', min, sum, would my indexing strategy change?

Correctness first: I suspect a bug in your query:

 LEFT JOIN historical_ohlcv ohlcv ON ohlcv.time_open >= g.start_time
                                 AND ohlcv.time_close < g.end_time

Unlike my referenced answer, you join on a time interval: (time_open, time_close]. The way you do it excludes rows in the table where the interval crosses bucket borders. Only intervals fully contained in a single bucket count. I don't think that's intended?

A simple fix would be to decide bucket membership based on time_open (or time_close) alone. If you want to keep working with both, you have to define exactly how to deal with intervals overlapping with multiple buckets.

Also, you are looking for max(high) per bucket, which is different in nature from count(*) in my referenced answer.

And your buckets are simple intervals per hour?

Then we can radically simplify. Working with just time_open:

SELECT date_trunc('hour', time_open) AS hour, max(high) AS max_high
FROM   historical_ohlcv
WHERE  exchange_symbol = 'BINANCE'
AND    symbol_id = 'ETHBTC'
AND    time_open >= now() - interval '5 months'  -- frame_start
AND    time_open <  now()                        -- frame_end
GROUP  BY 1
ORDER  BY 1;

Resample on time series data

It's hard to talk about further performance optimization while basics are unclear. And we'd need more information.

Are WHERE conditions variable?
How many distinct values in exchange_symbol and symbol_id?
Avg. row size? What do you get for:

SELECT avg(pg_column_size(t)) FROM historical_ohlcv t TABLESAMPLE SYSTEM (0.1);

Is the table read-only?

Assuming you always filter on exchange_symbol and symbol_id and values are variable, your table is read-only or autovacuum can keep up with the write load so we can hope for index-only scans, you would best have a multicolumn index on (exchange_symbol, symbol_id, time_open, high DESC) to support this query. Index columns in this order. Related:

Multicolumn index and performance

Depending on data distribution and other details a LEFT JOIN LATERAL solution might be another option. Related:

Aside from all that, you EXPLAIN plan exhibits some very bad estimates:

https://explain.depesz.com/s/E5yI

Are you using a current version of Postgres? You may have to work on your server configuration - or at least set higher statistics targets on relevant columns and more aggressive autovacuum settings for the big table. Related:

来源：https://stackoverflow.com/questions/50221842/slow-left-join-on-cte-with-time-intervals

标签

sql

postgresql

aggregate-functions

postgresql-performance