Question
SELECT timeseries_id, "timestamp" FROM enhydris_timeseriesrecord WHERE timeseries_id=6661 ORDER BY "timestamp" DESC LIMIT 1;
(The table contains about 66m records, of which about 0.5m have timeseries_id=6661.)
This query takes about 1-2 seconds to run, which I find too slow.
If it were using a simple btree index, it should locate the record it's looking for after about 30 iterations. As far as I can see when I execute EXPLAIN ANALYZE
for that query, it does use the index, but it has to do so in each chunk, and apparently there are 1374 chunks.
How can the query become faster?
Table "public.enhydris_timeseriesrecord"
Column | Type | Collation | Nullable | Default
---------------+--------------------------+-----------+----------+---------
timeseries_id | integer | | not null |
timestamp | timestamp with time zone | | not null |
value | double precision | | |
flags | character varying(237) | | not null |
Indexes:
"enhydris_timeseriesrecord_pk" PRIMARY KEY, btree (timeseries_id, "timestamp")
"enhydris_timeseriesrecord_timeseries_id_idx" btree (timeseries_id)
"enhydris_timeseriesrecord_timestamp_idx" btree ("timestamp" DESC)
"enhydris_timeseriesrecord_timestamp_timeseries_id_idx" btree ("timestamp", timeseries_id)
Foreign-key constraints:
"enhydris_timeseriesrecord_timeseries_fk" FOREIGN KEY (timeseries_id) REFERENCES enhydris_timeseries(id) DEFERRABLE INITIALLY DEFERRED
Triggers:
ts_insert_blocker BEFORE INSERT ON enhydris_timeseriesrecord FOR EACH ROW EXECUTE PROCEDURE _timescaledb_internal.insert_blocker()
Number of child tables: 1374 (Use \d+ to list them.)
Update: EXPLAIN plan
Answer 1:
The database has to go to the sub-index of each chunk and find the latest timestamp for timeseries_id=6661. As the EXPLAIN output shows, the database correctly uses the index: it does an index scan, not a full scan, of the sub-index in each chunk. But that means it performs more than 1000 index scans. No chunks can be pruned, because the planner cannot know which chunks contain entries for that specific timeseries_id.
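A hedged sketch of what chunk pruning needs: the planner can only exclude chunks when the query constrains the time column itself. If you know the latest record is recent, an explicit time bound lets the planner skip every chunk outside that window (the 30-day window below is an assumption to tune to your data, not something the answer specifies):

-- The INTERVAL '30 days' bound is an assumed value; widen it if a
-- series may stay silent for longer, or the query will return no row.
SELECT timeseries_id, "timestamp"
FROM enhydris_timeseriesrecord
WHERE timeseries_id = 6661
  AND "timestamp" > now() - INTERVAL '30 days'
ORDER BY "timestamp" DESC
LIMIT 1;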
And you have 1374 chunks for only 66m records, i.e. roughly 50k rows per chunk. That's far too few rows per chunk. The Timescale docs give the following recommendation:
The key property of choosing the time interval is that the chunk (including indexes) belonging to the most recent interval (or chunks if using space partitions) fit into memory. As such, we typically recommend setting the interval so that these chunk(s) comprise no more than 25% of main memory.
https://docs.timescale.com/latest/using-timescaledb/hypertables#best-practices
Reducing the number of chunks will significantly improve query performance.
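As a concrete sketch, assuming TimescaleDB 2.x (the information view and the 30-day interval below are assumptions; pick an interval so that the most recent chunk plus its indexes fits in roughly 25% of memory, per the guideline above):

-- Check how many chunks the hypertable currently has
-- (timescaledb_information.chunks is the 2.x view).
SELECT count(*)
FROM timescaledb_information.chunks
WHERE hypertable_name = 'enhydris_timeseriesrecord';

-- Enlarge the interval; this affects only newly created chunks,
-- existing chunks keep their old boundaries.
SELECT set_chunk_time_interval('enhydris_timeseriesrecord', INTERVAL '30 days');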
Additionally, you might gain even more query performance if you use TimescaleDB compression, which reduces the number of chunks that have to be scanned even further; you can segment by timeseries_id (https://docs.timescale.com/latest/api#compression). Or you could define a continuous aggregate that holds the last item per timeseries_id (https://docs.timescale.com/latest/api#continuous-aggregates). Sketches of both follow.
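A minimal compression sketch, assuming TimescaleDB 2.x syntax (the 30-day threshold is an illustrative value): segmenting by timeseries_id stores each series contiguously inside compressed chunks, so a per-series scan touches far less data.

ALTER TABLE enhydris_timeseriesrecord SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'timeseries_id',
    timescaledb.compress_orderby = '"timestamp" DESC'
);

-- Compress chunks once they are older than 30 days (assumed threshold).
SELECT add_compression_policy('enhydris_timeseriesrecord', INTERVAL '30 days');

And a continuous-aggregate sketch, again with 2.x syntax (the view name timeseries_latest_daily and the one-day bucket are assumptions): a coarse per-day maximum shrinks the data the final lookup has to touch.

CREATE MATERIALIZED VIEW timeseries_latest_daily
WITH (timescaledb.continuous) AS
SELECT timeseries_id,
       time_bucket(INTERVAL '1 day', "timestamp") AS bucket,
       max("timestamp") AS last_timestamp
FROM enhydris_timeseriesrecord
GROUP BY timeseries_id, bucket;

-- The latest timestamp for one series then becomes a cheap lookup:
SELECT max(last_timestamp)
FROM timeseries_latest_daily
WHERE timeseries_id = 6661;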
Source: https://stackoverflow.com/questions/61191608/how-to-improve-the-performance-of-timescaledb-getting-last-timestamp