Optimize Postgres timestamp query range

伪装坚强ぢ 2020-11-28 13:06

I have the following table and indices defined:

CREATE TABLE ticket
(
  wid bigint NOT NULL DEFAULT nextval('tickets_id_seq'::regclass),
  eid bigint,
  cr         


        
1 answer
  • 2020-11-28 13:28

    CLUSTER

    If you intend to use CLUSTER, note that the syntax shown in the question is invalid:

    create CLUSTER ticket USING ticket_1_idx;

    Drop the leading "create" and run this once:

    CLUSTER ticket USING ticket_1_idx;
    

    This can help a lot with bigger result sets. Not so much for a single row returned.
    Postgres remembers which index to use for subsequent calls. If your table isn't read-only the effect deteriorates over time and you need to re-run at certain intervals:

    CLUSTER ticket;
    

    Possibly only on volatile partitions. See below.

    However, if you have lots of updates, CLUSTER (or VACUUM FULL) may actually be bad for performance. The right amount of bloat allows UPDATE to place new row versions on the same data page and avoids the need for physically extending the underlying file in the OS too often. You can use a carefully tuned FILLFACTOR to get the best of both worlds:

    • Fill factor for a sequential index that is PK
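    As a rough sketch of the FILLFACTOR idea: the value 90 below is only an illustrative assumption, not a recommendation from the question or the manual, and the right number depends on how update-heavy the table is. The setting applies to pages written afterwards, so existing pages keep their old density until the table is rewritten.

    ```sql
    -- Leave roughly 10 % of each heap page free for updated row versions.
    -- The value 90 is a placeholder; tune it for your workload.
    ALTER TABLE ticket SET (fillfactor = 90);

    -- Rewrite the table so existing pages honor the new setting.
    -- CLUSTER reuses the index remembered from the earlier
    -- "CLUSTER ticket USING ticket_1_idx" and takes an ACCESS EXCLUSIVE lock.
    CLUSTER ticket;
    ```

    A lower fill factor makes the table larger on disk, so it trades some read efficiency for cheaper updates.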

    pg_repack

    CLUSTER takes an exclusive lock on the table, which may be a problem in a multi-user environment. Quoting the manual:

    When a table is being clustered, an ACCESS EXCLUSIVE lock is acquired on it. This prevents any other database operations (both reads and writes) from operating on the table until the CLUSTER is finished.

    Note that this blocks reads as well as writes. Consider the alternative pg_repack, whose documentation states:

    Unlike CLUSTER and VACUUM FULL it works online, without holding an exclusive lock on the processed tables during processing. pg_repack is efficient to boot, with performance comparable to using CLUSTER directly.

    and:

    pg_repack needs to take an exclusive lock at the end of the reorganization.

    Version 1.3.1 works with:

    PostgreSQL 8.3, 8.4, 9.0, 9.1, 9.2, 9.3, 9.4

    Version 1.4.2 works with:

    PostgreSQL 9.1, 9.2, 9.3, 9.4, 9.5, 9.6, 10

    Query

    The query is simple enough not to cause any performance problems per se.

    However, a word on correctness: the BETWEEN construct includes both bounds. Your query selects all of Dec. 19, plus any records stamped exactly Dec. 20, 00:00. That's an extremely unlikely requirement. Chances are, you really want a half-open range:

    SELECT *
    FROM   ticket 
    WHERE  created >= '2012-12-19 0:0'
    AND    created <  '2012-12-20 0:0';
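
    To see the difference between the two forms, here is a small self-contained check. It uses an inline VALUES list rather than the ticket table, and the column alias ts is made up for the illustration.

    ```sql
    -- Compare BETWEEN (closed range) with >= / < (half-open range)
    -- on two sample timestamps around midnight.
    SELECT ts,
           ts BETWEEN '2012-12-19' AND '2012-12-20'  AS in_between,
           ts >= '2012-12-19' AND ts < '2012-12-20'  AS in_half_open
    FROM  (VALUES (timestamp '2012-12-19 23:59:59'),
                  (timestamp '2012-12-20 00:00:00')) AS t(ts);
    ```

    The second row, exactly midnight on Dec. 20, is matched by BETWEEN but not by the half-open range, which is why the half-open form is the safer way to select a whole day.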
    

    Performance

    First off, you ask:

    Why is it selecting sequential scan?

    Your EXPLAIN output clearly shows an Index Scan, not a sequential table scan. There must be some kind of misunderstanding.
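
    If in doubt, the plan can be checked again directly. The sketch below assumes the corrected range query from above; note that ANALYZE actually executes the statement, and that the BUFFERS option requires PostgreSQL 9.0 or later.

    ```sql
    -- Show the chosen plan with actual run times and buffer usage.
    -- Look for "Index Scan" or "Bitmap Heap Scan" versus "Seq Scan" on ticket.
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT *
    FROM   ticket
    WHERE  created >= '2012-12-19 0:0'
    AND    created <  '2012-12-20 0:0';
    ```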

    If you are pressed hard for better performance, you may be able to improve things. But the necessary background information is not in the question. Possible options include:

    • You could query only the required columns instead of * to reduce transfer cost (and possibly gain other performance benefits).

    • You could look at partitioning and put practical time slices into separate tables. Add indexes to partitions as needed.

    • If partitioning is not an option, another related but less intrusive technique would be to add one or more partial indexes.
      For example, if you mostly query the current month, you could create the following partial index:

      CREATE INDEX ticket_created_idx ON ticket(created)
      WHERE created >= '2012-12-01 00:00:00'::timestamp;
      

      CREATE a new index right before the start of a new month. You can easily automate the task with a cron job. Optionally DROP partial indexes for old months later.

    • Keep the total index in addition for CLUSTER (which cannot operate on partial indexes). If old records never change, table partitioning would help this task a lot, since you only need to re-cluster newer partitions. Then again if records never change at all, you probably don't need CLUSTER.

    If you combine the last two steps, performance should be awesome.
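
    A minimal sketch of the partitioning option, assuming PostgreSQL 10 or later, where declarative partitioning is available (older versions need inheritance with CHECK constraints instead). The table name ticket_part, the partition name, and the column list are hypothetical: the full table definition is not shown in the question, and created is assumed to be a timestamp column.

    ```sql
    -- Hypothetical parent table, partitioned by month on "created".
    CREATE TABLE ticket_part (
      wid     bigint    NOT NULL,
      eid     bigint,
      created timestamp NOT NULL
    ) PARTITION BY RANGE (created);

    -- One partition per month; the upper bound is exclusive.
    CREATE TABLE ticket_part_2012_12 PARTITION OF ticket_part
      FOR VALUES FROM ('2012-12-01') TO ('2013-01-01');

    -- Index the partition that receives most queries.
    CREATE INDEX ON ticket_part_2012_12 (created);

    -- Only volatile partitions need periodic re-clustering, for example:
    -- CLUSTER ticket_part_2012_12 USING ticket_part_2012_12_created_idx;
    ```

    The index name in the commented CLUSTER line follows PostgreSQL's default naming pattern but should be verified (for instance with \d ticket_part_2012_12 in psql) before use.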

    Performance Basics

    You may be missing one of the basics. All the usual performance advice applies:

    • https://wiki.postgresql.org/wiki/Slow_Query_Questions
    • https://wiki.postgresql.org/wiki/Performance_Optimization