Postgres: slow GROUP BY query with max

南旧 2021-01-14 08:37

I am using Postgres 9.1, and I have a table of about 3.5M rows with eventtype (varchar) and eventtime (timestamp) columns, plus some other fields. There are only about 20 different

3 Answers
  • 2021-01-14 08:49

    An index on (eventtype, eventtime desc) should help, or try reindexing the primary key index. I would also recommend changing the type of eventtype to an enum (if the set of types is fixed) or to int/smallint. This decreases the size of the data and indexes, so queries should run faster.
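A minimal sketch of the int/smallint suggestion, using Python's built-in `sqlite3` module as a stand-in for Postgres so it runs anywhere. It assumes a hypothetical lookup table `eventtypes` that stores each distinct type once, a hypothetical `allevents.eventtype_id` integer key, and a hypothetical index name `allevents_type_time_idx`. In Postgres the key column would be `smallint` (or an enum type); the SQLite run only checks the schema shape and the results, not Postgres performance.

```python
import sqlite3


def build_db():
    """Build an in-memory DB with a hypothetical integer-keyed event schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Each distinct event type is stored once, keyed by a small integer
        CREATE TABLE eventtypes (
            id   INTEGER PRIMARY KEY,
            name TEXT UNIQUE NOT NULL
        );
        CREATE TABLE allevents (
            eventtype_id INTEGER NOT NULL REFERENCES eventtypes (id),
            eventtime    TEXT    NOT NULL  -- ISO-8601 text sorts chronologically
        );
        -- Composite index: type key first, then time (descending)
        CREATE INDEX allevents_type_time_idx
            ON allevents (eventtype_id, eventtime DESC);
    """)
    conn.executemany("INSERT INTO eventtypes (id, name) VALUES (?, ?)",
                     [(1, "login"), (2, "logout"), (3, "click")])
    conn.executemany("INSERT INTO allevents VALUES (?, ?)", [
        (1, "2021-01-01 10:00:00"), (1, "2021-01-03 09:00:00"),
        (2, "2021-01-02 12:00:00"),
        (3, "2021-01-01 08:00:00"), (3, "2021-01-04 07:30:00"),
    ])
    return conn


def latest_per_type(conn):
    """Latest eventtime per type: aggregate on the integer key, then add names."""
    return conn.execute("""
        SELECT t.name, m.latest
        FROM (SELECT eventtype_id, max(eventtime) AS latest
              FROM allevents
              GROUP BY eventtype_id) AS m
        JOIN eventtypes AS t ON t.id = m.eventtype_id
        ORDER BY t.name
    """).fetchall()


if __name__ == "__main__":
    print(latest_per_type(build_db()))
```

The aggregation runs on the narrow integer key alone, and the readable name is joined in only for the handful of result rows.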

  • 2021-01-14 08:59

    What you need is a "skip scan" or "loose index scan". PostgreSQL's planner does not yet implement those automatically, but you can trick it into using one by using a recursive query.

    WITH RECURSIVE t AS (
        SELECT min(eventtype) AS eventtype FROM allevents
        UNION ALL
        SELECT (SELECT min(eventtype) FROM allevents WHERE eventtype > t.eventtype)
        FROM t WHERE t.eventtype IS NOT NULL
    )
    SELECT eventtype,
           (SELECT max(eventtime) FROM allevents WHERE eventtype = t.eventtype) AS eventtime
    FROM t
    WHERE t.eventtype IS NOT NULL;  -- the recursion ends with one NULL row; skip it
    

    There may be a way to collapse the max(eventtime) into the recursive query rather than doing it outside that query, but if so I have not hit upon it.

    This needs an index on (eventtype, eventtime) in order to be efficient. You can make eventtime DESC in that index, but it is not necessary. This is efficient only if eventtype has few distinct values (21 of them, in your case).
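The recursive query can be checked end to end with a small sketch. It uses Python's built-in `sqlite3` module as a stand-in for Postgres (SQLite also supports `WITH RECURSIVE` and correlated scalar subqueries), with made-up sample data and a hypothetical index name, and compares the result against the plain `GROUP BY` version. It verifies correctness only; SQLite timings say nothing about the Postgres planner.

```python
import sqlite3

# Emulated loose index scan: each recursive step asks for the smallest
# eventtype greater than the previous one. With an index on
# (eventtype, eventtime), each step can be answered from the index
# instead of reading every row of the previous type.
LOOSE_SCAN_SQL = """
WITH RECURSIVE t(eventtype) AS (
    SELECT min(eventtype) FROM allevents
    UNION ALL
    SELECT (SELECT min(eventtype) FROM allevents WHERE eventtype > t.eventtype)
    FROM t
    WHERE t.eventtype IS NOT NULL
)
SELECT eventtype,
       (SELECT max(eventtime) FROM allevents WHERE eventtype = t.eventtype)
FROM t
WHERE t.eventtype IS NOT NULL  -- skip the NULL row that ends the recursion
ORDER BY eventtype
"""

# Reference result: the original, straightforward aggregate.
GROUP_BY_SQL = """
SELECT eventtype, max(eventtime)
FROM allevents
GROUP BY eventtype
ORDER BY eventtype
"""


def make_sample_db():
    """In-memory table with three made-up event types."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE allevents (eventtype TEXT, eventtime TEXT)")
    conn.execute(
        "CREATE INDEX allevents_type_time ON allevents (eventtype, eventtime)")
    # type0 gets days 1-10, type1 days 1-11, type2 days 1-12
    rows = [(f"type{t}", f"2021-01-{day:02d}")
            for t in range(3) for day in range(1, 11 + t)]
    conn.executemany("INSERT INTO allevents VALUES (?, ?)", rows)
    return conn


if __name__ == "__main__":
    conn = make_sample_db()
    loose = conn.execute(LOOSE_SCAN_SQL).fetchall()
    assert loose == conn.execute(GROUP_BY_SQL).fetchall()
    print(loose)
```

Without the final `WHERE t.eventtype IS NOT NULL`, the result would carry one extra `(None, None)` row from the step where `min()` finds no larger type.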

  • 2021-01-14 09:01

    Based on the question, you already have the relevant index.

    If neither upgrading to Postgres 9.3 nor adding an index on (eventtype, eventtime desc) makes a difference, this is a case where rewriting the query to use a correlated subquery works very well, provided you can enumerate all of the event types manually:

    select val as eventtype,
           (select max(eventtime)
            from allevents
            where allevents.eventtype = val
            ) as eventtime
    from unnest('{type1,type2,…}'::text[]) as val;
    

    Here are the plans I get when running similar queries:

    denis=# select version();
                                                                  version                                                              
    -----------------------------------------------------------------------------------------------------------------------------------
     PostgreSQL 9.3.1 on x86_64-apple-darwin11.4.2, compiled by Apple LLVM version 4.2 (clang-425.0.28) (based on LLVM 3.2svn), 64-bit
    (1 row)
    

    Test data:

    denis=# create table test (evttype int, evttime timestamp, primary key (evttype, evttime));
    CREATE TABLE
    denis=# insert into test (evttype, evttime) select i, now() + (i % 3) * interval '1 min' - j * interval '1 sec' from generate_series(1,10) i, generate_series(1,10000) j;
    INSERT 0 100000
    denis=# create index on test (evttime, evttype);
    CREATE INDEX
    denis=# vacuum analyze test;
    VACUUM
    

    First query:

    denis=# explain analyze select evttype, max(evttime) from test group by evttype;
                                                        QUERY PLAN
    -------------------------------------------------------------------------------------------------------------------
     HashAggregate  (cost=2041.00..2041.10 rows=10 width=12) (actual time=54.983..54.987 rows=10 loops=1)
       ->  Seq Scan on test  (cost=0.00..1541.00 rows=100000 width=12) (actual time=0.009..15.954 rows=100000 loops=1)
     Total runtime: 55.045 ms
    (3 rows)
    

    Second query:

    denis=# explain analyze select val as evttype, (select max(evttime) from test where test.evttype = val) as evttime from unnest('{1,2,3,4,5,6,7,8,9,10}'::int[]) val;
                                                                            QUERY PLAN                                                                         
    -----------------------------------------------------------------------------------------------------------------------------------------------------------
     Function Scan on unnest val  (cost=0.00..48.39 rows=100 width=4) (actual time=0.086..0.292 rows=10 loops=1)
       SubPlan 2
         ->  Result  (cost=0.46..0.47 rows=1 width=0) (actual time=0.024..0.024 rows=1 loops=10)
               InitPlan 1 (returns $1)
                 ->  Limit  (cost=0.42..0.46 rows=1 width=8) (actual time=0.021..0.021 rows=1 loops=10)
                       ->  Index Only Scan Backward using test_pkey on test  (cost=0.42..464.42 rows=10000 width=8) (actual time=0.019..0.019 rows=1 loops=10)
                             Index Cond: ((evttype = val.val) AND (evttime IS NOT NULL))
                             Heap Fetches: 0
     Total runtime: 0.370 ms
    (9 rows)
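A minimal sketch of the correlated-subquery rewrite, again using Python's built-in `sqlite3` module as a stand-in for Postgres. SQLite has no `unnest()`, so a `VALUES` list in a CTE is swapped in for the array literal; the function name `latest_for_types`, the index name, and the sample rows are hypothetical.

```python
import sqlite3


def make_sample_db():
    """In-memory table with made-up rows for two event types."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE allevents (eventtype TEXT, eventtime TEXT)")
    conn.execute(
        "CREATE INDEX allevents_type_time ON allevents (eventtype, eventtime)")
    conn.executemany("INSERT INTO allevents VALUES (?, ?)", [
        ("type1", "2021-01-01"), ("type1", "2021-01-05"),
        ("type2", "2021-01-03"),
    ])
    return conn


def latest_for_types(conn, types):
    """Run one correlated max() subquery per hand-enumerated event type."""
    if not types:
        return []
    # SQLite has no unnest(), so a VALUES list stands in for the
    # Postgres array; one "(?)" row per type, filled in as parameters.
    placeholders = ", ".join("(?)" for _ in types)
    sql = f"""
        WITH vals(val) AS (VALUES {placeholders})
        SELECT val AS eventtype,
               (SELECT max(eventtime)
                FROM allevents
                WHERE allevents.eventtype = val) AS eventtime
        FROM vals
    """
    return conn.execute(sql, list(types)).fetchall()


if __name__ == "__main__":
    conn = make_sample_db()
    print(latest_for_types(conn, ["type1", "type2", "type3"]))
```

One behavioral difference from `GROUP BY`: a listed type with no rows comes back with a NULL (`None`) time instead of being left out, and any type missing from the hand-written list is silently skipped.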
    