How to get date_part query to hit index?

天涯浪人 2021-01-23 12:06

I have yet to be able to get this query to hit an index instead of performing a full scan - I have another query that uses date_part('day', datelocal) against an almost identi

1 Answer
  •  一个人的身影
    2021-01-23 12:41

    Well, your two queries run against different tables (reportimpression vs. reportimpressionday), so comparing them is not really a like-for-like comparison. Did you ANALYZE both? Column statistics may also play a role. Index or table bloat may differ. Does a larger share of all rows qualify for Feb 2019 in one of the tables? Etc.
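
    A minimal sketch of how those first checks could be run, assuming only the two table names from the question. pg_stat_user_tables and pg_stats are standard Postgres system views; the dead-row count is only a rough hint at bloat, not a measurement.

    ```sql
    -- Refresh planner statistics for both tables.
    ANALYZE reportimpression;
    ANALYZE reportimpressionday;

    -- When were statistics last gathered, and how many dead rows are there?
    -- A high n_dead_tup relative to n_live_tup hints at bloat.
    SELECT relname, n_live_tup, n_dead_tup, last_analyze, last_autoanalyze
    FROM   pg_stat_user_tables
    WHERE  relname IN ('reportimpression', 'reportimpressionday');

    -- Column statistics the planner uses for datelocal.
    SELECT tablename, n_distinct, correlation
    FROM   pg_stats
    WHERE  tablename IN ('reportimpression', 'reportimpressionday')
    AND    attname = 'datelocal';
    ```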

    One shot in the dark, compare the percentages for both tables:

    SELECT tbl, round(share * 100 / total, 2) AS percentage
    FROM  (
       SELECT text 'reportimpression' AS tbl
            , count(*)::numeric AS total
            , count(*) FILTER (WHERE datelocal >= '2019-02-01' AND datelocal < '2019-03-01')::numeric AS share
       FROM  reportimpression
    
       UNION ALL
       SELECT 'reportimpressionday'
            , count(*)
            , count(*) FILTER (WHERE datelocal >= '2019-02-01' AND datelocal < '2019-03-01')
       FROM  reportimpressionday
      ) sub;
    

    Is the one for reportimpression bigger? Then it might just exceed the number for which an index is expected to help.

    Generally, your index reportimpression_datelocal_index on (datelocal) looks good for it, and reportimpression_viewership_index even allows index-only scans if autovacuum beats the write load on the table. (Though impressions & agegroup are just dead freight for this and it would work even better without).
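
    To see whether index-only scans are realistic, you can look at how much of the table is marked all-visible, as sketched below. The figures in pg_class are estimates maintained by VACUUM and ANALYZE. The index at the end is hypothetical: its name and column list (datelocal, gender, views) are assumptions based on the columns the queries here use, not something taken from your schema.

    ```sql
    -- Share of pages marked all-visible; index-only scans need this to be high.
    SELECT relname, relpages, relallvisible
         , round(relallvisible * 100.0 / NULLIF(relpages, 0), 2) AS pct_all_visible
    FROM   pg_class
    WHERE  relname = 'reportimpression';

    -- Hypothetical leaner index carrying only the columns the query needs
    -- (no impressions, no agegroup).
    CREATE INDEX reportimpression_datelocal_gender_views_idx
    ON     reportimpression (datelocal, gender, views);
    ```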

    Answer

    My query gave you 26.6 percent for reportimpression and 26.4 percent for reportimpressionday. For such a large percentage, indexes are typically not useful at all. A sequential scan is typically the fastest way. Only index-only scans may still make sense if the underlying table is much bigger. (Or you have severe table bloat, and less bloated indexes, which makes indexes more attractive again.)

    Your first query may just be across the tipping point. Try narrowing the time frame until you see index-only scans. You won't see (bitmap) index scans with more than roughly 5 % of all rows qualifying (depends on many factors).
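
    One way to run that experiment is sketched below. The one-day window is an arbitrary choice for illustration; widen or narrow it and watch where the plan switches node type. Turning off enable_seqscan is meant for diagnosis in your own session only, not as a permanent setting.

    ```sql
    -- Narrow window (one day instead of a month); check which scan node is used.
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT date_part('hour', datelocal)                AS hour
         , SUM(views) FILTER (WHERE gender = 'male')   AS male
         , SUM(views) FILTER (WHERE gender = 'female') AS female
    FROM   reportimpression
    WHERE  datelocal >= '2019-02-01'
    AND    datelocal <  '2019-02-02'
    GROUP  BY 1
    ORDER  BY 1;

    -- Diagnosis only: discourage sequential scans for this session to see
    -- what the index-based plan costs for the full month, then compare timings.
    SET enable_seqscan = off;

    EXPLAIN (ANALYZE, BUFFERS)
    SELECT date_part('hour', datelocal)                AS hour
         , SUM(views) FILTER (WHERE gender = 'male')   AS male
         , SUM(views) FILTER (WHERE gender = 'female') AS female
    FROM   reportimpression
    WHERE  datelocal >= '2019-02-01'
    AND    datelocal <  '2019-03-01'
    GROUP  BY 1
    ORDER  BY 1;

    RESET enable_seqscan;
    ```

    If the forced index plan turns out slower than the sequential scan, the planner was right to ignore the index for the full month.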

    Queries

    Be that as it may, consider these modified queries:

    SELECT date_part('hour', datelocal)                AS hour
         , SUM(views) FILTER (WHERE gender = 'male')   AS male
         , SUM(views) FILTER (WHERE gender = 'female') AS female
    FROM   reportimpression
    WHERE  datelocal >= '2019-02-01'
    AND    datelocal <  '2019-03-01' -- '2019-02-28'  -- ?
    GROUP  BY 1
    ORDER  BY 1;
    
    SELECT date_trunc('day', datelocal)                AS day
         , SUM(views) FILTER (WHERE gender = 'male')   AS male
         , SUM(views) FILTER (WHERE gender = 'female') AS female
    FROM   reportimpressionday
    WHERE  datelocal >= '2019-02-01'
    AND    datelocal <  '2019-03-01'
    GROUP  BY 1
    ORDER  BY 1;
    

    Major points

    • When using localized date format like '2-1-2019', go through to_timestamp() with explicit format specifiers. Else this depends on locale settings and might break (silently) when called from a session with different settings. Rather use ISO date / time formats as demonstrated which do not depend on locale settings.

    • Looks like you want to include the whole month of February. But your query misses out on the upper bound. For one, February may have 29 days. And datelocal < '2-28-2019' excludes all of Feb 28 as well. Use datelocal < '2019-03-01' instead.

    • It's cheaper to group & sort by the same expression as you have in the SELECT list if you can. So use date_trunc() there, too. Don't use different expressions without need. If you need the date part in the result, apply date_part() to the grouped expression, like:

      SELECT date_part('day', date_trunc('day', datelocal)) AS day
      ...
      GROUP  BY date_trunc('day', datelocal)
      ORDER  BY date_trunc('day', datelocal);
      

      Slightly noisier code, but faster (and possibly easier to optimize for the query planner, too).

    • Use the aggregate FILTER clause in Postgres 9.4 or later. It's cleaner and a bit faster. See:

      • How can I simplify this game statistics query?
      • For absolute performance, is SUM faster or COUNT?
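
    The first two points above can be tried out with a few standalone statements. This is a sketch that assumes datelocal is a timestamp column, which the question does not confirm. The literal values are only illustrations.

    ```sql
    -- Parse a localized literal with an explicit format instead of relying
    -- on the session's DateStyle setting.
    SELECT to_date('2-1-2019', 'MM-DD-YYYY') AS lower_bound;

    -- Derive the exclusive upper bound from the lower bound;
    -- this also covers leap years.
    SELECT date '2019-02-01' + interval '1 month' AS upper_bound;

    -- Why an upper bound of Feb 28 drops the last day: an afternoon timestamp
    -- on Feb 28 fails the first comparison but passes the second.
    SELECT timestamp '2019-02-28 15:00' < timestamp '2019-02-28' AS old_bound   -- false
         , timestamp '2019-02-28 15:00' < timestamp '2019-03-01' AS new_bound;  -- true
    ```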
