How do I improve date-based query performance on a large table?

倾然丶 夕夏残阳落幕 submitted on 2019-12-01 10:38:06

A materialized view is the way to go for what you outlined. Querying past months of read-only data works without refreshing it. You may want to special-case the current month if you need to cover that, too.
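As a rough sketch of that idea: the view name reportimpression_monthly, the grouping by month and gender, and the sum(views) aggregate below are assumptions for illustration, not taken from the original query. Only the table and column names come from the index definitions in this answer.

```sql
-- Hypothetical pre-aggregation over closed (read-only) months.
-- The grouping columns and the aggregate are assumptions; adapt to the real report query.
CREATE MATERIALIZED VIEW reportimpression_monthly AS
SELECT date_trunc('month', datelocal) AS month
     , gender
     , sum(views) AS total_views
FROM   reportimpression
WHERE  datelocal < date_trunc('month', now())   -- exclude the current, still-changing month
GROUP  BY 1, 2;

-- Run once after a month has closed to pick it up.
REFRESH MATERIALIZED VIEW reportimpression_monthly;

-- Special-casing the current month: read it live and append it to the stored months.
SELECT month, gender, total_views
FROM   reportimpression_monthly
UNION  ALL
SELECT date_trunc('month', datelocal), gender, sum(views)
FROM   reportimpression
WHERE  datelocal >= date_trunc('month', now())
GROUP  BY 1, 2;
```

Past months then come from the small pre-aggregated view, and only the current month touches the big table.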

The underlying query can still benefit from an index, and there are two directions you might take:

First off, partial indexes like you have now won't buy much in your scenario; they are not worth it. (If you collect many more months of data and mostly query by month (and add / drop rows by month), table partitioning might be an idea; then you have your indexes partitioned automatically, too. I'd consider Postgres 11 or even the upcoming Postgres 12 for this, though.)
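If partitioning becomes relevant, declarative range partitioning by month could look roughly like this. The table name reportimpression_part, the partition names, and the column types are assumptions, since the real table definition is not shown here.

```sql
-- Hypothetical partitioned copy of the table; column types are guesses.
CREATE TABLE reportimpression_part (
   datelocal timestamp NOT NULL
 , views     integer
 , gender    text
) PARTITION BY RANGE (datelocal);

-- One partition per month; the upper bound is exclusive.
CREATE TABLE reportimpression_2019_11 PARTITION OF reportimpression_part
   FOR VALUES FROM ('2019-11-01') TO ('2019-12-01');
CREATE TABLE reportimpression_2019_12 PARTITION OF reportimpression_part
   FOR VALUES FROM ('2019-12-01') TO ('2020-01-01');

-- In Postgres 11 or later, an index on the parent is created on every partition.
CREATE INDEX ON reportimpression_part (datelocal);

-- Removing a whole month is then a cheap metadata operation instead of a big DELETE.
DROP TABLE reportimpression_2019_11;
```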

If your rows are wide, create an index that allows index-only scans. Like:

CREATE INDEX reportimpression_covering_idx ON reportimpression(datelocal, views, gender);

Or INCLUDE additional columns in Postgres 11 or later:

CREATE INDEX reportimpression_covering_idx ON reportimpression(datelocal) INCLUDE (views, gender);
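To check whether such an index actually produces index-only scans, something along these lines may help. The aggregate query is only an assumed stand-in for the real report query.

```sql
-- Index-only scans depend on an up-to-date visibility map, which VACUUM maintains.
VACUUM (ANALYZE) reportimpression;

-- Assumed example query for one month; look for "Index Only Scan" and a low
-- "Heap Fetches" count in the output.
EXPLAIN (ANALYZE, BUFFERS)
SELECT gender, sum(views) AS total_views
FROM   reportimpression
WHERE  datelocal >= '2019-11-01'
AND    datelocal <  '2019-12-01'
GROUP  BY gender;
```

If the plan shows many heap fetches, the table has probably been modified since the last vacuum and the benefit of the covering index shrinks accordingly.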

Else, if your rows are physically sorted by datelocal, consider a BRIN index. It's extremely small and probably about as fast as a B-tree index for your case. (But being so small, it will stay cached much more easily and not push other data out as much.)

CREATE INDEX reportimpression_brin_idx ON reportimpression USING BRIN (datelocal);
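One way to judge whether the rows are physically sorted well enough for BRIN is the correlation statistic the planner collects. This is a quick check, not a guarantee.

```sql
-- Refresh statistics first so the figure is current.
ANALYZE reportimpression;

-- A correlation close to 1 (or -1) means physical row order closely follows datelocal,
-- which is the situation where a BRIN index works well.
SELECT attname, correlation
FROM   pg_stats
WHERE  tablename = 'reportimpression'
AND    attname   = 'datelocal';
```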

You may be interested in CLUSTER or pg_repack to physically sort table rows. pg_repack can do it without exclusive locks on the table and even without a btree index (required by CLUSTER). But it's an additional module not shipped with the standard distribution of Postgres.
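A minimal sketch of the CLUSTER variant, assuming a plain B-tree index on datelocal. The index name reportimpression_datelocal_idx is made up for this example.

```sql
-- CLUSTER needs an index to sort by; this index name is hypothetical.
CREATE INDEX reportimpression_datelocal_idx ON reportimpression (datelocal);

-- Rewrites the table in index order. Takes an exclusive lock for the duration,
-- and is a one-time operation: rows added later are not kept in order.
CLUSTER reportimpression USING reportimpression_datelocal_idx;

-- Update planner statistics after the rewrite.
ANALYZE reportimpression;
```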

Your execution plan seems to be doing the right thing.

Things you can do to improve, in descending order of effectiveness:

  • Use a materialized view that pre-aggregates the data

  • Don't use a hosted database, use your own iron with good local storage and lots of RAM.

  • Use only one index instead of several partitioned ones. This is not primarily performance advice (the query will probably not be measurably slower unless you have a lot of indexes), but it will ease the management burden.
