Bitmap Heap Scan performance

℡╲_俬逩灬. Submitted on 2019-12-01 01:12:14
Quassnoi

Create a composite index on (foo_id, foo, date) (in this order).
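As a minimal sketch, the index could be created like this. The table name foo_table and the index name idx_myindex_idx are the ones used further down in this thread; adjust both to your schema. The column is quoted only as a precaution because of its name.

```sql
-- Composite index: the column order matters, so keep it exactly as listed.
CREATE INDEX idx_myindex_idx
    ON foo_table (foo_id, foo, "date");
```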

Note that if you select 500k records (and return them all to the client), this may take a long time.

Are you sure you need all 500k records on the client (rather than some kind of aggregate or a LIMIT)?
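To illustrate the two alternatives, here is a sketch that assumes foo_id and foo are compared for equality and the timestamp column is filtered by range. The literal values 42 and 'bar' are placeholders, not taken from the question.

```sql
-- Option 1: let the server aggregate and return a single row.
SELECT count(*) AS matching_rows
FROM   foo_table
WHERE  foo_id = 42                 -- placeholder value
AND    foo    = 'bar'              -- placeholder value
AND    "date" >= '2013-04-04'
AND    "date" <  '2013-05-05';

-- Option 2: return only the first page of rows instead of all of them.
SELECT *
FROM   foo_table
WHERE  foo_id = 42
AND    foo    = 'bar'
AND    "date" >= '2013-04-04'
AND    "date" <  '2013-05-05'
ORDER  BY "date"
LIMIT  100;
```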

Erwin Brandstetter

Answer to comment

Do I need the WHERE columns in the same order as the index?

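The order of expressions in the WHERE clause is completely irrelevant. SQL is a declarative language, not a procedural one, so the query planner decides how to apply the conditions. Only the order of columns in the index matters.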

Fix mistakes

The timestamp column should not be named "date" for several reasons. Obviously, it's a timestamp, not a date. More importantly, date is a reserved word in all SQL standards and a type and function name in Postgres, so it shouldn't be used as an identifier.
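If renaming is feasible, a sketch could look like this. The new name created_at is only a suggestion, and foo_table is the table name used elsewhere in this thread. Existing indexes follow the rename automatically, but application queries have to be updated by hand.

```sql
-- Rename the misleadingly named column (created_at is a hypothetical name).
ALTER TABLE foo_table
    RENAME COLUMN "date" TO created_at;
```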

You should provide proper information with your question, including a complete table definition and conclusive information about existing indexes. It might be a good idea to start by reading the chapter about indexes in the manual.

The WHERE conditions on the timestamp are most probably incorrect:

and date >= '2013-04-04'
and date <= '2013-05-05'

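The upper border for a timestamp column should probably be excluded. The literal '2013-05-05' is interpreted as 2013-05-05 00:00:00, so <= would include exactly midnight of that day and nothing else from it: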

and date >= '2013-04-04'
and date <  '2013-05-05'
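The boundary behavior can be checked directly with a few standalone comparisons. This is only an illustration of how Postgres treats the date-only literal when it is compared with a timestamp; no table is needed.

```sql
-- The untyped literal '2013-05-05' is resolved to timestamp '2013-05-05 00:00:00'.
SELECT timestamp '2013-05-05 00:00:00' <= '2013-05-05' AS midnight_included,   -- true
       timestamp '2013-05-05 00:00:01' <= '2013-05-05' AS rest_of_day_included, -- false
       timestamp '2013-05-05 00:00:00' <  '2013-05-05' AS midnight_half_open;   -- false
```

With the half-open form (>= lower, < upper), adjacent ranges never overlap and never leave a gap.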

Index

With the multicolumn index @Quassnoi provided, your query will be much faster, since all qualifying rows can be read from one contiguous section of the index. No row is read in vain and disqualified later, as happens with your current plan.
But 500k rows will still take some time. Normally, Postgres has to verify visibility and fetch additional columns from the table. An index-only scan might be an option in Postgres 9.2+.
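Whether an index-only scan is actually used can be checked with EXPLAIN. The sketch below assumes the query needs only columns contained in the index and uses placeholder values for the equality conditions. The preceding VACUUM matters because index-only scans depend on the visibility map being up to date.

```sql
-- Refresh the visibility map and planner statistics first.
VACUUM (ANALYZE) foo_table;

-- Select only indexed columns, then inspect the plan.
EXPLAIN (ANALYZE, BUFFERS)
SELECT foo_id, foo, "date"
FROM   foo_table
WHERE  foo_id = 42                 -- placeholder value
AND    foo    = 'bar'              -- placeholder value
AND    "date" >= '2013-04-04'
AND    "date" <  '2013-05-05';
```

In the output, look for an "Index Only Scan" node and a low "Heap Fetches" count. If the query selects columns that are not in the index, Postgres has to visit the table anyway.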

The order of columns is best this way, because the rule of thumb is: columns for equality first, then columns for ranges. More explanation and links in this related answer on dba.SE.
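To make the rule of thumb concrete, here is a sketch of how the predicates of a typical query line up with the index columns. It assumes foo_id and foo are tested for equality and the timestamp for a range; the values and the second index name are made up for contrast.

```sql
-- Predicates of the query, by kind:
--   foo_id = 42            equality  -> 1st index column
--   foo    = 'bar'         equality  -> 2nd index column
--   "date" >= ... AND <    range     -> last index column
--
-- Index on (foo_id, foo, "date"): all rows for one (foo_id, foo) pair
-- sit next to each other, sorted by "date", so the range is one
-- contiguous slice of the index.

-- Counter-example (hypothetical name): range column first.
-- Rows for the wanted (foo_id, foo) pair are then scattered across the
-- whole date range, and many index entries are read and discarded.
CREATE INDEX idx_range_first_example
    ON foo_table ("date", foo_id, foo);
```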

CLUSTER / pg_repack

You could speed things up further by streamlining the physical order of rows in the table according to this index, so that a minimum of blocks has to be read from the table, provided no other requirements stand against it. If you can afford to lock the table exclusively for a few seconds (at off hours, for instance), rewrite the table with rows ordered according to the index:

CLUSTER foo_table USING idx_myindex_idx;

If concurrent use is a problem, consider pg_repack, which can do the same without holding an exclusive lock for the whole operation.

The effect: fewer blocks need to be read from the table and everything is pre-sorted. It's a one-time effect that deteriorates over time if there are writes on the table, because new and updated rows are not kept in index order. So you would rerun it from time to time.
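One way to judge when a rerun is due is the physical correlation that Postgres records for each column. The following is a sketch, assuming the table is named foo_table and has already been associated with the index for clustering; any threshold for "too low" is a judgment call.

```sql
-- Correlation close to 1 (or -1) means the physical row order still
-- follows the column order; values near 0 mean the effect has worn off.
SELECT attname, correlation
FROM   pg_stats
WHERE  tablename = 'foo_table'
AND    attname   = 'foo_id';       -- leading column of the index

-- Rerun: without USING, CLUSTER reuses the index the table was last
-- clustered on (or marked for clustering). Takes an exclusive lock again.
CLUSTER foo_table;

-- Refresh planner statistics after the rewrite.
ANALYZE foo_table;
```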

I copied and adapted the last chapter from this related answer on dba.SE.
