Bitmap Heap Scan performance

I have a big report table. The Bitmap Heap Scan step takes more than 5 seconds.

Is there something I can do? I added columns to the table; would reindexing the index it uses help?

I do a UNION and a SUM on the data, so I don't return 500K records to the client.
I use Postgres 9.1.
Here is the EXPLAIN ANALYZE output:

 Bitmap Heap Scan on foo_table  (cost=24747.45..1339408.81 rows=473986 width=116) (actual time=422.210..5918.037 rows=495747 loops=1)
   Recheck Cond: ((foo_id = 72) AND (date >= '2013-04-04 00:00:00'::timestamp without time zone) AND (date <= '2013-05-05 00:00:00'::timestamp without time zone))
   Filter: ((foo)::text = 'foooooo'::text)
   ->  Bitmap Index Scan on foo_table_idx  (cost=0.00..24628.96 rows=573023 width=0) (actual time=341.269..341.269 rows=723918 loops=1)

Query:

explain analyze
SELECT CAST(date as date) AS date, foo_id, ....
from foo_table
where foo_id = 72
and date >= '2013-04-04'
and date <= '2013-05-05'
and foo = 'foooooo'

Index def:
Index "public.foo_table_idx"
   Column    |            Type
-------------+-----------------------------
 foo_id      | bigint
 date        | timestamp without time zone

 btree, for table "public.external_channel_report"

Table:
foo is a text column with 4 distinct values.
foo_id is a bigint column with currently 10K distinct values.

Quassnoi

Create a composite index on (foo_id, foo, date) (in this order).
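
A minimal sketch of that index, using Python's built-in sqlite3 module as a stand-in for PostgreSQL. This is an assumption made to keep the example self-contained: SQLite's planner and its EXPLAIN QUERY PLAN output differ from Postgres, but the way a multicolumn B-tree index narrows a search is the same idea. The index name foo_table_foo_id_foo_date_idx is hypothetical; the CREATE INDEX statement itself is also valid in Postgres.

```python
import sqlite3

# In-memory stand-in for the real table; only the columns used in WHERE.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo_table (foo_id INTEGER, foo TEXT, date TEXT)")

# The composite index suggested above: foo_id, foo, date, in this order.
# (Hypothetical index name.)
conn.execute(
    "CREATE INDEX foo_table_foo_id_foo_date_idx ON foo_table (foo_id, foo, date)"
)

query = """
    SELECT * FROM foo_table
    WHERE foo_id = 72
      AND foo = 'foooooo'
      AND date >= '2013-04-04'
      AND date < '2013-05-05'
"""

# Each EXPLAIN QUERY PLAN row has the human-readable detail in column 3.
plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]
print(plan)
```

If the plan names the index and lists all three columns as search terms, every qualifying row comes out of a single search of the index.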

Note that if you select 500k records (and return them all to the client), this may take a long time.

Are you sure you need all 500k records on the client (rather than some kind of an aggregate or a LIMIT)?
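
A sketch of the aggregate option, again with SQLite as an assumed stand-in for Postgres. The amount column and the generated rows are hypothetical (the question's select list is elided), and SQLite's date() function plays the role of the question's CAST(date as date). The point is only that grouping on the server sends one row per day to the client instead of every raw row.

```python
import sqlite3
from datetime import datetime, timedelta

conn = sqlite3.connect(":memory:")
# 'amount' is a hypothetical measure column, not taken from the question.
conn.execute(
    "CREATE TABLE foo_table (foo_id INTEGER, foo TEXT, date TEXT, amount REAL)"
)

# Hypothetical data: ten days, one row every ten minutes, amount 1.0 each.
start = datetime(2013, 4, 4)
rows = [
    (72, "foooooo", (start + timedelta(minutes=10 * i)).isoformat(" "), 1.0)
    for i in range(10 * 24 * 6)
]
conn.executemany("INSERT INTO foo_table VALUES (?, ?, ?, ?)", rows)

# Aggregate on the server: one result row per day.
daily = conn.execute("""
    SELECT date(date) AS day, sum(amount) AS total
    FROM foo_table
    WHERE foo_id = 72 AND foo = 'foooooo'
      AND date >= '2013-04-04' AND date < '2013-05-05'
    GROUP BY day
    ORDER BY day
""").fetchall()

print(len(rows), "raw rows ->", len(daily), "aggregated rows")
```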

Erwin Brandstetter

Answer to comment

Do I need the WHERE columns in the same order as the index?

The order of expressions in the WHERE clause is completely irrelevant; SQL is not a procedural language.
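
A quick check of that, with SQLite as an assumed stand-in for Postgres and a hypothetical index name: the same predicates written in index order and in reverse order get the same plan, because the planner matches conditions to index columns regardless of where they appear in the WHERE clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo_table (foo_id INTEGER, foo TEXT, date TEXT)")
conn.execute(
    "CREATE INDEX foo_table_foo_id_foo_date_idx ON foo_table (foo_id, foo, date)"
)


def plan(sql):
    """Return the detail column of EXPLAIN QUERY PLAN for a query."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


# Same predicates: once in index column order, once reversed.
in_index_order = """SELECT * FROM foo_table
    WHERE foo_id = 72 AND foo = 'foooooo'
      AND date >= '2013-04-04' AND date < '2013-05-05'"""
reversed_order = """SELECT * FROM foo_table
    WHERE date < '2013-05-05' AND date >= '2013-04-04'
      AND foo = 'foooooo' AND foo_id = 72"""

print(plan(in_index_order))
print(plan(reversed_order))
```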

Fix mistakes

The timestamp column should not be named "date" for several reasons. Obviously, it's a timestamp, not a date. But more importantly, date is a reserved word in all SQL standards and a type and function name in Postgres, so it shouldn't be used as an identifier.

You should provide proper information with your question, including a complete table definition and conclusive information about existing indexes. It might be a good idea to start by reading the chapter about indexes in the manual.

The WHERE conditions on the timestamp are most probably incorrect:

and date >= '2013-04-04'
and date <= '2013-05-05'

The upper border for a timestamp column should probably be excluded:

and date >= '2013-04-04'
and date <  '2013-05-05'
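
A small illustration of why, in Python with hypothetical timestamps. Compared to a timestamp, the literal '2013-05-05' means midnight at the start of May 5, so <= picks up rows stamped exactly 00:00:00 that day and nothing later. If the next report period starts at the same border, such a row is counted in both periods. If May 5 itself should be included, the half-open form would be date < '2013-05-06'.

```python
from datetime import datetime

# Hypothetical row timestamps around the upper border.
timestamps = [
    datetime(2013, 5, 4, 23, 59, 59),
    datetime(2013, 5, 5, 0, 0, 0),
    datetime(2013, 5, 5, 12, 30, 0),
]

lower = datetime(2013, 4, 4)       # '2013-04-04' as a timestamp
upper = datetime(2013, 5, 5)       # '2013-05-05' as a timestamp: midnight
next_upper = datetime(2013, 6, 5)  # hypothetical next report period

# Closed ranges: date >= lower AND date <= upper
closed = [t for t in timestamps if lower <= t <= upper]
closed_next = [t for t in timestamps if upper <= t <= next_upper]

# Half-open ranges: date >= lower AND date < upper
half_open = [t for t in timestamps if lower <= t < upper]
half_open_next = [t for t in timestamps if upper <= t < next_upper]

print("closed, counted in both periods:", set(closed) & set(closed_next))
print("half-open, counted in both periods:", set(half_open) & set(half_open_next))
```

With half-open ranges, adjacent periods tile the timeline: every row falls into exactly one of them.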

Index

With the multicolumn index @Quassnoi provided, your query will be much faster, since all qualifying rows can be read from one contiguous range of the index. No row is read in vain and discarded later, as happens now.
But 500k rows will still take some time. Normally Postgres has to verify visibility and fetch additional columns from the table. An index-only scan might be an option in Postgres 9.2+.
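
The numbers in the question's EXPLAIN ANALYZE output show how much of the current work is wasted. A quick calculation in Python; the figures are copied from the plan, and subtracting the child node's time from the parent's is only a rough approximation of the time spent visiting the heap.

```python
# Figures copied from the EXPLAIN ANALYZE output in the question.
index_rows = 723_918     # rows matched on foo_id and date (Bitmap Index Scan)
returned_rows = 495_747  # rows left after Filter on foo (Bitmap Heap Scan)
total_ms = 5918.037      # actual end time of the Bitmap Heap Scan node
index_ms = 341.269       # actual end time of its Bitmap Index Scan child

# Rows fetched from the table only to be thrown away by the foo filter.
discarded = index_rows - returned_rows
# Rough share of the runtime spent outside the index scan (heap visits, filter).
heap_ms = total_ms - index_ms

print(f"rows fetched and discarded: {discarded}")
print(f"share of fetched rows discarded: {discarded / index_rows:.1%}")
print(f"time spent outside the index scan: {heap_ms:.0f} ms")
```

Roughly a third of the rows fetched from the table are discarded, and almost all of the runtime is spent after the index scan, which is what an index that also covers foo avoids.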

This column order is best because of the rule of thumb: columns used for equality first, then columns used for ranges. More explanation and links in this related answer on dba.SE.
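
A sketch comparing two column orders for the same query, again with SQLite as an assumed stand-in (the index name test_idx is hypothetical). SQLite's plan lists only the index columns that bound the search. With the equality columns first, all three columns narrow it; with the range column first, only the date range does, so the scan covers every foo_id in that date range. Postgres can still check the trailing columns inside the index in the second case, but it has to walk the whole date range to do so.

```python
import sqlite3

QUERY = """SELECT * FROM foo_table
    WHERE foo_id = 72 AND foo = 'foooooo'
      AND date >= '2013-04-04' AND date < '2013-05-05'"""


def plan_with_index(columns):
    """Create a fresh table with one index on `columns`; return the plan text."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE foo_table (foo_id INTEGER, foo TEXT, date TEXT)")
    conn.execute(f"CREATE INDEX test_idx ON foo_table ({columns})")
    detail = " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY))
    conn.close()
    return detail


equality_first = plan_with_index("foo_id, foo, date")
range_first = plan_with_index("date, foo_id, foo")

print("equality first:", equality_first)
print("range first:   ", range_first)
```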

`CLUSTER` / pg_repack

You could further speed things up by streamlining the physical order of rows in your table according to this index, so that a minimum of blocks has to be read from the table, if you don't have other requirements that stand against it! If you can afford to lock your table exclusively for a few seconds (at off hours, for instance), rewrite the table and order its rows according to the index:

CLUSTER foo_table USING idx_myindex_idx;

If concurrent use is a problem, consider pg_repack, which can do the same without holding an exclusive lock for the whole operation.

The effect: fewer blocks need to be read from the table, and the rows are stored pre-sorted in index order. It's a one-time effect that deteriorates over time if you have writes on the table, so you would rerun it from time to time.
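
A toy model of that effect in Python, not a measurement. It assumes a fixed number of rows per heap page, a scaled-down number of foo_id values, rows arriving in time order (one per foo_id per hour), and hypothetical foo values. It counts how many distinct pages hold the rows the query needs, before and after sorting the rows in index order, which is what CLUSTER does to the physical table.

```python
ROWS_PER_PAGE = 50           # assumption: rows per 8 kB heap page (depends on row width)
NUM_FOO_IDS = 200            # assumption: scaled down from ~10K distinct foo_id values
NUM_HOURS = 24 * 31          # assumption: one row per foo_id per hour for a month
FOO_VALUES = ("a", "b", "c", "d")  # hypothetical stand-ins for the 4 foo values


def make_rows():
    """Rows in insertion order: every hour, one row for each foo_id."""
    rows = []
    for hour in range(NUM_HOURS):
        for foo_id in range(NUM_FOO_IDS):
            foo = FOO_VALUES[(hour + foo_id) % len(FOO_VALUES)]
            rows.append((foo_id, foo, hour))
    return rows


def pages_touched(rows, foo_id, foo, start_hour, end_hour):
    """Distinct pages that contain at least one qualifying row."""
    return {
        position // ROWS_PER_PAGE
        for position, (r_id, r_foo, r_hour) in enumerate(rows)
        if r_id == foo_id and r_foo == foo and start_hour <= r_hour < end_hour
    }


heap = make_rows()
# Tuples sort by (foo_id, foo, hour): the same order as the composite index.
clustered = sorted(heap)

args = (72, "a", 0, NUM_HOURS)
print("pages read, insertion order:", len(pages_touched(heap, *args)))
print("pages read, clustered order:", len(pages_touched(clustered, *args)))
```

In insertion order every matching row sits on its own page; after sorting, the same rows share a handful of consecutive pages. New rows appended later land in insertion order again, which is why the benefit fades until the next CLUSTER.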

(I copied and adapted the last section from this related answer on dba.SE.)

Source: https://stackoverflow.com/questions/16387090/bitmap-heap-scan-performance

Tags

sql