Efficient PostgreSQL query on timestamp using index or bitmap index scan?

陌路散爱 提交于 2019-12-06 07:32:09

The 1st query expects to find rows=74, but actually finds rows=40250.
The 2nd query expects to find rows=897368 and actually finds rows=924699.

Of course, processing 23 x as many rows takes considerably more time. So your actual times are not surprising.

Statistics for data with updated_at > now() are outdated. Run:

ANALYZE tickets;

and repeat your queries. And you seriously have data with updated_at > now()? That sounds wrong.

It's not surprising, however, that statistics are outdated for data most recently changed. That's in the logic of things. If your query depends on current statistics, you have to run ANALYZE before you run your query.

Also test with (in your session only):

SET enable_bitmapscan = off;

and repeat your second query to see times without bitmap index scan.

Why bitmap index scan for more rows?

A plain index scan fetches rows from the heap sequentially as found in the index. That's simple, dumb and without overhead. Fast for few rows, but may end up more expensive than a bitmap index scan with a growing number of rows.

A bitmap index scan collects rows from the index before looking up the table. If multiple rows reside on the same data page, that saves repeated visits and can make things considerably faster. The more rows, the greater the chance, a bitmap index scan will save time.

For even more rows (around 5% of the table, heavily depends on actual data), the planner switches to a sequential scan of the table and doesn't use the index at all.

The optimum would be an index only scan, introduced with Postgres 9.2. That's only possible if some preconditions are met. If all relevant columns are included in the index, the index type support it and the visibility map indicates that all rows on a data page are visible to all transactions, that page doesn't have to be fetched from the heap (the table) and the information in the index is enough.

The decision depends on your statistics (how many rows Postgres expects to find and their distribution) and on cost settings, most importantly random_page_cost, cpu_index_tuple_cost and effective_cache_size.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!