How does PostgreSQL perform ORDER BY if a b-tree index is built on that field?

Posted by 守給你的承諾、 on 2019-12-28 04:29:12

Question


I have a table bsort:

CREATE TABLE bsort(a int, data text);

The data column may be incomplete: some tuples have no data value.

And then I build a b-tree index on the table:

CREATE INDEX ON bsort USING BTREE(a);

Now if I perform this query:

SELECT * FROM bsort ORDER BY a;

Does PostgreSQL sort the tuples with O(n log n) complexity, or does it get the order directly from the b-tree index?


Answer 1:


For a simple query like this, Postgres can use an index scan and retrieve the tuples from the index already in sorted order, so no separate sort step is needed. Due to its MVCC model, Postgres always had to visit the "heap" (data pages) as well, to verify that entries are actually visible to the current transaction. Quoting the Postgres Wiki on index-only scans:

PostgreSQL indexes do not contain visibility information. That is, it is not directly possible to ascertain if any given tuple is visible to the current transaction, which is why it has taken so long for index-only scans to be implemented.

That finally happened in version 9.2, which introduced index-only scans. The manual:

If the index stores the original indexed data values (and not some lossy representation of them), it is useful to support index-only scans, in which the index returns the actual data not just the TID of the heap tuple. This will only avoid I/O if the visibility map shows that the TID is on an all-visible page; else the heap tuple must be visited anyway to check MVCC visibility. But that is no concern of the access method's.

So whether index-only scans are possible now depends on the visibility map of the table. They are only an option if all columns involved in the query are included in the index. Otherwise, the heap has to be visited in any case. Either way, the sort step is still not needed.
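The difference can be observed with EXPLAIN. The following is a minimal sketch, assuming the bsort table and the single-column index from the question already exist; the sample rows are hypothetical, and the planner remains free to pick a different plan (for example a sequential scan plus sort) if it estimates that to be cheaper.

```sql
-- Assumption: bsort and its index on (a) exist as defined in the question.
-- Hypothetical sample data: every third row has no data value.
INSERT INTO bsort (a, data)
SELECT g, CASE WHEN g % 3 = 0 THEN NULL ELSE md5(g::text) END
FROM generate_series(1, 100000) AS g;

-- VACUUM sets the all-visible flags, ANALYZE refreshes planner statistics.
-- (VACUUM cannot run inside a transaction block.)
VACUUM ANALYZE bsort;

-- Only the indexed column is requested: a candidate for an index-only scan.
EXPLAIN SELECT a FROM bsort ORDER BY a;

-- The data column is not in the index, so each returned row needs a heap visit.
EXPLAIN SELECT * FROM bsort ORDER BY a;
```

If the first plan shows an "Index Only Scan" node and the second an "Index Scan" node, neither should have a Sort node on top.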

That's why we sometimes append otherwise useless columns to indexes now. Like the data column in your example:

CREATE INDEX ON bsort USING BTREE(a, data);

It makes the index bigger (by how much depends on the added data) and a bit more expensive to maintain and to use for other purposes that do not allow an index-only scan. So only append the data column if you get index-only scans out of it. The order of columns in the index is important:

  • Working of indexes in PostgreSQL
  • Is a composite index also good for queries on the first field?
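To get a feel for the size cost and the effect of column order, something like the following could be tried. This is a sketch only: the index names bsort_a_first and bsort_data_first are made up for this example, and the bsort table from the question is assumed to exist and contain data.

```sql
-- Hypothetical index names, chosen for this example only.
CREATE INDEX bsort_a_first    ON bsort (a, data);
CREATE INDEX bsort_data_first ON bsort (data, a);

-- Compare the on-disk size of the two indexes.
SELECT pg_size_pretty(pg_relation_size('bsort_a_first'))    AS a_first_size,
       pg_size_pretty(pg_relation_size('bsort_data_first')) AS data_first_size;

-- Only the index with a as its leading column stores entries in the
-- order that ORDER BY a asks for.
EXPLAIN SELECT a, data FROM bsort ORDER BY a;

-- Clean up the example indexes.
DROP INDEX bsort_a_first;
DROP INDEX bsort_data_first;
```

The index led by data still contains both columns, but its entries are ordered by data first, so it cannot deliver rows sorted by a without an extra sort step.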

The benefit of an index-only scan, per documentation:

If it's known that all tuples on the page are visible, the heap fetch can be skipped. This is most noticeable on large data sets where the visibility map can prevent disk accesses. The visibility map is vastly smaller than the heap, so it can easily be cached even when the heap is very large.

The visibility map is maintained by VACUUM, which runs automatically if you have autovacuum enabled (the default setting in modern Postgres). Details:

  • Are regular VACUUM ANALYZE still recommended under 9.1?

But there is some delay between write operations to the table and the next VACUUM run. The gist of it:

  • Read-only tables stay ready for index-only scans once vacuumed.
  • Data pages that have been modified lose their "all-visible" flag in the visibility map until the next VACUUM (and until all older transactions have finished), so it depends on the ratio between write operations and VACUUM frequency.
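How much of the table is currently marked all-visible can be estimated from the system catalogs. A minimal sketch, assuming the bsort table from the question; note that relallvisible is only an estimate that is refreshed by VACUUM and ANALYZE, not a live counter.

```sql
-- Update the visibility map (cannot run inside a transaction block).
VACUUM bsort;

-- relpages: estimated number of heap pages.
-- relallvisible: estimated number of pages marked all-visible.
SELECT relpages, relallvisible
FROM pg_class
WHERE relname = 'bsort';

-- When the table was last vacuumed, manually or by autovacuum.
SELECT last_vacuum, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'bsort';
```

The closer relallvisible is to relpages, the fewer heap visits an index-only scan should need.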

Partial index-only scans are still possible if some of the involved pages are marked all-visible. But if the heap has to be visited anyway, the access method "index scan" is a bit cheaper. So if too many pages are currently dirty, Postgres will switch to the cheaper index scan altogether. The Postgres Wiki again:

As the number of heap fetches (or "visits") that are projected to be needed by the planner goes up, the planner will eventually conclude that an index-only scan isn't desirable, as it isn't the cheapest possible plan according to its cost model. The value of index-only scans lies wholly in their potential to allow us to elide heap access (if only partially) and minimise I/O.
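The number of heap visits that actually happened is reported by EXPLAIN ANALYZE as "Heap Fetches" on an Index Only Scan node. The sketch below assumes the bsort table and index from the question, filled with some data; the UPDATE is only there to clear the all-visible flag on some pages, and the planner may choose a different node type altogether, in which case no "Heap Fetches" line appears.

```sql
-- Start from a freshly vacuumed table (cannot run inside a transaction block).
VACUUM bsort;

-- Look for "Heap Fetches: N" under the Index Only Scan node, if one is chosen.
EXPLAIN (ANALYZE) SELECT a FROM bsort ORDER BY a;

-- Modify some rows so the affected pages lose their all-visible flag.
UPDATE bsort SET data = 'changed' WHERE a % 10 = 0;

-- Run again before the next VACUUM and compare the Heap Fetches count.
EXPLAIN (ANALYZE) SELECT a FROM bsort ORDER BY a;
```

Autovacuum may run between the two statements and restore the flags, so the second result is not guaranteed to differ.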




Answer 2:


You would need to check the execution plan. But Postgres is quite capable of using the index to make the ORDER BY more efficient. It would read the records directly from the index. Because you have only one column, there is no need to access the data pages.
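Checking the plan for the query from the question could look like this. It is a sketch, assuming the bsort table and index exist as defined above; enable_sort is a regular planner setting that discourages explicit sort steps and is used here only to compare cost estimates within the current session.

```sql
-- Show the plan the planner picks by default.
-- "Index Scan using ... on bsort" means the order comes from the index;
-- a "Sort" node above a "Seq Scan" means the rows are sorted explicitly.
EXPLAIN SELECT * FROM bsort ORDER BY a;

-- Discourage explicit sorts for this session to see the index-based plan
-- and its estimated cost, then restore the default.
SET enable_sort = off;
EXPLAIN SELECT * FROM bsort ORDER BY a;
RESET enable_sort;
```

Comparing the estimated costs of the two plans shows why the planner preferred one over the other for the current table contents.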



Source: https://stackoverflow.com/questions/31576669/how-does-postgresql-perform-order-by-if-a-b-tree-index-is-built-on-that-field
