问题
I have a table bsort
:
CREATE TABLE bsort(a int, data text);
Here data
may be incomplete. In other words, some tuples may not have data
value.
And then I build a b-tree index on the table:
CREATE INDEX ON bsort USING BTREE(a);
Now if I perform this query:
SELECT * FROM bsort ORDER BY a;
Does PostgreSQL sort tuples with nlogn complexity, or does it get the order directly from the b-tree index?
回答1:
For a simple query like this Postgres will use an index scan and retrieve readily sorted tuples from the index in order. Due to its MVCC model Postgres had to always visit the "heap" (data pages) additionally to verify entries are actually visible to the current transaction. Quoting the Postgres Wiki on index-only scans:
PostgreSQL indexes do not contain visibility information. That is, it is not directly possible to ascertain if any given tuple is visible to the current transaction, which is why it has taken so long for index-only scans to be implemented.
Which finally happened in version 9.2: index-only scans. The manual:
If the index stores the original indexed data values (and not some lossy representation of them), it is useful to support index-only scans, in which the index returns the actual data not just the
TID
of the heap tuple. This will only avoid I/O if the visibility map shows that theTID
is on an all-visible page; else the heap tuple must be visited anyway to check MVCC visibility. But that is no concern of the access method's.
So now it depends on the visibility map of the table, whether index-only scans are possible. Only an option if all involved columns are included in the index. Else, the heap has to be visited (additionally) in any case. The sort step is still not needed.
That's why we sometimes append otherwise useless columns to indexes now. Like the data
column in your example:
CREATE INDEX ON bsort USING BTREE(a, data);
It makes the index bigger (depends) and a bit more expensive to maintain and use for other purposes that do not allow an index-only scan. So only append the data
column if you get index-only scans out of it. The order of columns in the index is important:
- Working of indexes in PostgreSQL
- Is a composite index also good for queries on the first field?
The benefit of an index-only scan, per documentation:
If it's known that all tuples on the page are visible, the heap fetch can be skipped. This is most noticeable on large data sets where the visibility map can prevent disk accesses. The visibility map is vastly smaller than the heap, so it can easily be cached even when the heap is very large.
The visibility map is maintained by VACUUM which happens automatically if you have autovacuum running (the default setting in modern Postgres). Details:
- Are regular VACUUM ANALYZE still recommended under 9.1?
But there is some delay between write operations to the table and the next VACUUM
run. The gist of it:
- Read-only tables stay ready for index-only scans once vacuumed.
- Data pages that have been modified lose their "all-visible" flag in the visibility map until the next
VACUUM
(and all older tansactions being finished), so it depends on the ratio between write operations andVACUUM
frequency.
Partial index-only scans are still possible if some of the involved pages are marked all-visible. But if the heap has to be visited anyway, the access method "index scan" is a bit cheaper. So if too many pages are currently dirty, Postgres will switch to the cheaper index scan altogether. The Postgres Wiki again:
As the number of heap fetches (or "visits") that are projected to be needed by the planner goes up, the planner will eventually conclude that an index-only scan isn't desirable, as it isn't the cheapest possible plan according to its cost model. The value of index-only scans lies wholly in their potential to allow us to elide heap access (if only partially) and minimise I/O.
回答2:
You would need to check the execution plan. But, Postgres is quite capable of using the index to make the order by
more efficient. It would read the records directly from the index. Because you have only one column, there is no need to access the data pages.
来源:https://stackoverflow.com/questions/31576669/how-does-postgresql-perform-order-by-if-a-b-tree-index-is-built-on-that-field