In our on-line contest system, there is a frequently changing table standings
with integer columns (user_id, score)
. Both are indexed with a unique constraint. Two kinds of queries are required:
- Given a
score
not in the table, return the 1-based position that the score would occupy if it were inserted. - Given a
user_id
in the table, return the position of the corresponding score.
In both cases, position is with respect to score ascending: a new score smaller than all currently in the table will have position 1.
Here's the tough part: we probably can't afford a table scan. The table may have up to 10 million records, and we need to handle at least 40 queries per second.
How to do this in PostgreSQL?
I have a non-SQL solution in Berkeley DB that uses its logical record number-enabled B-trees. It easily has good enough performance. But we would like to get rid of the BDB by re-implementing with a PostgreSQL query. I have tried the obvious
select 1+count(*) from standings where score < ? limit 1;
This causes a table scan.
I expect the answer to be "no way" because the logical record number facility of BDB requires locking the entire B-Tree for each edit. To get O(log N) performance, it relies on leaf counts in each node. All these counts in the path to root must change with every edit; hence, the locking. Such locking is against the design principles of PostgreSQL and probably any multi-user database.
So if the problem can't be solved with PostgreSQL, confirmation of this is the next best result of this question.
With a regular table, there is not much you can do in PostgreSQL 9.1. count()
results in a table scan, because indexes do not have visibility information. To verify the rows are not deleted in the meantime, PostgreSQL has to visit the table.
If the table is read-only (or rarely updated), you could add a row number to the table. Then a query like:
SELECT rownumber+1
FROM standings
WHERE score < ?
ORDER BY score DESC
LIMIT 1;
With an index:
CREATE INDEX standings_score_idx ON standings (score DESC);
Would get the result almost instantly. However, that's not an option for a table with write load for obvious reasons. So not for you.
The good news: one of the major new features of the upcoming PostgreSQL 9.2 is just right for you: "Covering index" or "index-only scan". I quote the 9.2 release notes here:
Allow queries to retrieve data only from indexes, avoiding heap access (Robert Haas, Ibrar Ahmed, Heikki Linnakangas, Tom Lane)
This is often called "index-only scans" or "covering indexes". This is possible for heap pages with exclusively all-visible tuples, as reported by the visibility map. The visibility map was made crash-safe as a necessary part of implementing this feature.
This blog post by Robert Haas has more details how this affects count performance. It helps performance even with a WHERE
clause, like in your case.
来源:https://stackoverflow.com/questions/11623928/sql-query-for-index-primary-key-ordinal