SQL query for index/primary key ordinal

孤者浪人 submitted on 2019-11-27 04:51:30

Question


In our on-line contest system, there is a frequently changing table standings with integer columns (user_id, score). Both are indexed with a unique constraint. Two kinds of queries are required:

  1. Given a score not in the table, return the 1-based position that the score would occupy if it were inserted.
  2. Given a user_id in the table, return the position of the corresponding score.

In both cases, position is with respect to score ascending: a new score smaller than all currently in the table will have position 1.

Here's the tough part: we probably can't afford a table scan. The table may have up to 10 million records, and we need to handle at least 40 queries per second.

How to do this in PostgreSQL?

I have a non-SQL solution in Berkeley DB that uses its logical record number-enabled B-trees. It easily has good enough performance. But we would like to get rid of the BDB by re-implementing with a PostgreSQL query. I have tried the obvious

select 1+count(*) from standings where score < ? limit 1;

This causes a table scan.
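For completeness, the second kind of query can be written in the same style. This is only a sketch based on the table description above (standings with unique user_id and score), not something from our production code, and it presumably suffers from the same scan problem:

```sql
-- Query type 2: position of the score that belongs to a given user_id.
-- Counts the rows with a strictly smaller score and adds 1.
-- Note: if the user_id is not in the table, the subquery yields NULL,
-- the comparison matches no rows, and the result is 1.
SELECT 1 + count(*) AS pos
FROM   standings
WHERE  score < (SELECT score FROM standings WHERE user_id = ?);
```

The inner lookup by user_id is cheap because of the unique index; the expensive part is counting every row below that score.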

I expect the answer to be "no way" because the logical record number facility of BDB requires locking the entire B-Tree for each edit. To get O(log N) performance, it relies on leaf counts in each node. All these counts in the path to root must change with every edit; hence, the locking. Such locking is against the design principles of PostgreSQL and probably any multi-user database.

So if the problem can't be solved with PostgreSQL, confirmation of this is the next best result of this question.


Answer 1:


With a regular table, there is not much you can do in PostgreSQL 9.1. count() results in a table scan, because indexes do not have visibility information. To verify the rows are not deleted in the meantime, PostgreSQL has to visit the table.

If the table is read-only (or rarely updated), you could add a row number column to the table and then use a query like:

SELECT COALESCE((
         SELECT rownumber
         FROM   standings
         WHERE  score < ?
         ORDER  BY score DESC
         LIMIT  1), 0) + 1;  -- 0 when no smaller score exists, giving position 1

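Together with an index:

CREATE INDEX standings_score_idx ON standings (score DESC);

this would return the result almost instantly. However, it is not an option for a table under write load, because every insert, delete, or score change shifts the row numbers of all rows ranked after it. So it does not fit your case.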
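For a rarely updated table, the row number column could be filled roughly like this. It is a minimal sketch: the column name rownumber matches the query above, but the exact statements are an assumption and would have to be re-run after every batch of changes:

```sql
-- Assumed helper column holding the 1-based rank by ascending score.
ALTER TABLE standings ADD COLUMN rownumber integer;

-- Recompute all ranks in one pass with a window function
-- (window functions are available in PostgreSQL 8.4 and later).
UPDATE standings s
SET    rownumber = r.rn
FROM  (SELECT user_id, row_number() OVER (ORDER BY score) AS rn
       FROM   standings) r
WHERE  s.user_id = r.user_id;
```

The UPDATE rewrites every row, which is exactly why this approach only suits tables that change rarely.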


The good news: one of the major new features of the upcoming PostgreSQL 9.2 is just right for you: "Covering index" or "index-only scan". I quote the 9.2 release notes here:

Allow queries to retrieve data only from indexes, avoiding heap access (Robert Haas, Ibrar Ahmed, Heikki Linnakangas, Tom Lane)

This is often called "index-only scans" or "covering indexes". This is possible for heap pages with exclusively all-visible tuples, as reported by the visibility map. The visibility map was made crash-safe as a necessary part of implementing this feature.

This blog post by Robert Haas has more details on how this affects count performance. It helps performance even with a WHERE clause, as in your case.
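Once you are on 9.2, a check along the following lines should show whether the planner actually picks an index-only scan. Treat it as a sketch: 1000 is an arbitrary example score, and the plan you get depends on statistics and on how many heap pages are currently marked all-visible.

```sql
-- Refresh the visibility map and the planner statistics first;
-- index-only scans skip the heap only for all-visible pages.
VACUUM ANALYZE standings;

-- Inspect the chosen plan for the counting query.
EXPLAIN
SELECT 1 + count(*)
FROM   standings
WHERE  score < 1000;
```

If the plan shows an "Index Only Scan" on the score index, the heap is avoided for all-visible pages. Keep in mind that the scan still walks every index entry below the given score, so the cost grows with the resulting position, and on a frequently changing table many pages may not be all-visible between vacuum runs.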



Source: https://stackoverflow.com/questions/11623928/sql-query-for-index-primary-key-ordinal
