问题
Let's say I have a site like Stackoverflow with posts that people can reply to, and I would like to have an interface for sorting the posts by reply count
This will eventually have infinite scroll pagination, so showing 10 results at a time.
Here's an example query for that:
SELECT *, (SELECT COUNT(*)
FROM post_reply pr
WHERE pr.convo_id = post.convo_id) as replies
FROM post
ORDER BY replies
LIMIT 10;
This works, but it is prohibitively slow. I have hundreds of thousands of posts and this causes the query to take >30s to complete.
An index would improve the speed, but I have no idea how to implement an index on a subquery.
A materialized view could also work, but updating the materialized view every time someone replies to a post seems prohibitively slow as well.
Is there a good solution to this problem?
回答1:
You cannot really speed up this query.
You can change the data model and use a lot of infrastructure to get a faster sort. The idea is:
- Add a column
post_reply_count
to theposts
table. - Add an index on this column.
- Keep this column up-to-date using triggers --
+ 1
forinsert
,- 1
fordelete
. And whatever is appropriate forupdate
. - Use this column in your query.
This adds overhead. But if you really need a speed response to this query, you may not have a choice.
回答2:
You could change order of queries and first generate list of posts by reply count and then get post
columns. This should use primary key (I'm assuming post.convo_id
is one) and potentially be faster, tho I do not guarantee it will.
SELECT post.*, sub.replies
FROM (SELECT pr.convo_id, COUNT(*) AS replies
FROM post_reply pr
GROUP BY pr.convo_id
ORDER BY replies --maybe DESC if you want top reply count first
LIMIT 10
) AS sub
JOIN post USING(convo_id);
来源:https://stackoverflow.com/questions/58523659/how-to-efficiently-sort-by-the-results-of-a-subquery