问题
My website contains pieces of content on which users can vote (like/dislike similar to reddit upvotes). When selecting an individual piece of content, I run the following subqueries to get the number of likes, the number of dislikes and the current user's vote.
The votes are stored in a separate table {contentId, userId, vote}
SELECT
[... BUNCH OF FIELDS ...]
(SELECT COUNT(*) FROM votes vt WHERE vt.cId = c.contentId AND vote = '.Constants::LIKE.') AS likes,
(SELECT COUNT(*) FROM votes vt WHERE vt.cId = c.contentId AND vote = '.Constants::DISLIKE.') AS dislikes,
COALESCE((SELECT vote FROM votes vt WHERE vt.cId = c.contentId AND userId = '.USER_ID.'), '.Constants::NO_VOTE.') AS myVote
FROM content
[... OTHER STUFF ... ]
Is there a better way to achieve this (combine those subqueries or otherwise)?
回答1:
In terms of performance, those correlated subqueries can eat your lunch. And devour your lunchbox too, for large sets, because of the way MySQL processes them. Each of those subqueries gets executed for every row returned in the outer query. And that can get very expensive for large sets.
An alternative approach is to use an inline view to materialize the likes and dislikes for all content, and then do a join operation to that.
But, this approach can be expensive too, particularly when you are only needing the vote "counts" for just a few content rows, out of a bazillion rows. Often, there is a predicate from the outer query that can also be incorporated into the inline view, to limit the number of rows that need to be examined and returned.
We want to use an OUTER join to that inline view, so it returns a result equivalent to your query; returning a row from content
when there are no matching rows in the vote
table.
SELECT [... BUNCH OF FIELDS ...]
, COALESCE(v.likes,0) AS likes
, COALESCE(v.dislikes,0) AS dislikes
, COALESCE(v.myvote,'.Constants::NO_VOTE.') AS myvote
FROM content c
LEFT
JOIN ( SELECT vt.cId
, SUM(vt.vote = '.Constants::LIKE.') AS likes
, SUM(vt.vote = '.Constants::DISLIKE.') AS dislikes
, MAX(IF(vt.userId = '.USER_ID.',vt.vote,NULL)) AS myvote
FROM votes vt
GROUP
BY vt.cId
) v
ON v.cId = c.contentId
[... OTHER STUFF ... ]
Note that the inline view query (aliased as v
) is going to look at EVERY single row from the votes
table. If you only need a subset, then consider adding an appropriate predicate (either in a WHERE clause or as a JOIN to another table). There's no indication from the [... OTHER STUFF ...]
in your query whether it's returning just a few rows from content
or if you are needing all of the rows because you are ordering by likes
, etc.
For a small number of rows selected from the content
table, using the correlated subqueries (like in your query) can actually be faster than materializing a huge inline view and performing a join operation against it.
Oh... and for both queries, it goes without saying that an appropriate index on the votes
table with a leading column of cId
will benefit performance. For the inline view, you don't want the overhead of MySQL having to perform a filesort
operation on all of those rows to do the GROUP BY. And for the correlated subqueries, you want them to use a index range scan, not a full scan.
回答2:
Your problem is simple your current sub queries are running for every single row returned.
You need to join to that data instead. You will need to change the code I've added to give you the correct counts but this should point you in the right direction.
SELECT
BLAH
Likes,
Dislikes
FROM CONTENT as C
INNER JOIN (
SELECT
cID,
COUNT(votes) as Likes, --you will need to alter this
COUNT(votes) as Dislikes --to count your up and downvotes
FROM Votes
GROUP BY cID
) AS V
ON V.cID = C.ContentID
来源:https://stackoverflow.com/questions/22054530/optimizing-mysql-query-likes-dislikes