Question
Is there any way to optimize the following query?
EXPLAIN EXTENDED SELECT keyword_id, ck.keyword, COUNT( article_id ) AS cnt
FROM career_article_keyword
LEFT JOIN career_keywords ck
USING ( keyword_id )
WHERE keyword_id
IN (
SELECT keyword_id
FROM career_article_keyword
LEFT JOIN career_keywords ck
USING ( keyword_id )
WHERE article_id
IN (
SELECT article_id
FROM career_article_keyword
WHERE keyword_id =9
)
AND keyword_id <>9
)
GROUP BY keyword_id
ORDER BY cnt DESC
The main task: given a particular keyword_id (CURRENT_KID), I need to find all keywords that have ever been attached to any article together with CURRENT_KID, and sort the result by how often those keywords are used.
The tables are defined as:
mysql> show create table career_article_keyword;
+------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| career_article_keyword | CREATE TABLE `career_article_keyword` (
`article_id` int(11) unsigned NOT NULL,
`keyword_id` int(11) NOT NULL,
UNIQUE KEY `article_id` (`article_id`,`keyword_id`),
CONSTRAINT `career_article_keyword_ibfk_1` FOREIGN KEY (`article_id`) REFERENCES `career` (`menu_id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8 |
+------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
mysql> show create table career_keywords;
+-----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+-----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| career_keywords | CREATE TABLE `career_keywords` (
`keyword_id` smallint(5) unsigned NOT NULL AUTO_INCREMENT,
`keyword` varchar(250) NOT NULL,
PRIMARY KEY (`keyword_id`)
) ENGINE=InnoDB AUTO_INCREMENT=15 DEFAULT CHARSET=utf8 |
+-----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
The output of EXPLAIN scares me:
http://o7.no/J6ThIs
On a large data set this query could kill everything :) Can I make it faster somehow?
Thanks.
Answer 1:
Looking at your EXPLAIN output, I was concerned that your use of subqueries had resulted in a suboptimal use of indexes. I felt (without any justification - and on this I may very well be wrong) that rewriting using JOIN might lead to a more optimised query.
To do that, we need to understand what it is your query is intended to do. It would have helped if your question had articulated it, but after a little head-scratching I decided your query was trying to fetch a list of all other keywords that appear in any article that contains some given keyword, together with a count of all articles in which those keywords appear.
Now let's rebuild the query in stages:
Fetch "any article that contains some given keyword" (not worrying about duplicates):
SELECT ca2.article_id FROM career_article_keyword AS ca2 WHERE ca2.keyword_id = 9;
Fetch "all other keywords that appear in [the above]"
SELECT ca1.keyword_id FROM career_article_keyword AS ca1 JOIN career_article_keyword AS ca2 ON (ca2.article_id = ca1.article_id) WHERE ca1.keyword_id <> 9 AND ca2.keyword_id = 9 GROUP BY ca1.keyword_id;
Fetch "[the above], together with a count of all articles in which those keywords appear"
SELECT ca1.keyword_id, COUNT(DISTINCT ca0.article_id) AS cnt FROM career_article_keyword AS ca0 JOIN career_article_keyword AS ca1 USING (keyword_id) JOIN career_article_keyword AS ca2 ON (ca2.article_id = ca1.article_id) WHERE ca1.keyword_id <> 9 AND ca2.keyword_id = 9 GROUP BY ca1.keyword_id ORDER BY cnt DESC;
Finally, we want to add to the output the matching keyword itself from the career_keywords table:
SELECT ck.keyword_id, ck.keyword, COUNT(DISTINCT ca0.article_id) AS cnt
FROM career_keywords AS ck
JOIN career_article_keyword AS ca0 USING (keyword_id)
JOIN career_article_keyword AS ca1 USING (keyword_id)
JOIN career_article_keyword AS ca2 ON (ca2.article_id = ca1.article_id)
WHERE ca1.keyword_id <> 9 AND ca2.keyword_id = 9
GROUP BY ck.keyword_id -- equal to ca1.keyword_id due to join conditions
ORDER BY cnt DESC;
One thing that is immediately clear is that your original query referenced career_keywords twice, whereas this rewritten query references that table only once; this alone might explain the performance difference - try removing the second reference to it (i.e. where it appears in your first subquery), as it's entirely redundant there.
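As a rough sketch of that suggestion, here is the original query with only the inner LEFT JOIN on career_keywords removed and nothing else changed. Treat it as an illustration to compare under EXPLAIN, not as a measured improvement:

```sql
-- Original query, minus the redundant join to career_keywords
-- inside the first subquery. Only keyword_id is selected there,
-- so the keyword text is never needed at that level.
SELECT keyword_id, ck.keyword, COUNT(article_id) AS cnt
FROM career_article_keyword
LEFT JOIN career_keywords ck USING (keyword_id)
WHERE keyword_id IN (
    SELECT keyword_id
    FROM career_article_keyword
    WHERE article_id IN (
        SELECT article_id
        FROM career_article_keyword
        WHERE keyword_id = 9
    )
    AND keyword_id <> 9
)
GROUP BY keyword_id
ORDER BY cnt DESC;
```

Because the removed join was a LEFT JOIN against a table whose keyword_id is its primary key, it could neither drop nor duplicate rows, so the result should be the same.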
Looking back over this query, we can see that joins are being performed on the following columns:

career_keywords.keyword_id in ck JOIN ca0
This table defines PRIMARY KEY (`keyword_id`), so there is a good index which can be used for this join.

career_article_keyword.article_id in ca1 JOIN ca2
This table defines UNIQUE KEY `article_id` (`article_id`,`keyword_id`) and, since article_id is the leftmost column in this index, there is a good index which can be used for this join.

career_article_keyword.keyword_id in ck JOIN ca0 and ca0 JOIN ca1
There is no index that can be used for this join: the only index defined in this table has another column, article_id, to the left of keyword_id - so MySQL cannot find keyword_id entries in the index without first knowing the article_id. I suggest you create a new index which has keyword_id as its leftmost column.

(The need for this index could equally have been ascertained directly from looking at your original query, where your two outermost queries perform joins on that column.)
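One way such an index might be created is sketched below. The index name idx_keyword_article is hypothetical, and including article_id as a second column is my own assumption (so that lookups by keyword_id can also read article_id from the index); a single-column index on keyword_id would equally satisfy the suggestion above.

```sql
-- Hypothetical index name; keyword_id is the leftmost column so that
-- joins and filters on keyword_id can use the index.
ALTER TABLE career_article_keyword
    ADD INDEX idx_keyword_article (keyword_id, article_id);

-- Check which indexes now exist on the table.
SHOW INDEX FROM career_article_keyword;
```

After adding it, re-running EXPLAIN on the query should show whether MySQL actually chooses the new index for the joins on keyword_id.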
Source: https://stackoverflow.com/questions/10298503/can-it-be-executed-faster-with-big-amount-of-data-mysql