Question
I am trying to build a tiny exercise search engine using MySQL.
Each exercise can have an arbitrary number of search tags.
Here is my data structure:
TABLE exercises
    ID
    title
TABLE searchtags
    ID
    title
TABLE exerciseSearchtags
    exerciseID -> exercises.ID
    searchtagID -> searchtags.ID
...where exerciseSearchtags is a many-to-many join table expressing the relationship between exercises and searchtags.
The search engine accepts an unknown number of user-entered keywords.
I would like to rank search results based on the number of keyword / searchtag matches.
Here is the SQL I am currently using to select exercises. Both the CASE rules and the WHERE rules are dynamically generated, one for each keyword. So, for example, if a user enters 3 keywords, there will be 3 CASE rules and 3 WHERE rules.
SELECT
    exercises.ID AS ID,
    exercises.title AS title,
    (
        (CASE WHEN searchtags.title LIKE CONCAT('%',?,'%') THEN 1 ELSE 0 END)+
        (CASE WHEN searchtags.title LIKE CONCAT('%',?,'%') THEN 1 ELSE 0 END)+
        ...etc...
        (CASE WHEN searchtags.title LIKE CONCAT('%',?,'%') THEN 1 ELSE 0 END)
    ) AS relevance
FROM
    exercises
    LEFT JOIN exerciseSearchtags
        ON exerciseSearchtags.exerciseID = exercises.ID
    LEFT JOIN searchtags
        ON searchtags.ID = exerciseSearchtags.searchtagID
WHERE
    searchtags.title LIKE CONCAT('%',?,'%') OR
    searchtags.title LIKE CONCAT('%',?,'%') OR
    ...etc...
    searchtags.title LIKE CONCAT('%',?,'%')
GROUP BY
    exercises.ID
ORDER BY
    relevance DESC
This almost works. However, the results are not being ranked in the order I would expect.
My best guess as to why this is happening is that the relevance score is being calculated BEFORE the rows are grouped by exercises.ID. So if the left join causes a particular exercise to appear 10 times in the result set, and another exercise to appear 4 times, then the first exercise may get a higher relevance score, even though it may not have more keyword / searchtag matches.
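One way to check this guess is to run the same query without the GROUP BY and look at the score each joined row gets. The sketch below does that with Python's sqlite3 module standing in for MySQL, which is an assumption made only so the example can run anywhere. The sample rows and the helper names make_demo_db and per_row_scores are hypothetical, and '||' is used in place of CONCAT because older SQLite versions do not provide CONCAT.

```python
import sqlite3

def make_demo_db():
    """Create an in-memory database with the three tables from the question."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE exercises (ID INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE searchtags (ID INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE exerciseSearchtags (exerciseID INTEGER, searchtagID INTEGER);
        -- Hypothetical sample data: exercise 1 has two tags, exercise 2 has one.
        INSERT INTO exercises VALUES (1, 'Push-up'), (2, 'Squat');
        INSERT INTO searchtags VALUES (1, 'chest'), (2, 'arms'), (3, 'legs');
        INSERT INTO exerciseSearchtags VALUES (1, 1), (1, 2), (2, 3);
    """)
    return conn

def per_row_scores(conn, keywords):
    """Run the question's query without GROUP BY to expose each joined row's score."""
    # '||' replaces MySQL's CONCAT('%',?,'%') for the SQLite stand-in.
    like = "searchtags.title LIKE '%' || ? || '%'"
    score = " + ".join(f"(CASE WHEN {like} THEN 1 ELSE 0 END)" for _ in keywords)
    where = " OR ".join(like for _ in keywords)
    sql = f"""
        SELECT exercises.ID, searchtags.title, {score} AS relevance
        FROM exercises
        LEFT JOIN exerciseSearchtags ON exerciseSearchtags.exerciseID = exercises.ID
        LEFT JOIN searchtags ON searchtags.ID = exerciseSearchtags.searchtagID
        WHERE {where}
        ORDER BY exercises.ID, searchtags.ID
    """
    # Placeholders appear once per keyword in SELECT, then once per keyword in WHERE.
    return conn.execute(sql, list(keywords) * 2).fetchall()

rows = per_row_scores(make_demo_db(), ["chest", "arms"])
for row in rows:
    print(row)
```

In this sample every joined row scores 1, even though exercise 1 matches both keywords. Because the relevance expression is not wrapped in an aggregate, grouping by exercises.ID leaves the database to pick the score from a single row of each group. As far as I understand, MySQL picks an arbitrary row when ONLY_FULL_GROUP_BY is disabled and rejects the query when it is enabled, so the score is never summed across an exercise's tags.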
Does anyone have any suggestions / advice on how I can prevent this from happening / fix this?
Thanks (in advance) for your help.
Answer 1:
I have found a working solution to the above problem, and am posting it here, in case anyone else experiences a similar problem.
The solution is to use a sub-select instead of a CASE expression. Here is the corrected version of the snippet above. (I do not know if this is the best or most efficient solution, but it has fixed the problem for me for the time being, and it seems to return search results reasonably quickly.)
SELECT
    exercises.ID AS ID,
    exercises.title AS title,
    (
        (
            SELECT COUNT(1)
            FROM searchtags
            LEFT JOIN exerciseSearchtags
                ON exerciseSearchtags.searchtagID = searchtags.ID
            WHERE searchtags.title LIKE CONCAT('%',?,'%')
                AND exerciseSearchtags.exerciseID = exercises.ID
        )+
        (
            SELECT COUNT(1)
            FROM searchtags
            LEFT JOIN exerciseSearchtags
                ON exerciseSearchtags.searchtagID = searchtags.ID
            WHERE searchtags.title LIKE CONCAT('%',?,'%')
                AND exerciseSearchtags.exerciseID = exercises.ID
        )+
        ...etc...
        (
            SELECT COUNT(1)
            FROM searchtags
            LEFT JOIN exerciseSearchtags
                ON exerciseSearchtags.searchtagID = searchtags.ID
            WHERE searchtags.title LIKE CONCAT('%',?,'%')
                AND exerciseSearchtags.exerciseID = exercises.ID
        )
    ) AS relevance
FROM
    exercises
    LEFT JOIN exerciseSearchtags
        ON exerciseSearchtags.exerciseID = exercises.ID
    LEFT JOIN searchtags
        ON searchtags.ID = exerciseSearchtags.searchtagID
WHERE
    searchtags.title LIKE CONCAT('%',?,'%') OR
    searchtags.title LIKE CONCAT('%',?,'%') OR
    ...etc...
    searchtags.title LIKE CONCAT('%',?,'%')
GROUP BY
    exercises.ID
ORDER BY
    relevance DESC
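For comparison, here is a sketch of a different technique that should give the same counts without one sub-select per keyword: conditional aggregation, where each CASE term is wrapped in SUM() so it is added up over all of an exercise's joined rows. This is not the method used in the answer above. It again uses Python's sqlite3 as a stand-in for MySQL (an assumption), with '||' in place of CONCAT, and the sample data and the names make_demo_db, build_ranked_query and search are hypothetical.

```python
import sqlite3

def make_demo_db():
    """Create an in-memory database with hypothetical sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE exercises (ID INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE searchtags (ID INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE exerciseSearchtags (exerciseID INTEGER, searchtagID INTEGER);
        INSERT INTO exercises VALUES (1, 'Push-up'), (2, 'Squat'), (3, 'Plank');
        INSERT INTO searchtags VALUES
            (1, 'chest'), (2, 'arms'), (3, 'legs'), (4, 'core'), (5, 'bodyweight');
        INSERT INTO exerciseSearchtags VALUES
            (1, 1), (1, 2), (1, 5), (2, 3), (2, 5), (3, 4);
    """)
    return conn

def build_ranked_query(keyword_count):
    """Generate one SUM(CASE ...) term and one WHERE term per keyword."""
    # '||' replaces MySQL's CONCAT('%',?,'%') for the SQLite stand-in.
    like = "searchtags.title LIKE '%' || ? || '%'"
    score = " + ".join(
        f"SUM(CASE WHEN {like} THEN 1 ELSE 0 END)" for _ in range(keyword_count)
    )
    where = " OR ".join(like for _ in range(keyword_count))
    return f"""
        SELECT exercises.ID AS ID, exercises.title AS title, {score} AS relevance
        FROM exercises
        JOIN exerciseSearchtags ON exerciseSearchtags.exerciseID = exercises.ID
        JOIN searchtags ON searchtags.ID = exerciseSearchtags.searchtagID
        WHERE {where}
        GROUP BY exercises.ID, exercises.title
        ORDER BY relevance DESC, exercises.ID
    """

def search(conn, keywords):
    """Return (ID, title, relevance) rows, best match first."""
    keywords = list(keywords)
    # Bind the keywords twice: first for the SELECT terms, then for the WHERE terms.
    return conn.execute(build_ranked_query(len(keywords)), keywords * 2).fetchall()

results = search(make_demo_db(), ["chest", "arms", "legs"])
for row in results:
    print(row)
```

Like the sub-select version, this counts a tag once for every keyword it matches, so a single tag that contains two of the keywords adds 2 to the score. The inner joins are a deliberate simplification: the WHERE clause already discards exercises with no matching tag, so the LEFT JOINs would not keep any extra rows here.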
Answer 2:
Divide and conquer. Instead of trying to do it all in one statement, try decomposing the problem into smaller pieces. For instance, first create a temporary table with all the exercises that have at least one of the search tags. Then make a second pass to rank each exercise in the temp table. Finally, select the result ordered by ranking.
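The three passes described above could be sketched roughly as follows. This is only one possible reading of the suggestion, run against Python's sqlite3 instead of MySQL (an assumption), with '||' in place of CONCAT. The temp table name matched_exercises, the helper names and the sample data are all hypothetical.

```python
import sqlite3

def make_demo_db():
    """Create an in-memory database with hypothetical sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE exercises (ID INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE searchtags (ID INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE exerciseSearchtags (exerciseID INTEGER, searchtagID INTEGER);
        INSERT INTO exercises VALUES (1, 'Push-up'), (2, 'Squat'), (3, 'Plank');
        INSERT INTO searchtags VALUES
            (1, 'chest'), (2, 'arms'), (3, 'legs'), (4, 'core'), (5, 'bodyweight');
        INSERT INTO exerciseSearchtags VALUES
            (1, 1), (1, 2), (1, 5), (2, 3), (2, 5), (3, 4);
    """)
    return conn

def ranked_search(conn, keywords):
    """Rank exercises in three passes using a temporary table."""
    keywords = list(keywords)
    # '||' replaces MySQL's CONCAT('%',?,'%') for the SQLite stand-in.
    like = "searchtags.title LIKE '%' || ? || '%'"
    any_match = " OR ".join(like for _ in keywords)

    # Pass 1: collect every exercise that has at least one matching tag.
    conn.execute("DROP TABLE IF EXISTS temp.matched_exercises")
    conn.execute(f"""
        CREATE TEMP TABLE matched_exercises AS
        SELECT DISTINCT exerciseSearchtags.exerciseID AS exerciseID, 0 AS relevance
        FROM exerciseSearchtags
        JOIN searchtags ON searchtags.ID = exerciseSearchtags.searchtagID
        WHERE {any_match}
    """, keywords)

    # Pass 2: for each keyword, add the number of matching tags to the score.
    for keyword in keywords:
        conn.execute(f"""
            UPDATE matched_exercises
            SET relevance = relevance + (
                SELECT COUNT(*)
                FROM exerciseSearchtags
                JOIN searchtags ON searchtags.ID = exerciseSearchtags.searchtagID
                WHERE exerciseSearchtags.exerciseID = matched_exercises.exerciseID
                  AND {like}
            )
        """, (keyword,))

    # Pass 3: read the ranked result back, best match first.
    return conn.execute("""
        SELECT exercises.ID, exercises.title, matched_exercises.relevance
        FROM matched_exercises
        JOIN exercises ON exercises.ID = matched_exercises.exerciseID
        ORDER BY matched_exercises.relevance DESC, exercises.ID
    """).fetchall()

results = ranked_search(make_demo_db(), ["chest", "arms", "legs"])
for row in results:
    print(row)
```

Each pass is a small statement that can be inspected on its own, at the cost of one extra round trip per keyword.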
Answer 3:
I have only done something similar for MSSQL, not MySQL... so this might not be relevant at all, but it's worth a shot :)
I had to put the CASE expressions in the ORDER BY clause to get it to rank correctly, e.g.:
ORDER BY
    CASE WHEN searchtags.title LIKE CONCAT('%',?,'%') THEN 1 ELSE 0 END +
    CASE WHEN searchtags.title LIKE CONCAT('%',?,'%') THEN 1 ELSE 0 END +
    ...etc...
    CASE WHEN searchtags.title LIKE CONCAT('%',?,'%') THEN 1 ELSE 0 END DESC
I also left them in the SELECT so I could output the relevance on the page (as requested).
Either way, good luck with it!
Source: https://stackoverflow.com/questions/4075026/need-help-with-sql-for-ranking-search-results