I\'m trying to find the most efficient way of dealing with this but I must tell you front-head I\'ve made a mess of it. Looked around SO and found nothing of relevance so he
How about... (example for project 1)
SELECT p.num, p.title
FROM projects_to_tags pt1, projects_to_tags pt2, projects p
where pt1.project_id = 1 and
pt2.project_id != 1 and
pt1.tag_id = pt2.tag_id and
p.num = pt2.project_id
group by pt2.project_id
And maybe add a separate index for tag_id in projects_to_tags so you can use it alone, instead of the composite. No more type ALL. (Table Scan) Replacing both 1 with 4 give also the desired results.
In any of the following cases, if you don't know the PROJECT.num
/PROJECT_TO_TAGS.project_id
, you'll have to join to the PROJECTS
table to get the id value for finding out what tags it has associated.
SELECT p.*
FROM PROJECTS p
JOIN PROJECTS_TO_TAGS pt ON pt.project_id = p.num
WHERE pt.tag_id IN (SELECT x.tag_id
FROM PROJECTS_TO_TAGS x
WHERE x.project_id = 4)
SELECT p.*
FROM PROJECTS p
JOIN PROJECTS_TO_TAGS pt ON pt.project_id = p.num
WHERE EXISTS (SELECT NULL
FROM PROJECTS_TO_TAGS x
WHERE x.project_id = 4
AND x.tag_id = pt.tag_id)
The DISTINCT
is necessary because JOINs risk duplicated data turning up in the resultset...
SELECT DISTINCT p.*
FROM PROJECTS p
JOIN PROJECTS_TO_TAGS pt ON pt.project_id = p.num
JOIN PROJECTS_TO_TAGS x ON x.tag_id = pt.tag_id
AND x.project_id = 4
Something like this... ?
SELECT *
FROM projects AS L
WHERE
EXISTS (
SELECT 1
FROM
projects_to_tags PT
INNER JOIN projects_to_tags PT2 ON PT.tag_id = PT2.tag_id
WHERE
L.num = PT.project_id
AND PT2.project_id = 4
AND PT2.project_id <> L.num
)
That's 2 seeks and a scan.
UPDATE
Taking a page from jdelard's book, one tiny modification switches my query to outperform his (of course I'm doing this on SQL Server meaning I took out his GROUP BY and put in a DISTINCT, so YMMV on MySQL):
SELECT *
FROM projects AS L
WHERE
L.num != 4 -- instead of <> PT2.project_id inside
AND EXISTS (
SELECT 1
FROM
projects_to_tags PT
INNER JOIN projects_to_tags PT2 ON PT.tag_id = PT2.tag_id
WHERE
L.num = PT.project_id
AND PT2.project_id = 4
)
The improvement over his query comes from not doing a DISTINCT or aggregate, and using a semi join instead of a complete join so not every row has to be joined. Otherwise, semantically they are largely the same.
I will have to remember jdelard's trick as it is a very useful tool. For some reason the query engine was not smart enough to compute that given {a = 4, a != b} then {b != 4}.