Select all projects that have matching tags

粉色の甜心 2021-01-01 02:26

I'm trying to find the most efficient way of dealing with this, but I must tell you up front that I've made a mess of it. I looked around SO and found nothing of relevance, so he

3 answers
  • 2021-01-01 02:33

    How about... (example for project 1)

    SELECT p.num, p.title
    FROM projects_to_tags pt1, projects_to_tags pt2, projects p
    WHERE pt1.project_id = 1 AND
          pt2.project_id != 1 AND
          pt1.tag_id = pt2.tag_id AND
          p.num = pt2.project_id
    GROUP BY pt2.project_id
    

    You might also add a separate index on tag_id in projects_to_tags so that it can be used on its own, instead of the composite index. That gets rid of type ALL (a table scan) in the EXPLAIN output. Replacing both 1s with 4 also gives the desired results.
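
To try the self-join and the extra index without a MySQL server, here is a minimal sketch using Python's built-in sqlite3 module. Only the table and column names come from the thread; the column types, the sample rows, the index name idx_ptt_tag_id and the helper related_projects are made up for illustration, and an ORDER BY is added so the output is stable. SQLite's planner is not MySQL's, so this checks the results rather than the EXPLAIN output.

```python
import sqlite3

# Hypothetical schema and sample data; only the table/column names are from the thread.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE projects (num INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE projects_to_tags (
        project_id INTEGER,
        tag_id     INTEGER,
        PRIMARY KEY (project_id, tag_id)   -- the composite key
    );
    -- Separate single-column index so lookups by tag_id alone can use it.
    CREATE INDEX idx_ptt_tag_id ON projects_to_tags (tag_id);

    INSERT INTO projects VALUES
        (1, 'Alpha'), (2, 'Beta'), (3, 'Gamma'), (4, 'Delta'), (5, 'Epsilon');
    INSERT INTO projects_to_tags VALUES
        (1, 10), (1, 20), (2, 20), (2, 30), (3, 40), (4, 10), (4, 20), (5, 30);
""")


def related_projects(conn, project_id):
    """Return other projects sharing at least one tag with project_id."""
    sql = """
        SELECT p.num, p.title
        FROM projects_to_tags pt1, projects_to_tags pt2, projects p
        WHERE pt1.project_id = ? AND
              pt2.project_id != ? AND
              pt1.tag_id = pt2.tag_id AND
              p.num = pt2.project_id
        GROUP BY pt2.project_id
        ORDER BY p.num
    """
    return conn.execute(sql, (project_id, project_id)).fetchall()


print(related_projects(conn, 1))  # [(2, 'Beta'), (4, 'Delta')]
print(related_projects(conn, 4))  # [(1, 'Alpha'), (2, 'Beta')]
```

Passing the project id as a bound parameter covers the "replace both 1s with 4" step in one place.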

  • 2021-01-01 02:40

    In any of the following cases, if you don't know the PROJECTS.num/PROJECTS_TO_TAGS.project_id value, you'll have to join to the PROJECTS table to get the id for finding out which tags are associated with it.
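
As a rough illustration of that extra join, the sketch below starts from a project title instead of its id. It runs on SQLite through Python's sqlite3 module rather than MySQL. The title column is taken from another answer in this thread; the sample rows, the alias src and the helper projects_sharing_tags_with are assumptions made for the example.

```python
import sqlite3

# Hypothetical sample data; only the table/column names come from the thread.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE projects (num INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE projects_to_tags (
        project_id INTEGER,
        tag_id     INTEGER,
        PRIMARY KEY (project_id, tag_id)
    );
    INSERT INTO projects VALUES
        (1, 'Alpha'), (2, 'Beta'), (3, 'Gamma'), (4, 'Delta'), (5, 'Epsilon');
    INSERT INTO projects_to_tags VALUES
        (1, 10), (1, 20), (2, 20), (2, 30), (3, 40), (4, 10), (4, 20), (5, 30);
""")


def projects_sharing_tags_with(conn, title):
    """Find projects that share a tag with the project named title."""
    sql = """
        SELECT DISTINCT p.num, p.title
        FROM projects src                      -- resolves the title to its num
        JOIN projects_to_tags x  ON x.project_id = src.num
        JOIN projects_to_tags pt ON pt.tag_id = x.tag_id
        JOIN projects p          ON p.num = pt.project_id
        WHERE src.title = ?
        ORDER BY p.num
    """
    return conn.execute(sql, (title,)).fetchall()


# The source project itself is part of the result, since it shares its own tags.
print(projects_sharing_tags_with(conn, "Delta"))
# [(1, 'Alpha'), (2, 'Beta'), (4, 'Delta')]
```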

    Using IN

    SELECT p.*
      FROM PROJECTS p
      JOIN PROJECTS_TO_TAGS pt ON pt.project_id = p.num
     WHERE pt.tag_id IN (SELECT x.tag_id
                           FROM PROJECTS_TO_TAGS x
                          WHERE x.project_id = 4)
    

    Using EXISTS

    SELECT p.*
      FROM PROJECTS p
      JOIN PROJECTS_TO_TAGS pt ON pt.project_id = p.num
     WHERE EXISTS (SELECT NULL
                     FROM PROJECTS_TO_TAGS x
                    WHERE x.project_id = 4
                      AND x.tag_id = pt.tag_id)
    

    Using JOINs (this is the most efficient one!)

    The DISTINCT is necessary because JOINs risk duplicated data turning up in the resultset...

    SELECT DISTINCT p.*
      FROM PROJECTS p
      JOIN PROJECTS_TO_TAGS pt ON pt.project_id = p.num
      JOIN PROJECTS_TO_TAGS x ON x.tag_id = pt.tag_id
                             AND x.project_id = 4
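
To see the duplicate rows that DISTINCT guards against, the three forms can be run side by side. This is a sketch on SQLite via Python's sqlite3 module, not on MySQL, and the sample rows and the names QUERIES, rows and distinct_join are made up for the example. On this data, project 1 shares two tags with project 4, so it comes back twice from every form that joins to PROJECTS_TO_TAGS without deduplicating.

```python
import sqlite3

# Hypothetical sample data; only the table/column names come from the thread.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE projects (num INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE projects_to_tags (
        project_id INTEGER,
        tag_id     INTEGER,
        PRIMARY KEY (project_id, tag_id)
    );
    INSERT INTO projects VALUES
        (1, 'Alpha'), (2, 'Beta'), (3, 'Gamma'), (4, 'Delta'), (5, 'Epsilon');
    INSERT INTO projects_to_tags VALUES
        (1, 10), (1, 20), (2, 20), (2, 30), (3, 40), (4, 10), (4, 20), (5, 30);
""")

# The three forms from the answer, selecting only p.num and without DISTINCT.
QUERIES = {
    "in": """
        SELECT p.num
        FROM projects p
        JOIN projects_to_tags pt ON pt.project_id = p.num
        WHERE pt.tag_id IN (SELECT x.tag_id
                            FROM projects_to_tags x
                            WHERE x.project_id = ?)
    """,
    "exists": """
        SELECT p.num
        FROM projects p
        JOIN projects_to_tags pt ON pt.project_id = p.num
        WHERE EXISTS (SELECT NULL
                      FROM projects_to_tags x
                      WHERE x.project_id = ?
                        AND x.tag_id = pt.tag_id)
    """,
    "join": """
        SELECT p.num
        FROM projects p
        JOIN projects_to_tags pt ON pt.project_id = p.num
        JOIN projects_to_tags x ON x.tag_id = pt.tag_id
                               AND x.project_id = ?
    """,
}

# One row per matching (project, tag) pair, so multi-tag matches repeat.
rows = {
    name: sorted(r[0] for r in conn.execute(sql, (4,)))
    for name, sql in QUERIES.items()
}
print(rows["join"])  # [1, 1, 2, 4, 4]

# Adding DISTINCT collapses the repeats.
distinct_sql = QUERIES["join"].replace("SELECT p.num", "SELECT DISTINCT p.num", 1)
distinct_join = sorted(r[0] for r in conn.execute(distinct_sql, (4,)))
print(distinct_join)  # [1, 2, 4]
```

Note that project 4 itself is in the result of all three forms; add a p.num != 4 condition if it should be excluded.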
    
  • 2021-01-01 02:42

    Something like this... ?

    SELECT *
    FROM projects AS L
    WHERE
       EXISTS (
          SELECT 1
          FROM
             projects_to_tags PT
             INNER JOIN projects_to_tags PT2 ON PT.tag_id = PT2.tag_id
          WHERE
             L.num = PT.project_id
             AND PT2.project_id = 4
             AND PT2.project_id <> L.num
       )
    

    That's 2 seeks and a scan.

    UPDATE

    Taking a page from jdelard's book, one tiny modification makes my query outperform his (I'm testing on SQL Server, which meant replacing his GROUP BY with a DISTINCT, so YMMV on MySQL):

    SELECT *
    FROM projects AS L
    WHERE
       L.num != 4 -- instead of PT2.project_id <> L.num inside the subquery
       AND EXISTS (
          SELECT 1
          FROM
             projects_to_tags PT
             INNER JOIN projects_to_tags PT2 ON PT.tag_id = PT2.tag_id
          WHERE
             L.num = PT.project_id
             AND PT2.project_id = 4
       )
    

    The improvement over his query comes from not doing a DISTINCT or aggregate, and using a semi join instead of a complete join so not every row has to be joined. Otherwise, semantically they are largely the same.

    I will have to remember jdelard's trick, as it is a very useful tool. For some reason the query engine was not smart enough to infer that {a = 4, a != b} implies {b != 4}.
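
The two EXISTS variants should return the same rows, and that part can be checked with a small sketch. It uses SQLite through Python's sqlite3 module, so it says nothing about SQL Server or MySQL plans or timings; the sample rows and the names INNER_FILTER, OUTER_FILTER and run are assumptions made for the example.

```python
import sqlite3

# Hypothetical sample data; only the table/column names come from the thread.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE projects (num INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE projects_to_tags (
        project_id INTEGER,
        tag_id     INTEGER,
        PRIMARY KEY (project_id, tag_id)
    );
    INSERT INTO projects VALUES
        (1, 'Alpha'), (2, 'Beta'), (3, 'Gamma'), (4, 'Delta'), (5, 'Epsilon');
    INSERT INTO projects_to_tags VALUES
        (1, 10), (1, 20), (2, 20), (2, 30), (3, 40), (4, 10), (4, 20), (5, 30);
""")

# First version: the self-exclusion sits inside the correlated subquery.
INNER_FILTER = """
    SELECT L.num
    FROM projects AS L
    WHERE EXISTS (
        SELECT 1
        FROM projects_to_tags PT
        INNER JOIN projects_to_tags PT2 ON PT.tag_id = PT2.tag_id
        WHERE L.num = PT.project_id
          AND PT2.project_id = :pid
          AND PT2.project_id <> L.num
    )
    ORDER BY L.num
"""

# Updated version: the exclusion is applied to the outer table directly.
OUTER_FILTER = """
    SELECT L.num
    FROM projects AS L
    WHERE L.num != :pid
      AND EXISTS (
        SELECT 1
        FROM projects_to_tags PT
        INNER JOIN projects_to_tags PT2 ON PT.tag_id = PT2.tag_id
        WHERE L.num = PT.project_id
          AND PT2.project_id = :pid
    )
    ORDER BY L.num
"""


def run(sql, pid):
    """Run one of the queries for a given project id and return the nums."""
    return [r[0] for r in conn.execute(sql, {"pid": pid})]


print(run(INNER_FILTER, 4))  # [1, 2]
print(run(OUTER_FILTER, 4))  # [1, 2]
```

Because EXISTS only tests whether a match is present, project 1 appears once even though it shares two tags with project 4, which is why no DISTINCT or GROUP BY is needed here.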
