SQL Efficiency: WHERE IN Subquery vs. JOIN then GROUP

予麋鹿 2021-02-07 09:35

As an example, I want to get the list of all items with certain tags applied to them. I could do either of the following:

SELECT Item.ID, Item.Name
FROM Item
WHERE Item.ID IN (
    SELECT ItemTag.ItemID
    FROM ItemTag
    WHERE ItemTag.TagID = 57 OR ItemTag.TagID = 55)
        
7 Answers
  • 2021-02-07 09:47
    SELECT Item.ID, Item.Name
    FROM Item
    WHERE Item.ID IN (
        SELECT ItemTag.ItemID
        FROM ItemTag
        WHERE ItemTag.TagID = 57 OR ItemTag.TagID = 55)
    

    or

    SELECT Item.ID, Item.Name
    FROM Item
    LEFT JOIN ItemTag ON ItemTag.ItemID = Item.ID
    WHERE ItemTag.TagID = 57 OR ItemTag.TagID = 55
    GROUP BY Item.ID
    

    Your second query won't compile, since it references Item.Name without either grouping or aggregating on it.

    If we remove GROUP BY from the query:

    SELECT  Item.ID, Item.Name
    FROM    Item
    JOIN    ItemTag
    ON      ItemTag.ItemID = Item.ID
    WHERE   ItemTag.TagID = 57 OR ItemTag.TagID = 55
    

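    These are still different queries unless ItemTag.ItemID is a UNIQUE key and declared as such: the JOIN returns an item once for each matching ItemTag row, so an item tagged with both 55 and 57 appears twice, whereas IN returns it once.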
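
    A minimal sketch of that difference, run against an in-memory SQLite database through Python's sqlite3 module rather than SQL Server. The sample rows are hypothetical; only the table and column names come from the question.

```python
import sqlite3

# Hypothetical sample data: item 1 carries both tags, item 2 carries one,
# item 3 carries an unrelated tag.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Item (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE ItemTag (ItemID INTEGER, TagID INTEGER);
    INSERT INTO Item VALUES (1, 'both tags'), (2, 'one tag'), (3, 'other tag');
    INSERT INTO ItemTag VALUES (1, 55), (1, 57), (2, 57), (3, 99);
""")

# IN with a subquery: each matching item is returned once.
in_rows = conn.execute("""
    SELECT Item.ID, Item.Name
    FROM Item
    WHERE Item.ID IN (
        SELECT ItemTag.ItemID
        FROM ItemTag
        WHERE ItemTag.TagID = 57 OR ItemTag.TagID = 55)
    ORDER BY Item.ID
""").fetchall()

# Plain JOIN without GROUP BY: one output row per matching ItemTag row.
join_rows = conn.execute("""
    SELECT Item.ID, Item.Name
    FROM Item
    JOIN ItemTag ON ItemTag.ItemID = Item.ID
    WHERE ItemTag.TagID = 57 OR ItemTag.TagID = 55
    ORDER BY Item.ID
""").fetchall()

print(in_rows)
print(join_rows)
```

    With this data the JOIN version lists item 1 twice, which is why the two queries are only interchangeable when ItemTag.ItemID is unique.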

    SQL Server is able to detect an IN condition on a UNIQUE column, and will just transform the IN condition into a JOIN.

    If ItemTag.ItemID is not UNIQUE, the first query will use a semi-join algorithm, which is quite efficient in SQL Server.

    You can transform the second query into a JOIN against a DISTINCT subquery, which returns the same rows as the first query:

    SELECT  Item.ID, Item.Name
    FROM    Item
    JOIN    (
            SELECT DISTINCT ItemID
            FROM   ItemTag
            WHERE  ItemTag.TagID = 57 OR ItemTag.TagID = 55
            ) tags
    ON      tags.ItemID = Item.ID
    

    but this one is a trifle less efficient than IN or EXISTS.
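
    As a rough check that the IN, EXISTS, and DISTINCT-subquery forms are interchangeable in their results, here is a small sketch using Python's sqlite3 module with hypothetical sample rows. It compares result sets only and says nothing about relative speed on SQL Server. The EXISTS query is my own phrasing of the same condition, not taken from the thread.

```python
import sqlite3

# Hypothetical sample data: item 1 has both tags, item 2 has one, item 3 has neither.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Item (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE ItemTag (ItemID INTEGER, TagID INTEGER);
    INSERT INTO Item VALUES (1, 'both tags'), (2, 'one tag'), (3, 'other tag');
    INSERT INTO ItemTag VALUES (1, 55), (1, 57), (2, 57), (3, 99);
""")

queries = {
    "in": """
        SELECT Item.ID, Item.Name
        FROM Item
        WHERE Item.ID IN (
            SELECT ItemTag.ItemID
            FROM ItemTag
            WHERE ItemTag.TagID = 57 OR ItemTag.TagID = 55)
        ORDER BY Item.ID
    """,
    "exists": """
        SELECT Item.ID, Item.Name
        FROM Item
        WHERE EXISTS (
            SELECT 1
            FROM ItemTag
            WHERE ItemTag.ItemID = Item.ID
              AND (ItemTag.TagID = 57 OR ItemTag.TagID = 55))
        ORDER BY Item.ID
    """,
    "distinct_join": """
        SELECT Item.ID, Item.Name
        FROM Item
        JOIN (
            SELECT DISTINCT ItemID
            FROM ItemTag
            WHERE ItemTag.TagID = 57 OR ItemTag.TagID = 55
        ) tags ON tags.ItemID = Item.ID
        ORDER BY Item.ID
    """,
}

# Run every variant and collect its rows under the variant's name.
results = {name: conn.execute(sql).fetchall() for name, sql in queries.items()}
for name, rows in results.items():
    print(name, rows)
```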

    See this article in my blog for a more detailed performance comparison:

    • IN vs. JOIN vs. EXISTS
  • 2021-02-07 09:49

    Run this:

    SET SHOWPLAN_ALL ON
    

    Then run each version of the query.

    You can see whether they return the same plan and, if not, compare the TotalSubtreeCost on the first row of each to see how different they are.
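
    SET SHOWPLAN_ALL is specific to SQL Server. As a loose analogue of the same workflow on another engine, the sketch below asks SQLite for its plan through EXPLAIN QUERY PLAN, using Python's sqlite3 module. The plan text is SQLite's own and will not resemble SQL Server's output or include a TotalSubtreeCost column; it only illustrates capturing a plan per query version and comparing them side by side.

```python
import sqlite3

# Empty tables are enough: SQLite plans the query without needing any rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Item (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE ItemTag (ItemID INTEGER, TagID INTEGER);
""")

versions = {
    "in": """
        SELECT Item.ID, Item.Name
        FROM Item
        WHERE Item.ID IN (
            SELECT ItemTag.ItemID
            FROM ItemTag
            WHERE ItemTag.TagID = 57 OR ItemTag.TagID = 55)
    """,
    "join": """
        SELECT Item.ID, Item.Name
        FROM Item
        JOIN ItemTag ON ItemTag.ItemID = Item.ID
        WHERE ItemTag.TagID = 57 OR ItemTag.TagID = 55
    """,
}

plans = {}
for name, sql in versions.items():
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The fourth column of each row holds the human-readable plan step.
    plans[name] = [row[3] for row in rows]
    print(name)
    for step in plans[name]:
        print("   ", step)
```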

  • 2021-02-07 09:52

    The second one is more efficient in MySQL, at least in older versions: before MySQL 5.6 added semi-join optimizations, the server re-executed the subquery inside IN for every row tested by the WHERE clause.

  • 2021-02-07 10:02

    I think it depends on how the optimizer handles them; you may even end up with the same performance. Displaying the execution plan is your friend here.

  • 2021-02-07 10:09
    SELECT Item.ID, Item.Name
    ...
    GROUP BY Item.ID
    

    This is not valid T-SQL. Item.Name must appear in the GROUP BY clause or inside an aggregate function, such as SUM or MAX.
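
    One way to make the grouped query valid is to list both selected columns in GROUP BY. The sketch below runs that corrected form against SQLite through Python's sqlite3 module, with hypothetical sample rows. SQLite happens to accept the original form as well, so this only shows the corrected query and its result, not the T-SQL error.

```python
import sqlite3

# Hypothetical sample data: item 1 has both tags, item 2 has one, item 3 has neither.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Item (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE ItemTag (ItemID INTEGER, TagID INTEGER);
    INSERT INTO Item VALUES (1, 'both tags'), (2, 'one tag'), (3, 'other tag');
    INSERT INTO ItemTag VALUES (1, 55), (1, 57), (2, 57), (3, 99);
""")

# Every non-aggregated column in the SELECT list also appears in GROUP BY.
grouped_rows = conn.execute("""
    SELECT Item.ID, Item.Name
    FROM Item
    JOIN ItemTag ON ItemTag.ItemID = Item.ID
    WHERE ItemTag.TagID = 57 OR ItemTag.TagID = 55
    GROUP BY Item.ID, Item.Name
    ORDER BY Item.ID
""").fetchall()

print(grouped_rows)
```

    Grouping collapses the two ItemTag matches for item 1 into a single row, so the result matches what the IN version returns.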

  • 2021-02-07 10:09

    It's pretty much impossible (unless you're one of those crazy guru DBAs) to tell what will be fast and what won't without looking at the execution plan and/or running some stress tests.
