JOIN, EXISTS or IN: which is better? A few questions on SQL

不思量自难忘° 2021-01-14 07:58

I have a few questions on SQL.

  1. How do I analyze the performance of a query? Is there any software for this, or are there built-in features in MS SQL Server 2005/2008?

  2. What shou

8 Answers
  • 2021-01-14 08:37
    1. As others have said, check the execution plan. SQL Server Management Studio can show you two kinds of execution plan, estimated and actual. The estimated plan is how SQL Server guesses it would execute the query, and it is returned without actually executing the query. The actual plan is returned together with the result set and shows what was actually done.

    2. That query looks good, but you have to make sure that you have an index on enquiry_courses.enquiry_id, and it's probably best that enquiries.enquiry_id is not nullable.

    3. The semantics of IN and EXISTS are slightly different: NOT IN returns no rows if the subquery yields one or more NULLs, whereas NOT EXISTS is unaffected. If the subquery column is guaranteed to be non-null, it doesn't matter. There is an "internet truth" that you should use EXISTS on SQL Server and IN on Oracle. That might have been true when dinosaurs ruled the planet, but it doesn't apply anymore. IN and EXISTS both perform a semi-join, and the optimizer is more than capable of deciding how to execute that join.
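The NULL behaviour in point 3 follows from SQL's three-valued logic. The statements below are a minimal sketch of the mechanism, assuming the default ANSI_NULLS ON setting; the column aliases and the derived table `t` are made up for illustration.

```sql
-- x NOT IN (1, 2, NULL) expands to: x <> 1 AND x <> 2 AND x <> NULL.
-- The last comparison is UNKNOWN, so the whole predicate can never be TRUE.
SELECT CASE WHEN 3 NOT IN (1, 2, NULL)
            THEN 'TRUE' ELSE 'not TRUE' END AS not_in_with_null;     -- 'not TRUE'

-- x IN (1, 2, NULL) expands to ORs, so a real match still yields TRUE.
SELECT CASE WHEN 2 IN (1, 2, NULL)
            THEN 'TRUE' ELSE 'not TRUE' END AS in_with_null;         -- 'TRUE'

-- NOT EXISTS only asks whether the subquery returns a row.
-- NULL = 3 is UNKNOWN, the row is filtered out, and NOT EXISTS is TRUE.
SELECT CASE WHEN NOT EXISTS (SELECT 1
                             FROM (SELECT CAST(NULL AS int) AS v) AS t
                             WHERE t.v = 3)
            THEN 'TRUE' ELSE 'not TRUE' END AS not_exists_with_null; -- 'TRUE'
```

This is why a single NULL in the subquery empties a NOT IN result, while the positive IN form and both EXISTS forms keep working.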

  • 2021-01-14 08:42

    They each behave differently: it is not a performance choice.

    The only correct and reliable choice is EXISTS or NOT EXISTS, which works all the time.

    • JOIN may need DISTINCT
    • WHERE/LEFT JOIN needs correct placement of the filter
    • NOT IN fails on NULL

    Example:

    DECLARE @Parent TABLE (foo int NULL)
    INSERT @Parent (foo) VALUES (1)
    INSERT @Parent (foo) VALUES (2)
    INSERT @Parent (foo) VALUES (3)
    INSERT @Parent (foo) VALUES (4)
    
    DECLARE @Child TABLE (bar int NULL, foo int NULL)
    INSERT @Child (bar, foo) VALUES (100, 1)
    INSERT @Child (bar, foo) VALUES (200, 2)
    INSERT @Child (bar, foo) VALUES (201, 2)
    INSERT @Child (bar, foo) VALUES (300, NULL)
    INSERT @Child (bar, foo) VALUES (301, NULL)
    INSERT @Child (bar, foo) VALUES (400, 4)
    INSERT @Child (bar, foo) VALUES (500, NULL)
    
    --"positive" checks
    SELECT -- multiple "2" = FAIL without DISTINCT
        P.*
    FROM
        @Parent P JOIN @Child C ON P.foo = C.foo
    
    SELECT -- correct
        P.*
    FROM
        @Parent P
    WHERE
        P.foo IN (SELECT C.foo FROM @Child C)
    
    SELECT -- correct
        P.*
    FROM
        @Parent P
    WHERE
        EXISTS (SELECT * FROM @Child C WHERE P.foo = C.foo)
    
    --"negative" checks
    SELECT -- correct
        P.*
    FROM
        @Parent P LEFT JOIN @Child C ON P.foo = C.foo
    WHERE
        C.foo IS NULL
    
    SELECT -- no rows = FAIL
        P.*
    FROM
        @Parent P
    WHERE
        P.foo NOT IN (SELECT C.foo FROM @Child C)
    
    SELECT -- correct
        P.*
    FROM
        @Parent P
    WHERE
        NOT EXISTS (SELECT * FROM @Child C WHERE P.foo = C.foo)
    

    Note: with EXISTS, the SELECT list in the subquery is irrelevant, as mentioned in the ANSI 92 standard...

    NOT EXISTS (SELECT * FROM @Child C WHERE P.foo = C.foo)
    NOT EXISTS (SELECT NULL FROM @Child C WHERE P.foo = C.foo)
    NOT EXISTS (SELECT 1 FROM @Child C WHERE P.foo = C.foo)
    NOT EXISTS (SELECT 1/0 FROM @Child C WHERE P.foo = C.foo)
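The two JOIN caveats from the bullet list (DISTINCT and filter placement) can be sketched the same way. This is a cut-down version of the tables above; the `C.bar = 200` filter is an extra condition added purely for illustration.

```sql
DECLARE @Parent TABLE (foo int NULL);
INSERT @Parent (foo) VALUES (1);
INSERT @Parent (foo) VALUES (2);
INSERT @Parent (foo) VALUES (3);
INSERT @Parent (foo) VALUES (4);

DECLARE @Child TABLE (bar int NULL, foo int NULL);
INSERT @Child (bar, foo) VALUES (200, 2);
INSERT @Child (bar, foo) VALUES (201, 2);
INSERT @Child (bar, foo) VALUES (300, NULL);

-- JOIN + DISTINCT: returns 2 once, matching what IN / EXISTS would return.
SELECT DISTINCT P.foo
FROM @Parent P JOIN @Child C ON P.foo = C.foo;

-- Anti-join, filter in the ON clause: parents with no child where bar = 200.
-- Returns 1, 3 and 4.
SELECT P.foo
FROM @Parent P LEFT JOIN @Child C ON P.foo = C.foo AND C.bar = 200
WHERE C.foo IS NULL;

-- Same filter moved to WHERE: unmatched rows have C.bar = NULL, so the
-- comparison is UNKNOWN and every row is discarded. Returns no rows.
SELECT P.foo
FROM @Parent P LEFT JOIN @Child C ON P.foo = C.foo
WHERE C.foo IS NULL AND C.bar = 200;
```

The EXISTS and NOT EXISTS forms avoid both pitfalls because the extra filter simply goes inside the subquery.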
    
  • 2021-01-14 08:46

    3: I would expect an IN or EXISTS clause to be flattened to a JOIN by the database engine, so there shouldn't be a difference in performance. I don't know about SQL Server, but in Oracle you can verify this by checking the execution plan.
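On SQL Server, the same check could be sketched as below. The table and column names are taken from the other answers in this thread, but the query shape itself is an assumption, since the original query is not shown in full. `GO` is the batch separator used by Management Studio and sqlcmd, not a T-SQL statement.

```sql
-- Ask for estimated plans only: the statements are compiled, not executed.
-- SET SHOWPLAN_XML must be the only statement in its batch.
SET SHOWPLAN_XML ON;
GO

-- IN form (assumed shape of the query under discussion)
SELECT ec.*
FROM enquiry_courses AS ec
WHERE ec.enquiry_id IN (SELECT e.enquiry_id
                        FROM enquiries AS e
                        WHERE e.session_id = '4cd3420a16dbd61c6af58f6199ac00f1');

-- EXISTS form of the same question
SELECT ec.*
FROM enquiry_courses AS ec
WHERE EXISTS (SELECT *
              FROM enquiries AS e
              WHERE e.enquiry_id = ec.enquiry_id
                AND e.session_id = '4cd3420a16dbd61c6af58f6199ac00f1');
GO

SET SHOWPLAN_XML OFF;
GO
```

If the two returned plans have the same shape, the optimizer is treating both forms alike for this query.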

  • 2021-01-14 08:48
    1. Check the execution plan.
    2. You can optimise your query by:
      • Using a "search argument" rather than IN
      • Putting an index on session_id
        SELECT * FROM enquiry_courses as Courses, enquiries as Enquiries
        WHERE Enquiries.session_id = '4cd3420a16dbd61c6af58f6199ac00f1'   
        AND Courses.enquiry_id = Enquiries.enquiry_id
    

    3. EXISTS is better for performance.

    EDIT: EXISTS and IN are better than JOIN for performance.

    EDIT: I rewrote the query so that it's faster (I put the most restrictive condition first in the WHERE clause).

  • 2021-01-14 08:50

    This question suggests that EXISTS is quicker, which is what I had been taught: "IN () vs EXISTS () in SqlServer 2005 (or generally in any RDBMS)".

    One thing to note is that EXISTS and IN should be used in preference to NOT EXISTS and NOT IN.

    A bit of a tangent from performance, but this is a good article on the subtle differences between IN and EXISTS: http://weblogs.sqlteam.com/mladenp/archive/2007/05/18/60210.aspx

  • 2021-01-14 08:56
    1. Use SQL Server Management Studio: include the actual execution plan, and turn on SET STATISTICS TIME and SET STATISTICS IO.

    2. This IN corresponds to a JOIN, but rewriting it probably won't matter. My guess is that you need indexes on enquiry_courses.enquiry_id and on enquiries.session_id to improve query performance.
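A rough sketch of both suggestions follows. The query shape is an assumption (the original query is not shown in full), and the index names are hypothetical. `GO` is the Management Studio / sqlcmd batch separator.

```sql
-- Report I/O and timing for each statement in the Messages tab.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
GO

-- Assumed shape of the query being tuned.
SELECT ec.*
FROM enquiry_courses AS ec
WHERE ec.enquiry_id IN (SELECT e.enquiry_id
                        FROM enquiries AS e
                        WHERE e.session_id = '4cd3420a16dbd61c6af58f6199ac00f1');
GO

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
GO

-- Candidate indexes (hypothetical names). Create them, then re-run the
-- measured query above and compare logical reads and elapsed time.
CREATE INDEX IX_enquiry_courses_enquiry_id ON enquiry_courses (enquiry_id);
CREATE INDEX IX_enquiries_session_id ON enquiries (session_id);
GO
```

Logical reads are usually the more stable number to compare between runs, since elapsed time varies with caching and server load.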
