Why does the order of join clauses affect the query plan in SQL Server?

最后都变了- 提交于 2019-12-22 03:15:48

问题


I am building a view in SQL Server 2000 (and 2005) and I've noticed that the order of the join statements greatly affects the execution plan and speed of the query.

select      sr.WTSASessionRangeID,
            -- bunch of other columns
from        WTSAVW_UserSessionRange us
inner join  WTSA_SessionRange sr on sr.WTSASessionRangeID = us.WTSASessionRangeID
left outer join WTSA_SessionRangeTutor srt on srt.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionRangeClass src on src.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionRangeStream srs on srs.WTSASessionRangeID = sr.WTSASessionRangeID
--left outer join MO_Stream ms on ms.MOStreamID = srs.MOStreamID
left outer join WTSA_SessionRangeEnrolmentPeriod srep on srep.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionRangeStudent stsd on stsd.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionSubrange ssr on ssr.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionSubrangeRoom ssrr on ssrr.WTSASessionSubrangeID = ssr.WTSASessionSubrangeID
left outer join MO_Stream ms on ms.MOStreamID = srs.MOStreamID

On SQL Server 2000, the query above consistently generates a plan of cost 946. If I uncomment the MO_Stream join in the middle of the query and comment out the one at the bottom, the cost drops to 263. The execution speed drops accordingly. I always thought that the query optimizer would interpret the query appropriately without considering join order, but it seems that order matters.

So since order does seem to matter, is there a join strategy I should be following for writing faster queries?

(Incidentally, on SQL Server 2005, with almost identical data, the query plan costs were 0.675 and 0.631 respectively.)

Edit: On SQL Server 2000, here are the profiled stats:

  • 946-cost query: 9094ms CPU, 5121 reads, 0 writes, 10123ms duration
  • 263-cost query: 172ms CPU, 7477 reads, 0 writes, 170ms duration

Edit: Here is the logical structure of the tables.

SessionRange ---+--- SessionRangeTutor
                |--- SessionRangeClass
                |--- SessionRangeStream --- MO_Stream
                |--- SessionRangeEnrolmentPeriod
                |--- SessionRangeStudent
                +----SessionSubrange --- SessionSubrangeRoom

Edit: Thanks to Alex and gbn for pointing me in the right direction. I also found this question.

Here's the new query:

select sr.WTSASessionRangeID    // + lots of columns

from WTSAVW_UserSessionRange us
inner join WTSA_SessionRange sr on sr.WTSASessionRangeID = us.WTSASessionRangeID
left outer join WTSA_SessionRangeTutor srt on srt.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionRangeClass src on src.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionRangeEnrolmentPeriod srep on srep.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionRangeStudent stsd on stsd.WTSASessionRangeID = sr.WTSASessionRangeID

// SessionRangeStream is a many-to-many mapping table between SessionRange and MO_Stream
left outer join (
    WTSA_SessionRangeStream srs
    inner join MO_Stream ms on ms.MOStreamID = srs.MOStreamID
) on srs.WTSASessionRangeID = sr.WTSASessionRangeID

// SessionRanges MAY have Subranges and Subranges MAY have Rooms
left outer join (
    WTSA_SessionSubrange ssr    
    left outer join WTSA_SessionSubrangeRoom ssrr on ssrr.WTSASessionSubrangeID = ssr.WTSASessionSubrangeID
) on ssr.WTSASessionRangeID = sr.WTSASessionRangeID

SQLServer2000 cost: 24.9


回答1:


I have to disagree with all previous answers, and the reason is simple: if you change the order of your left join, your queries are logically different and as such they produce different result sets. See for yourself:

SELECT 1 AS a INTO #t1
UNION ALL SELECT 2
UNION ALL SELECT 3
UNION ALL SELECT 4;

SELECT 1 AS b INTO #t2
UNION ALL SELECT 2;

SELECT 1 AS c INTO #t3
UNION ALL SELECT 3;

SELECT a, b, c 
FROM #t1 LEFT JOIN #t2 ON #t1.a=#t2.b
  LEFT JOIN #t3 ON #t2.b=#t3.c
ORDER BY a;

SELECT a, b, c 
FROM #t1 LEFT JOIN #t3 ON #t1.a=#t3.c
  LEFT JOIN #t2 ON #t3.c=#t2.b
ORDER BY a;

a           b           c
----------- ----------- -----------
1           1           1
2           2           NULL
3           NULL        NULL
4           NULL        NULL

(4 row(s) affected)

a           b           c
----------- ----------- -----------
1           1           1
2           NULL        NULL
3           NULL        3
4           NULL        NULL



回答2:


The join order does make a difference to the resulting query. This is documented in BOL in the docs for FROM:

<joined_table>

Is a result set that is the product of two or more tables. For multiple joins, use parentheses to change the natural order of the joins.

You can alter the join order using parenthesis around the joins (BOL does show this in the syntax at the top of the docs, but it is easy to miss).

This is known as chiastic behaviour. You can also use the query hint OPTION (FORCE ORDER) to force a specific join order, but this can result in what are called "bushy plans" which may not be the most optimal for the query being executed.




回答3:


Obviously, the SQL Server 2005 optimizer is a lot better than the SQL Server 2000 one.

However, there's a lot of truth in your question. Outer joins will cause execution to vary wildly based on order (inner joins tend to be optimized to the most efficient route, but again, order matters). If you think about it, as you build up left joins, you need to figure out what the heck is on the left. As such, each join must be calculated before every other join can be done. It becomes sequential, and not parallel. Now, obviously, there are things you can do to combat this (such as indexes, views, etc). But, the point stands: The table needs to know what's on the left before it can do a left outer join. And if you just keep adding joins, you're getting more and more abstraction to what, exactly is on the left (especially if you use joined tables as the left table!).

With inner joins, however, you can parallelize those quite a bit, so there's less of a dramatic difference as far as order's concerned.




回答4:


A general strategy for optimizing queries containing JOINs is to look at your data model and the data and try to determine which JOINs will reduce number of records that must be considered the most quickly. The fewer records that must be considered, the faster the query will run. The server will generally produce a better query plan too.

Along with the above optimization make sure that any fields used in JOINs are indexed




回答5:


You query is probably wrong anyway. Alex is correct. Eric may be correct too, but the query is wrong.

Lets' take this subset:

WTSA_SessionRange sr
left outer join
WTSA_SessionSubrange ssr on ssr.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join
WTSA_SessionSubrangeRoom ssrr on ssrr.WTSASessionSubrangeID = ssr.WTSASessionSubrangeID

You are joining WTSA_SessionSubrangeRoom onto WTSA_SessionSubrange. You may have no rows from WTSA_SessionSubrange.

The join should be this:

WTSA_SessionRange sr
left outer join
(SELECT WTSASessionRangeID, columns I need
FROM
    WTSA_SessionSubrange ssr
    left outer join
    WTSA_SessionSubrangeRoom ssrr on ssrr.WTSASessionSubrangeID = ssr.WTSASessionSubrangeID
) foo on foo.WTSASessionRangeID = sr.WTSASessionRangeID

This is why the join order is affecting results because it's a different query, declaratively speaking.

You'd also need to change the MO_Stream and WTSA_SessionRangeStream join too.




回答6:


it depends on which of the join fields are indexed - if it has to table scan the first field, but use an index on the second, it's slow. If your first join field is an index, it'll be quicker. My guess is that 2005 optimizes it better by determining the indexed fields and performing those first




回答7:


At DevConnections a few years ago a session on SQL Server performance stated that (a) order of outer joins DOES matter, and (b) when a query has a lot of joins, it will not look at all of them before making a determination on a plan. If you know you have joins that will help speed up a query, they should be early on in the FROM list (if you can).



来源:https://stackoverflow.com/questions/1191497/why-does-the-order-of-join-clauses-affect-the-query-plan-in-sql-server

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!