INNER JOIN vs LEFT JOIN performance in SQL Server

后端 未结 9 1777
名媛妹妹
名媛妹妹 2020-11-22 11:51

I\'ve created SQL command that uses INNER JOIN on 9 tables, anyway this command takes a very long time (more than five minutes). So my folk suggested me to change INNER JOIN

相关标签:
9条回答
  • 2020-11-22 12:26

    Have done a number of comparisons between left outer and inner joins and have not been able to find a consisten difference. There are many variables. Am working on a reporting database with thousands of tables many with a large number of fields, many changes over time (vendor versions and local workflow) . It is not possible to create all of the combinations of covering indexes to meet the needs of such a wide variety of queries and handle historical data. Have seen inner queries kill server performance because two large (millions to tens of millions of rows) tables are inner joined both pulling a large number of fields and no covering index exists.

    The biggest issue though, doesn't seem to appeaer in the discussions above. Maybe your database is well designed with triggers and well designed transaction processing to ensure good data. Mine frequently has NULL values where they aren't expected. Yes the table definitions could enforce no-Nulls but that isn't an option in my environment.

    So the question is... do you design your query only for speed, a higher priority for transaction processing that runs the same code thousands of times a minute. Or do you go for accuracy that a left outer join will provide. Remember that inner joins must find matches on both sides, so an unexpected NULL will not only remove data from the two tables but possibly entire rows of information. And it happens so nicely, no error messages.

    You can be very fast as getting 90% of the needed data and not discover the inner joins have silently removed information. Sometimes inner joins can be faster, but I don't believe anyone making that assumption unless they have reviewed the execution plan. Speed is important, but accuracy is more important.

    0 讨论(0)
  • 2020-11-22 12:28

    If everything works as it should it shouldn't, BUT we all know everything doesn't work the way it should especially when it comes to the query optimizer, query plan caching and statistics.

    First I would suggest rebuilding index and statistics, then clearing the query plan cache just to make sure that's not screwing things up. However I've experienced problems even when that's done.

    I've experienced some cases where a left join has been faster than a inner join.

    The underlying reason is this: If you have two tables and you join on a column with an index (on both tables). The inner join will produce the same result no matter if you loop over the entries in the index on table one and match with index on table two as if you would do the reverse: Loop over entries in the index on table two and match with index in table one. The problem is when you have misleading statistics, the query optimizer will use the statistics of the index to find the table with least matching entries (based on your other criteria). If you have two tables with 1 million in each, in table one you have 10 rows matching and in table two you have 100000 rows matching. The best way would be to do an index scan on table one and matching 10 times in table two. The reverse would be an index scan that loops over 100000 rows and tries to match 100000 times and only 10 succeed. So if the statistics isn't correct the optimizer might choose the wrong table and index to loop over.

    If the optimizer chooses to optimize the left join in the order it is written it will perform better than the inner join.

    BUT, the optimizer may also optimize a left join sub-optimally as a left semi join. To make it choose the one you want you can use the force order hint.

    0 讨论(0)
  • 2020-11-22 12:38

    From my comparisons, I find that they have the exact same execution plan. There're three scenarios:

    1. If and when they return the same results, they have the same speed. However, we must keep in mind that they are not the same queries, and that LEFT JOIN will possibly return more results (when some ON conditions aren't met) --- this is why it's usually slower.

    2. When the main table (first non-const one in the execution plan) has a restrictive condition (WHERE id = ?) and the corresponding ON condition is on a NULL value, the "right" table is not joined --- this is when LEFT JOIN is faster.

    3. As discussed in Point 1, usually INNER JOIN is more restrictive and returns fewer results and is therefore faster.

    Both use (the same) indices.

    0 讨论(0)
提交回复
热议问题