SQL - Filtering large tables with joins - best practices

不知归路 2020-12-30 11:37

I have a table with a lot of data and I need to join it with some other large tables.

Only a small portion of my table is actually relevant for me each time.

3 Answers
  • 2020-12-30 12:09

    Because you are using INNER JOINs, whether a filter goes in the WHERE clause or in the ON clause is purely a matter of taste and style. Personally, I like to keep the links between the two tables (e.g. the foreign key relationship) in the ON clause, and the actual filters against the data in the WHERE clause.

    SQL Server will parse both forms into the same token tree, and will therefore build identical query execution plans.

    If you were using [LEFT/RIGHT] OUTER JOINs instead, it would make a world of difference: not only would the performance probably differ, but very likely the results as well. In an outer join, a predicate in the ON clause only decides which rows match, whereas the same predicate in the WHERE clause also discards the unmatched, NULL-extended rows.
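To make the INNER vs OUTER distinction concrete, here is a minimal sketch. It runs against an in-memory SQLite database purely as a convenient stand-in (the question is about SQL Server, but the result semantics of ON vs WHERE are standard SQL). The `customers` and `orders` tables and their contents are hypothetical, made up for illustration.

```python
import sqlite3

# Hypothetical schema and data; SQLite is only a stand-in for SQL Server here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, status TEXT);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
    INSERT INTO orders VALUES (10, 1, 'open'), (11, 1, 'closed'), (12, 2, 'closed');
""")

def rows(sql):
    # Sort by repr so rows containing NULL (None) can still be ordered.
    return sorted(conn.execute(sql).fetchall(), key=repr)

# INNER JOIN: the filter gives the same rows in WHERE or in ON.
inner_where = rows("""
    SELECT c.name, o.id FROM customers c
    INNER JOIN orders o ON o.customer_id = c.id
    WHERE o.status = 'open'
""")
inner_on = rows("""
    SELECT c.name, o.id FROM customers c
    INNER JOIN orders o ON o.customer_id = c.id AND o.status = 'open'
""")

# LEFT JOIN: the same move changes the result set.
left_where = rows("""
    SELECT c.name, o.id FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id
    WHERE o.status = 'open'
""")
left_on = rows("""
    SELECT c.name, o.id FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id AND o.status = 'open'
""")

print(inner_where == inner_on)  # same rows for the inner join
print(left_where)               # only customers with an open order
print(left_on)                  # every customer, NULL where no open order
```

With the filter in the WHERE clause, the LEFT JOIN effectively collapses back into an inner join, which is why the placement matters there and not for INNER JOINs.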


    To answer your other questions:

    When is it best to filter my data?

    1. In the where clause of the SQL.
    2. Create a temp table with the specific data and only then join it.
    3. Add the predicate to the first inner join ON clause.
    4. Some other idea.

    For 1 and 3, the WHERE clause and the ON clause are treated the same. For 3, the "first inner join" has no relevance: in a multi-table INNER JOIN scenario it really doesn't matter which join comes first in the query text, as the query optimizer will reorder the joins as it sees fit.

    Using a temp table (option 2) is completely unnecessary and won't help, because you have to extract the relevant portion anyway, which is what the JOIN would do as well. Moreover, if you have a good index on the JOIN conditions/WHERE filter, the index can be used to visit only the relevant data without looking at the rest of the table(s).
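As a rough illustration of the temp table point, the sketch below compares a direct join with a "filter into a temp table, then join" version and prints the plan of the direct query. It uses SQLite as a stand-in, so the plan output is SQLite's, not SQL Server's; the `events` and `accounts` tables, the `idx_events_kind` index, and the data are all hypothetical.

```python
import sqlite3

# Hypothetical tables and index; SQLite stands in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE events (id INTEGER PRIMARY KEY, account_id INTEGER, kind TEXT);
    CREATE INDEX idx_events_kind ON events (kind);
""")
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)",
    [(i, "eu" if i % 2 else "us") for i in range(1, 101)],
)
# Only every 500th event is 'rare', i.e. a small relevant portion.
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [(i, i % 100 + 1, "rare" if i % 500 == 0 else "common") for i in range(1, 5001)],
)

# Option 1: filter directly in the joined query.
direct_sql = """
    SELECT e.id, a.region
    FROM events e
    JOIN accounts a ON a.id = e.account_id
    WHERE e.kind = 'rare'
    ORDER BY e.id
"""
direct = conn.execute(direct_sql).fetchall()

# Option 2: extract the relevant rows into a temp table first, then join.
conn.execute(
    "CREATE TEMP TABLE relevant AS "
    "SELECT id, account_id FROM events WHERE kind = 'rare'"
)
via_temp = conn.execute("""
    SELECT r.id, a.region
    FROM relevant r
    JOIN accounts a ON a.id = r.account_id
    ORDER BY r.id
""").fetchall()

# The fourth column of EXPLAIN QUERY PLAN holds the human-readable step.
plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + direct_sql)]

print(direct == via_temp)  # both approaches return the same rows
for step in plan:
    print(step)            # shows how the direct query reaches the rows
```

Both versions return the same rows; the temp table only adds an extra write and read of the filtered data, while the index already lets the direct query skip the irrelevant rows.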

  • 2020-12-30 12:16

    Put your query in SQL Server Management Studio, enable "Include Actual Execution Plan", and run it. That way you will see exactly what SQL Server did with your query. From there, you can move forward with optimization.

    In general:

    • The columns used for joins should be indexed
    • Apply the most selective filter first
  • 2020-12-30 12:30

    In a decent cost-based query planner, what happens in your case is:

    1. Join conditions and WHERE conditions are parsed at the same level.

    2. The type of join and the statistics determine the path (what happens first), chosen so that the smallest intermediate results are retrieved (least I/O, hence the fastest query).
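One way to see point 1 in practice is to compare the plans a planner produces for the same inner join with the filter written in the WHERE clause and in the ON clause. The sketch below does this with SQLite's `EXPLAIN QUERY PLAN` as a stand-in; in SQL Server you would compare the execution plans instead. The `parent` and `child` tables and their indexes are hypothetical.

```python
import sqlite3

# Hypothetical schema; SQLite's planner is used as a stand-in for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parent (id INTEGER PRIMARY KEY, flag INTEGER);
    CREATE TABLE child (id INTEGER PRIMARY KEY, parent_id INTEGER, val TEXT);
    CREATE INDEX idx_parent_flag ON parent (flag);
    CREATE INDEX idx_child_parent ON child (parent_id);
""")

def plan(sql):
    # Keep only the descriptive text of each plan step.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

filter_in_where = plan("""
    SELECT c.val
    FROM parent p
    JOIN child c ON c.parent_id = p.id
    WHERE p.flag = 1
""")
filter_in_on = plan("""
    SELECT c.val
    FROM parent p
    JOIN child c ON c.parent_id = p.id AND p.flag = 1
""")

for step in filter_in_where:
    print(step)
print(filter_in_where == filter_in_on)  # same plan for both spellings
```

For inner joins the planner folds ON terms and WHERE terms into one set of conditions before choosing a path, so where you write the filter does not steer the plan.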
