Select query with three where conditions is slow, but the same query with any combination of two of the three where conditions is fast

暖寄归人 2021-01-03 08:33

I have the following query:

SELECT table_1.id

FROM
table_1
LEFT JOIN table_2 ON (table_1.id = table_2.id)

WHERE
table_1.col_condition_1 = 0
AND table_1.col         

3 Answers
  • 2021-01-03 09:26

    Problems like this tend to require trying things and testing to see how well they work.

    As such, start with this:

    SELECT
    table_1.id
    FROM
    table_1
    LEFT JOIN table_2
    ON table_1.id = table_2.id
    AND table_1.date_col <= table_2.date_col
    WHERE
    table_1.col_condition_1 = 0
    AND table_1.col_condition_2 NOT IN (3, 4)
    AND table_2.id is NULL
    
    LIMIT 5000;
    

    Why this is logically equivalent to your query: your original WHERE clause, (table_2.id is NULL OR table_1.date_col > table_2.date_col), can be summarized as "Only include table_1 records that either do NOT have a table_2 record, or whose table_2 record is strictly earlier than the table_1 record."

    My version of the query uses an anti-join to exclude all table_1 records for which there exists a table_2 record that is later than (or equal to) the table_1 record.
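
    One caveat, stated as an assumption because the question does not show the table definitions: the equivalence only holds if date_col is never NULL in either table. When a matching pair has a NULL date, the original OR condition drops the row, while the anti-join keeps it. A quick check along these lines would show whether that case can occur:

    ```sql
    -- Count NULL dates in both tables; if both counts are 0,
    -- the anti-join and the original OR condition return the same rows.
    SELECT
        (SELECT COUNT(*) FROM table_1 WHERE date_col IS NULL) AS t1_null_dates,
        (SELECT COUNT(*) FROM table_2 WHERE date_col IS NULL) AS t2_null_dates;
    ```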

    Indexes

    There are a number of possible composite indexes that may help this query. Here are a couple to start with:

    For table_2: (id,date_col)

    For table_1: (col_condition_1,id,date_col,col_condition_2)
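
    As a sketch, the two indexes could be created like this in MySQL. The index names are made up for illustration; only the column lists come from the suggestion above:

    ```sql
    -- Hypothetical index names; column order follows the lists above.
    ALTER TABLE table_2
        ADD INDEX idx_t2_id_date (id, date_col);

    ALTER TABLE table_1
        ADD INDEX idx_t1_cond1_id_date_cond2 (col_condition_1, id, date_col, col_condition_2);
    ```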

    Please try my query and indexes, and report the results (including EXPLAIN plan).
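
    To capture the plan, prefix the query with EXPLAIN. This is the anti-join version from above, unchanged apart from the prefix:

    ```sql
    EXPLAIN
    SELECT
        table_1.id
    FROM
        table_1
        LEFT JOIN table_2
            ON table_1.id = table_2.id
            AND table_1.date_col <= table_2.date_col
    WHERE
        table_1.col_condition_1 = 0
        AND table_1.col_condition_2 NOT IN (3, 4)
        AND table_2.id IS NULL
    LIMIT 5000;
    ```

    The key, rows, and Extra columns of the output are usually the most telling ones to post.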

  • 2021-01-03 09:31

    Try splitting the existing SQL into two parts and measuring the execution time of each. This should show which part is responsible for the slowness:

    part 1:

    SELECT table_1.id
      FROM table_1
      LEFT JOIN table_2
        ON (table_1.id = table_2.id)
     WHERE table_1.col_condition_1 = 0
       AND table_1.col_condition_2 NOT IN (3, 4)
       AND table_2.id is NULL
    

    and part 2 (note the inner join here):

    SELECT table_1.id
      FROM table_1
      JOIN table_2
        ON (table_1.id = table_2.id)
     WHERE table_1.col_condition_1 = 0
       AND table_1.col_condition_2 NOT IN (3, 4)
       AND table_1.date_col > table_2.date_col
    

    I expect part 2 to be the one that takes longer. In that case, I think an index on date_col on both table_1 and table_2 would help.

    I don't think the composite index would help at all in your select.

    That said, it is hard to diagnose why the three conditions together would hurt performance that badly. It seems to be related to your data distribution. I am not sure about MySQL, but in Oracle, collecting statistics on those tables would make a difference.
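
    For reference, the rough MySQL counterpart of collecting statistics is ANALYZE TABLE, which refreshes the key distribution figures the optimizer uses. Whether it changes the plan here is untested; a minimal sketch:

    ```sql
    -- Refresh optimizer statistics for both tables, then re-run EXPLAIN to see if the plan changes.
    ANALYZE TABLE table_1, table_2;
    ```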

    Hope it helps.

  • 2021-01-03 09:32
    • OR is a performance killer.
    • Sometimes using UNION instead of OR can speed up the query.
    • Perhaps in one case the 5000 were "near the beginning" of the combined tables, but not in the other case.
    • Using LIMIT without ORDER BY is dubious.
    • Since a PK is a Unique key, it is redundant to also declare id_UNIQUE.
    • INDEX(a) is unnecessary when you also have INDEX(a,b).
    • If there are only 4 values, IN (1, 2) might be faster than NOT IN (3, 4).
    • It is unusual to have two tables sharing the same PK. Why do you have a 1:1 relationship?
    • We might have further insight if we could see the real column names.
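
    A sketch of the UNION idea, assuming the original WHERE clause is the one quoted in the first answer (table_2.id is NULL OR table_1.date_col > table_2.date_col) and that id is unique in both tables. Under that assumption the two branches cannot return the same row, so UNION ALL is enough. The ORDER BY is added so that LIMIT returns a deterministic set:

    ```sql
    -- Branch 1: table_1 rows with no table_2 row at all.
    (SELECT table_1.id
       FROM table_1
       LEFT JOIN table_2 ON table_1.id = table_2.id
      WHERE table_1.col_condition_1 = 0
        AND table_1.col_condition_2 NOT IN (3, 4)
        AND table_2.id IS NULL)
    UNION ALL
    -- Branch 2: table_1 rows whose table_2 row has an earlier date.
    (SELECT table_1.id
       FROM table_1
       JOIN table_2 ON table_1.id = table_2.id
      WHERE table_1.col_condition_1 = 0
        AND table_1.col_condition_2 NOT IN (3, 4)
        AND table_1.date_col > table_2.date_col)
    ORDER BY id
    LIMIT 5000;
    ```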