NOT IN selection with NULL values

后端 未结 2 478
一整个雨季
一整个雨季 2020-12-20 07:44

Who can explain the result of the second select. I expected the first row of data1 (1, \'1\') is not in data2?

with data1(id, val) as
 (select 1, \'1\' from          


        
相关标签:
2条回答
  • 2020-12-20 07:53

    When dealing with null values, you should better use NOT EXISTS than NOT IN. Here is a post from asktom explaining why:

    https://asktom.oracle.com/pls/asktom/f?p=100:11:0::no::p11_question_id:442029737684

    0 讨论(0)
  • 2020-12-20 08:04

    First a bit of theory: Null (SQL)

    The most important parts for us from the above link:

    Comparisons with NULL and the three-valued logic (3VL)

    Since Null is not a member of any data domain, it is not considered a "value", but rather a marker (or placeholder) indicating the absence of value. Because of this, comparisons with Null can never result in either True or False, but always in a third logical result, Unknown.[8] The logical result of the expression below, which compares the value 10 to Null, is Unknown:

    SELECT 10 = NULL       -- Results in Unknown
    

    so that both comparisons: x = NULL and x <> NULL evaluates to NULL(unknown).


    SQL implements three logical results, so SQL implementations must provide for a specialized three-valued logic (3VL). The rules governing SQL three-valued logic are shown in the tables below (p and q represent logical states)"[9] The truth tables SQL uses for AND, OR, and NOT correspond to a common fragment of the Kleene and Łukasiewicz three-valued logic (which differ in their definition of implication, however SQL defines no such operation).

    +---------+-------------+-------------+-------------+-----------+--------+
    |    p    |        q    |     p OR q  |     p AND q |    p = q  |p != q  |
    +---------+-------------+-------------+-------------+-----------+--------+
    | True    |     True    |     True    |     True    |   True    | False  |
    | True    |     False   |     True    |     False   |   False   | True   |
    | True    |     Unknown |     True    |     Unknown |   Unknown | Unknown|
    | False   |     True    |     True    |     False   |   False   | True   |
    | False   |     False   |     False   |     False   |   True    | False  |
    | False   |     Unknown |     Unknown |     False   |   Unknown | Unknown|
    | Unknown |     True    |     True    |     Unknown |   Unknown | Unknown|
    | Unknown |     False   |     Unknown |     False   |   Unknown | Unknown|
    | Unknown |     Unknown |     Unknown |     Unknown |   Unknown | Unknown|
    +---------+-------------+-------------+-------------+-----------+--------+
    

    Effect of Unknown in WHERE clauses

    SQL three-valued logic is encountered in Data Manipulation Language (DML) in comparison predicates of DML statements and queries. The WHERE clause causes the DML statement to act on only those rows for which the predicate evaluates to True.

    So in short: WHERE clause treats NULL as FALSE




    Now please consider a simpler case:

    SELECT * FROM T1;
    
    |      X |
    |--------|
    |      1 |
    | (null) |
    

    and a query:

    SELECT * FROM t1 WHERE x IN (1, NULL);
    

    The above query is a shortland to this one:

    SELECT * FROM t1 
    WHERE x = 1
      OR  x = NULL
    

    For the second row from table t ( x = NULL) this condition looks like:

    WHERE NULL = 1
       OR NULL = NULL
    

    so this condition for the row x=NULL evaluates to NULL because NULL=1 is NULL, NULL=NULL is NULL, and NULL OR NULL is also NULL (please see the table 3VL above).



    Now consider more curious case:

    SELECT * FROM t1 WHERE x NOT IN (1, NULL);
    

    This clause x NOT IN (1, NULL) is equivalent to NOT ( x IN (1, NULL) )
    so it is also equivalent to:

    NOT (
      x = 1
      OR
      x = NULL
    )
    

    and according to De Morgan's laws it is equivalent to:

    NOT ( x = 1 ) AND NOT ( x = NULL )
    

    and (if we replace NOT x = y with x <> y) it's also equivalent to:

     x <> 1 AND x <> NULL
    

    Please look closely at the last condition:

    WHERE 
    x <> 1 AND x <> NULL
    

    We know than x <> NULL always evaluates to NULL. We also know from the 3VL table above, that both true AND NULL is NULL and false AND NULL evaluates to FALSE, so the whole condition always evaluates either to FALSE or NULL, but it never evaluates to TRUE.

    Therefore a query with this condition:

    SELECT .....
    WHERE x NOT IN ( NULL, whatever)
    

    always returns empty resultset




    And now your query, which is also curious:

    SELECT * FROM t1
    WHERE (id, val) NOT IN (select id, val from data2);
    

    which can be rewriten (using constant values) to:

    SELECT * FROM t1
    WHERE (id, val) NOT IN (
           (1, null),
           (2, 2 )
    )
    

    This query is using so called row value expression

    Basically a condition using the row value expressin like this

    (a, b) = (x, y)
    

    is equivalent to this one:

    a = x AND b = y
    

    so the above query can be rewritten into this one:

    SELECT * FROM t1
    WHERE NOT (
       id = 1 AND val = NULL
       OR
       id = 2 AND val = 2
    )
    

    According to De Morgan's laws this is identical to:

    SELECT * FROM t1
    WHERE 
       NOT ( id = 1 AND val = NULL )
       AND
       NOT ( id = 2 AND val = 2 )
    

    and further to:

    SELECT * FROM t1
    WHERE 
       ( id <> 1 OR val <> NULL )
       AND
       ( id <> 2 OR val <> 2 )
    

    Since the first part ( id <> 1 OR val <> NULL ) of the condition evaluates to true only in a case where id <> 1 (please see the 3VL table above), this condition can be simplified into:

    SELECT * FROM t1
    WHERE 
       ( id <> 1 )
       AND
       ( id <> 2 OR val <> 2 )
    

    and further (according to De Morgan's laws) into:

    SELECT * FROM t1
    WHERE 
       id <> 1 AND id <> 2
       OR
       id <> 1 AND  val <> 2
    

    so neither (1,1) nor (2,2) from the source data1 comply with these conditions.

    0 讨论(0)
提交回复
热议问题