Why does the MySQL IN keyword not consider NULL values?

夕颜 2020-12-06 05:40

I am using the following query:

select count(*) from Table1 where CurrentDateTime>'2012-05-28 15:34:02.403504' and Error not in ('Timeout','Connectio


        
6 answers
  • 2020-12-06 06:18

    This:

    Error not in ('Timeout','Connection Error');
    

    is semantically equivalent to:

    Error <> 'Timeout' AND Error <> 'Connection Error'
    

    The rules for NULL comparison apply to IN too. So if the value of Error is NULL, every comparison yields NULL and the database can't make the expression true.

    To fix, you could do this:

    COALESCE(Error,'') not in ('Timeout','Connection Error');
    

    Or better yet:

    Error IS NULL OR Error not in ('Timeout','Connection Error');
    

    Or, better still:

     CASE WHEN Error IS NULL THEN 1
          WHEN Error not in ('Timeout','Connection Error') THEN 1
          ELSE 0
     END = 1
    

    OR is not guaranteed to short-circuit, whereas CASE tests its WHEN conditions in order, so it can skip the NOT IN check once the IS NULL branch matches.
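
    To see the three fixes side by side, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL, on the assumption that SQLite treats NULL the same way in these expressions. The three sample rows and the ids() helper are made up for illustration, and the CASE form is written with two WHEN branches so that it parses.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (id INTEGER, Error TEXT)")
# Hypothetical sample rows: one excluded value, one other value, one NULL.
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?)",
    [(1, "Timeout"), (2, "Disk Full"), (3, None)],
)


def ids(where):
    """Return the ids of the rows that satisfy the given WHERE clause."""
    sql = "SELECT id FROM Table1 WHERE " + where + " ORDER BY id"
    return [row[0] for row in conn.execute(sql)]


# Plain NOT IN: the NULL row (id 3) is dropped.
plain = ids("Error NOT IN ('Timeout','Connection Error')")

# Fix 1: replace NULL with a value that is not in the list.
with_coalesce = ids("COALESCE(Error,'') NOT IN ('Timeout','Connection Error')")

# Fix 2: accept NULL explicitly.
with_is_null = ids("Error IS NULL OR Error NOT IN ('Timeout','Connection Error')")

# Fix 3: CASE with the IS NULL check as the first branch.
with_case = ids(
    "CASE WHEN Error IS NULL THEN 1 "
    "WHEN Error NOT IN ('Timeout','Connection Error') THEN 1 "
    "ELSE 0 END = 1"
)

print(plain, with_coalesce, with_is_null, with_case)
```

    In this sketch the plain NOT IN keeps only id 2, while each of the three rewrites keeps ids 2 and 3. Note that the COALESCE variant only works as long as the substitute value ('' here) is not itself one of the excluded values.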


    Perhaps a concrete example could illustrate why a NOT IN expression applied to a NULL value returns nothing:

    Given this data: http://www.sqlfiddle.com/#!2/0d5da/11

    create table tbl
    (
      msg varchar(100) null,
      description varchar(100) not null
      );
    
    
    insert into tbl values
    ('hi', 'greet'),
    (null, 'nothing');
    

    And you run this query:

    select 'hulk' as x, msg, description 
    from tbl where msg not in ('bruce','banner');
    

    That outputs only the 'hi' row.

    The NOT IN is translated as:

    select 'hulk' as x, msg, description 
    from tbl where msg <> 'bruce' and msg <> 'banner';
    

    NULL <> 'bruce' can't be determined: it is neither true nor false.

    NULL <> 'banner' can't be determined either: it is neither true nor false.

    So for the row where msg is NULL, the expression effectively resolves to:

    can't be determined AND can't be determined
    

    In fact, if your RDBMS supports boolean expressions in SELECT (e.g. MySQL, PostgreSQL), you can see why: http://www.sqlfiddle.com/#!2/d41d8/828

    select null <> 'Bruce' 
    

    That returns null.

    This returns null too:

    select null <> 'Bruce' and null <> 'Banner'
    

    Since NOT IN is basically an AND expression, the condition for the NULL row becomes:

    NULL AND NULL
    

    which evaluates to NULL. So it's like you are running: http://www.sqlfiddle.com/#!2/0d5da/12

    select * from tbl where null
    

    Nothing will be returned.
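
    If you don't have a MySQL instance at hand, the walkthrough above can be reproduced with a small sketch using Python's built-in sqlite3 module. This assumes SQLite follows the same NULL comparison rules for these queries. The table and data are the ones from the example, and NULL comes back to Python as None.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table tbl (msg varchar(100) null, description varchar(100) not null)"
)
conn.executemany(
    "insert into tbl values (?, ?)",
    [("hi", "greet"), (None, "nothing")],
)

# NOT IN keeps the 'hi' row and silently drops the row where msg is NULL.
not_in_rows = conn.execute(
    "select msg, description from tbl where msg not in ('bruce','banner')"
).fetchall()

# A single comparison against NULL yields NULL (None in Python).
single = conn.execute("select null <> 'Bruce'").fetchone()[0]

# NULL AND NULL is still NULL.
combined = conn.execute(
    "select null <> 'Bruce' and null <> 'Banner'"
).fetchone()[0]

# A WHERE clause that evaluates to NULL rejects every row.
where_null = conn.execute("select * from tbl where null").fetchall()

print(not_in_rows, single, combined, where_null)
```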

  • 2020-12-06 06:19

    Because NULL is undefined, NULL does not equal NULL. You always have to handle NULL explicitly.

  • 2020-12-06 06:25

    @Michael Buen's answer was the right answer for my case, but let me simplify why.

    @Michael says in his post:


    Error not in ('Timeout','Connection Error');

    is semantically equivalent to:

    Error <> 'Timeout' AND Error <> 'Connection Error'

    The rules for NULL comparison apply to IN too. So if the value of Error is NULL, the database can't make the expression true.

    And in [1] I found a sentence that confirms his most important statement for understanding why IN fails with NULL. In the specifications ("specs") in [1] you will find: "If one or both arguments are NULL, the result of the comparison is NULL, except for the NULL-safe <=> equality comparison operator."

    So yeah, the thing is that sadly MySQL gets lost in such a case. I think MySQL's designers shouldn't have done this, because when I compare 2 to NULL, MySQL SHOULD be able to see they are DIFFERENT instead of simply returning misleading results. For example, I did:

    select id from TABLE where id not in (COLUMN WITH NULLS);
    

    then it returns EMPTY results. BUT if I do

    select id from TABLE where id not in (COLUMN WITHOUT NULLS);
    

    it shows the right result. So when using the IN operator, you must filter out the NULLs. This is not the behavior I want as a user, but it's documented in the specifications in [1]. I think languages and technology should be simpler, in the sense that you should be able to DEDUCE the behavior without needing to read the specs. And truly, 2 is DIFFERENT from NULL. I should be the one in charge of controlling and taking care of mistakes at a higher level of abstraction, but MySQL SHOULD return FALSE when comparing NULL with a specific value.
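
    Here is a small sketch of that "filter out the NULLs" advice, again using Python's sqlite3 module as a stand-in for MySQL (an assumption; the NULL handling shown should be the same). The items and refs tables and their contents are hypothetical, made up only for this illustration.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
# Hypothetical tables: items holds ids, refs holds a nullable column.
conn.execute("CREATE TABLE items (id INTEGER)")
conn.execute("CREATE TABLE refs (item_id INTEGER)")
conn.executemany("INSERT INTO items VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO refs VALUES (?)", [(1,), (None,)])

# The subquery returns a NULL, so NOT IN is never TRUE: no rows come back.
unfiltered = conn.execute(
    "SELECT id FROM items "
    "WHERE id NOT IN (SELECT item_id FROM refs) "
    "ORDER BY id"
).fetchall()

# Filtering the NULLs out of the subquery restores the expected result.
filtered = conn.execute(
    "SELECT id FROM items "
    "WHERE id NOT IN (SELECT item_id FROM refs WHERE item_id IS NOT NULL) "
    "ORDER BY id"
).fetchall()

print(unfiltered, filtered)
```

    With these sample rows the unfiltered query returns nothing, and the filtered one returns ids 2 and 3.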

    References for the specs: [1] http://dev.mysql.com/doc/refman/5.6/en/type-conversion.html

  • 2020-12-06 06:30

    IN returns NULL if the expression on the left-hand side is NULL. To include the rows where Error is NULL, you have to write:

    select count(*) from Table1 where CurrentDateTime>'2012-05-28 15:34:02.403504' and (Error not in ('Timeout','Connection Error') or Error is null);
    
  • 2020-12-06 06:33

    Sorry for posting twice in the same forum, but I want to illustrate another example:

    I agree with @Wagner Bianchi in [2] in this forum when he says: << It’s very trick when dealing with data and subqueries>>

    Moreover, this should NOT be the behavior. I think MySQL's designers were mistaken when they made the decision documented in [1]; the design should be different. Let me explain. You know that when you run

    select (2) not in (1, 4, 3);
        you will get:
            +----------------------+
            | (2) not in (1, 4, 3) |
            +----------------------+
            |                    1 |
            +----------------------+
            1 row in set (0.00 sec)
    

    BUT if in the list you have at least one NULL then:

    select (2) not in (1, NULL, 3);
        returns:
            +-------------------------+
            | (2) not in (1, NULL, 3) |
            +-------------------------+
            |                    NULL |
            +-------------------------+
            1 row in set (0.00 sec)
        This is pretty absurd.
    

    We are not the first ones to be confused by this; see [2].

    References:

    [1] http://dev.mysql.com/doc/refman/5.0/en/comparison-operators.html#function_in

    [2] http://blog.9minutesnooze.com/sql-not-in-subquery-null/comment-page-1/#comment-86954

  • 2020-12-06 06:40

    IN returns a trivalent BOOLEAN (which accepts NULL as a value). NOT IN returns the trivalent negation of IN, and the negation of NULL is NULL.

    Imagine we have a table with all numbers from 1 to 1,000,000 in id and this query:

    SELECT  *
    FROM    mytable
    WHERE   id IN (1, 2, NULL)
    

    or its equivalent:

    SELECT  *
    FROM    mytable
    WHERE   id = ANY
                 (
                 SELECT  1
                 UNION ALL
                 SELECT  2
                 UNION ALL
                 SELECT  NULL
                 )
    

    The predicate returns TRUE for 1 and 2 and NULL for all other values, so 1 and 2 are returned.

    In its opposite:

    SELECT  *
    FROM    mytable
    WHERE   id NOT IN (1, 2, NULL)
    

    , or

    SELECT  *
    FROM    mytable
    WHERE   id <> ALL
                 (
                 SELECT  1
                 UNION ALL
                 SELECT  2
                 UNION ALL
                 SELECT  NULL
                 )
    

    , the predicate returns FALSE for 1 and 2 and NULL for all other values, so nothing is returned.

    Note that boolean negation not only changes the operator (= to <>), but the quantifier too (ANY to ALL).
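
    The trivalent logic described above can be modeled in a few lines of plain Python. This is only an illustrative sketch, not how any database implements it: None stands in for NULL, and the helper names (eq3, not3, any3, all3, in3, not_in3) are made up for this example.

```python
def eq3(a, b):
    """Three-valued equality: NULL (None) if either side is NULL."""
    if a is None or b is None:
        return None
    return a == b


def not3(x):
    """Three-valued negation: the negation of NULL is NULL."""
    return None if x is None else not x


def any3(values):
    """ANY quantifier: TRUE if any TRUE, else NULL if any NULL, else FALSE."""
    values = list(values)
    if any(v is True for v in values):
        return True
    if any(v is None for v in values):
        return None
    return False


def all3(values):
    """ALL quantifier: FALSE if any FALSE, else NULL if any NULL, else TRUE."""
    values = list(values)
    if any(v is False for v in values):
        return False
    if any(v is None for v in values):
        return None
    return True


def in3(x, items):
    """x IN (items), i.e. x = ANY (items)."""
    return any3(eq3(x, item) for item in items)


def not_in3(x, items):
    """x NOT IN (items), i.e. x <> ALL (items)."""
    return all3(not3(eq3(x, item)) for item in items)


ids = range(1, 11)          # a small stand-in for the 1..1,000,000 table
items = [1, 2, None]

# A WHERE clause keeps a row only when the predicate is exactly TRUE.
kept_by_in = [i for i in ids if in3(i, items) is True]
kept_by_not_in = [i for i in ids if not_in3(i, items) is True]

print(kept_by_in, kept_by_not_in)
```

    In this model in3 is TRUE for 1 and 2 and NULL for everything else, so only 1 and 2 are kept, while not_in3 is FALSE for 1 and 2 and NULL for everything else, so nothing is kept. For every id, not_in3(i, items) also equals not3(in3(i, items)), which is the "= ANY becomes <> ALL" point made above.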
