SELECT id FROM customers WHERE type IS NOT Null;
Versus:
SELECT id FROM customers WHERE NOT type IS NULL;
The data th
In the usual case where the LHS term is a simple variable or expression, there is no difference between NOT x IS NULL
and x IS NOT NULL
. The optimizer will treat the two identically.
However, in full SQL, the LHS term is not limited to being a simple variable or expression; in the formal grammar, the LHS is a <row value predicand>
:
§8.7
<null predicate>
(p395)Specify a test for a null value.
<null predicate> ::= <row value predicand> <null predicate part 2> <null predicate part 2> ::= IS [ NOT ] NULL
And chasing through the grammar, you find that:
§7.2
<row value expression>
(p296)Specify a row value.
[...]
<row value predicand> ::= <row value special case> | <row value constructor predicand> <row value special case> ::= <nonparenthesized value expression primary>
And:
§7.1
<row value constructor>
(p293)Specify a value or list of values to be constructed into a row or partial row.
<row value constructor> ::= <common value expression> | <boolean value expression> | <explicit row value constructor>
[...]
<row value constructor predicand> ::= <common value expression> | <boolean predicand> | <explicit row value constructor>
And so it goes on. (Chasing anything through the SQL standard is hard work. You can find a heavily hyperlinked version of the standard at http://savage.net.au/SQL/.)
However, as you may guess from the mention of 'row value', you can have multiple simple expressions combined on the LHS to form a 'row value constructor predicand'. And then there is a difference between the two forms.
Conceptually, you have:
(val1, val2, val3) IS NOT NULL
vs
NOT (val1, val2, val3) IS NULL
Now, in the first case, you get TRUE if each of val1, val2 and val3 is not NULL. In the second case, you get TRUE if any one of val1, val2, val3 is not NULL. So, there are circumstances where the two operations are not identical.
However, as I said up front, for the usual case of a simple column or expression, there is no difference between the two.
There is no difference.
It seems to me that there might be a difference when it comes to performance. Anyone care to elaborate on this?
All major engines (that is MySQL
, SQL Server
, Oracle
and PostgreSQL
) will merge these predicates on parsing stage, making identical plans from them.
Handling of these conditions is more complex that mere applying operators in one or another order.
For instance, in Oracle
, an IS NOT NULL
(or NOT IS NULL
) condition implies a possibility to use an index, so a query like this:
SELECT column
FROM mytable
WHERE column IS NOT NULL
will most probably be executed with an index fast full scan
, with no additional checks made in runtime (since the NULL
values just won't make it into the index, so it's no use to check them).
Even if each record would need to be checked, the order of checks will be defined by the optimizer (and not by the order the predicates and operators appear in the WHERE
clause).
For instance, here is a plan for an Oracle
query:
SQL> EXPLAIN PLAN FOR
2
2 SELECT *
3 FROM t_test
4 WHERE NOT column IS NULL
5 /
Explained
SQL> SELECT *
2 FROM TABLE(DBMS_XPLAN.display())
3 /
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 958699830
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 30 | 1260 | 3 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| T_TEST | 30 | 1260 | 3 (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("COLUMN" IS NOT NULL)
As you can see, the filter
was translated internally into an IS NOT NULL
(which Oracle
along with most commenters seems to believe to be a more appropriate form)
Update:
As Jonathan Leffler pointed out, these is difference when evaluating tuples (as opposed to single columns).
A tuple consisting of mixed NULL
and non-NULL
values is neither a NULL
nor a NOT NULL
.
In PostgreSQL
(which supports this predicate against tuples), both these expressions:
SELECT (1, NULL) IS NULL
SELECT (1, NULL) IS NOT NULL
evaluate to false.
IS NOT NULL
is a comparison operator, just like IS NULL
or =
, >
, <
, etc.
NOT
is a logical operator that acts on the rest of the condition. So you can say NOT type = 5
, NOT type IS NULL
, or even NOT type IS NOT NULL
.
My point here is to point out that they are two very different operators, even though the result is the same. Of course, in boolean logic, there is no difference between NOT (column IS NULL)
and column IS NOT NULL
, but it's wise to know the difference.
As for performance, IS NOT NULL
might save you a few cycles over NOT ... IS NULL
because you are using a single operator instead of two operators, but any reasonable optimizer will figure out they are the same thing before the query is run.