In SQL server if you have nullParam=NULL
in a where clause, it always evaluates to false. This is counterintuitive and has caused me many errors. I do understa
The question:
Does one unknown equal another unknown?
(NULL = NULL)
That question is something no one can answer so it defaults to true or false depending on your ansi_nulls setting.
However the question:
Is this unknown variable unknown?
This question is quite different and can be answered with true.
nullVariable = null is comparing the values
nullVariable is null is comparing the state of the variable
Because NULL
means 'unknown value' and two unknown values cannot be equal.
So, if to our logic NULL
N°1 is equal to NULL
N°2, then we have to tell that somehow:
SELECT 1
WHERE ISNULL(nullParam1, -1) = ISNULL(nullParam2, -1)
where known value -1
N°1 is equal to -1
N°2
The concept of NULL is questionable, to say the least. Codd introduced the relational model and the concept of NULL in context (and went on to propose more than one kind of NULL!) However, relational theory has evolved since Codd's original writings: some of his proposals have since been dropped (e.g. primary key) and others never caught on (e.g. theta operators). In modern relational theory (truly relational theory, I should stress) NULL simply does not exist. See The Third Manifesto. http://www.thethirdmanifesto.com/
The SQL language suffers the problem of backwards compatibility. NULL found its way into SQL and we are stuck with it. Arguably, the implementation of NULL
in SQL is flawed (SQL Server's implementation makes things even more complicated due to its ANSI_NULLS
option).
I recommend avoiding the use of NULLable columns in base tables.
Although perhaps I shouldn't be tempted, I just wanted to assert a corrections of my own about how NULL
works in SQL:
NULL
= NULL
evaluates to UNKNOWN
.
UNKNOWN
is a logical value.
NULL
is a data value.
This is easy to prove e.g.
SELECT NULL = NULL
correctly generates an error in SQL Server. If the result was a data value then we would expect to see NULL
, as some answers here (wrongly) suggest we would.
The logical value UNKNOWN
is treated differently in SQL DML and SQL DDL respectively.
In SQL DML, UNKNOWN
causes rows to be removed from the resultset.
For example:
CREATE TABLE MyTable
(
key_col INTEGER NOT NULL UNIQUE,
data_col INTEGER
CHECK (data_col = 55)
);
INSERT INTO MyTable (key_col, data_col)
VALUES (1, NULL);
The INSERT
succeeds for this row, even though the CHECK
condition resolves to NULL = NULL
. This is due defined in the SQL-92 ("ANSI") Standard:
11.6 table constraint definition
3)
If the table constraint is a check constraint definition, then let SC be the search condition immediately contained in the check constraint definition and let T be the table name included in the corresponding table constraint descriptor; the table constraint is not satisfied if and only if
EXISTS ( SELECT * FROM T WHERE NOT ( SC ) )
is true.
Read that again carefully, following the logic.
In plain English, our new row above is given the 'benefit of the doubt' about being UNKNOWN
and allowed to pass.
In SQL DML, the rule for the WHERE
clause is much easier to follow:
The search condition is applied to each row of T. The result of the where clause is a table of those rows of T for which the result of the search condition is true.
In plain English, rows that evaluate to UNKNOWN
are removed from the resultset.
You work for the government registering information about citizens. This includes the national ID for every person in the country. A child was left at the door of a church some 40 years ago, nobody knows who their parents are. This person's father ID is NULL
. Two such people exist. Count people who share the same father ID with at least one other person (people who are siblings). Do you count those two too?
The answer is no, you don’t, because we don’t know if they are siblings or not.
Suppose you don’t have a NULL
option, and instead use some pre-determined value to represent “the unknown”, perhaps an empty string or the number 0 or a * character, etc. Then you would have in your queries that * = *, 0 = 0, and “” = “”, etc. This is not what you want (as per the example above), and as you might often forget about these cases (the example above is a clear fringe case outside ordinary everyday thinking), then you need the language to remember for you that NULL = NULL
is not true.
Necessity is the mother of invention.
How old is Frank? I don't know (null).
How old is Shirley? I don't know (null).
Are Frank and Shirley the same age?
Correct answer should be "I don't know" (null), not "no", as Frank and Shirley might be the same age, we simply don't know.
Just because you don't know what two things are, does not mean they're equal. If when you think of NULL
you think of “NULL” (string) then you probably want a different test of equality like Postgresql's IS DISTINCT FROM
AND IS NOT DISTINCT FROM
From the PostgreSQL docs on "Comparison Functions and Operators"
expression
IS DISTINCT FROM
expressionexpression
IS NOT DISTINCT FROM
expressionFor non-null inputs,
IS DISTINCT FROM
is the same as the<>
operator. However, if both inputs are null it returns false, and if only one input is null it returns true. Similarly,IS NOT DISTINCT FROM
is identical to=
for non-null inputs, but it returns true when both inputs are null, and false when only one input is null. Thus, these constructs effectively act as though null were a normal data value, rather than "unknown".