I\'m not asking if it does. I know that it doesn\'t.
I\'m curious as to the reason. I\'ve read support docs such as as this one on Working With Nu
It's a language semantic.
Null is the lack of a value.
is null
makes sense to me. It says, "is lacking a value" or "is unknown". Personally I've never asked somebody if something is, "equal to lacking a value".
The concept is that NULL is not an equitable value. It denotes the absence of a value.
Therefore, a variable or a column can only be checked if it IS NULL
, but not if it IS EQUAL TO NULL
.
Once you open up arithmetic comparisions, you may have to contend with IS GREATER THAN NULL
, or IS LESS THAN OR EQUAL TO NULL
In addition to all that has already been said, I wish to stress that what you write in your first line is wrong. SQL does support the “= NULL” syntax, but it has a different semantic than “IS NULL” – as can be seen in the very piece of documentation you linked to.
a. Null is not the "lack of a value"
b. Null is not "empty"
c. Null is not an "unset value"
It's all of the above and none of the above.
By technical rights, NULL is an "unknown value". However, like uninitialized pointers in C/C++, you don't really know what your pointing at. With databases, they allocate the space but do not initialize the value in that space.
So, it is an "empty" space in the sense that it's not initialized. If you set a value to NULL, the original value stays in that storage location. If it was originally an empty string (for example), it will remain that.
It's a "lack of a value" in the fact that it hasn't been set to what the database deems a valid value.
It's an "unset value" in that if the space was just allocated, the value that is there has never been set.
"Unknown" is the closest that we can truly come to knowing what to expect when we examine a NULL.
Because of that, if we try to compare this "unknown" value, we will get a comparison that
a) may or may not be valid
b) may or may not have the result we expect
c) may or may not crash the database.
So, the DBMS systems (long ago) decided that it doesn't even make sense to use equality when it comes to NULL.
Therefore, "= null" makes no sense.
I agree with the OP that
where column_name = null
should be syntactic sugar for
where column_name is null
However, I do understand why the creators of SQL wanted to make the distinction. In three-valued logic (IMO this is a misnomer), a predicate can return two values (true or false) OR unknown which is technically not a value but just a way to say "we don't know which of the two values this is". Think about the following predicate in terms of three-valued logic:
A == B
This predicate tests whether A is equal to B. Here's what the truth table looks like:
T U F
-----
T | T U F
U | U U U
F | F U T
If either A or B is unknown, the predicate itself always returns unknown, regardless of whether the other one is true or false or unknown.
In SQL, null is a synonym for unknown. So, the SQL predicate
column_name = null
tests whether the value of column_name is equal to something whose value is unknown, and returns unknown regardless of whether column_name is true or false or unknown or anything else, just like in three-valued logic above. SQL DML operations are restricted to operating on rows for which the predicate in the where clause returns true, ignoring rows for which the predicate returns false or unknown. That's why "where column_name = null" doesn't operate on any rows.
I can't help but feel that you're still not satisfied with the answers that have been given so far, so I thought I'd try another tack. Let's have an example (no, I've no idea why this specific example has come into my head).
We have a table for employees, EMP
:
EMP
---
EMPNO GIVENNAME
E0001 Boris
E0002 Chris
E0003 Dave
E0004 Steve
E0005 Tony
And, for whatever bizarre reason, we're tracking what colour trousers each employee chooses to wear on a particular day (TROUS
):
TROUS
-----
EMPNO DATE COLOUR
E0001 20110806 Brown
E0002 20110806 Blue
E0003 20110806 Black
E0004 20110806 Brown
E0005 20110806 Black
E0001 20110807 Black
E0003 20110807 Black
E0004 20110807 Grey
I could go on. We write a query, where we want to know the name of every employee, and what colour trousers they had on on the 7th August:
SELECT e.GIVENNAME,t.COLOUR
FROM
EMP e
LEFT JOIN
TROUS t
ON
e.EMPNO = t.EMPNO and
t.DATE = '20110807'
And we get the result set:
GIVENNAME COLOUR
Chris NULL
Steve Grey
Dave Black
Boris Black
Tony NULL
Now, this result set could be in a view, or CTE, or whatever, and we might want to continue asking questions about these results, using SQL. What might some of these questions be?
Were Dave and Boris wearing the same colour trousers on that day? (Yes, Black==Black)
Were Dave and Steve wearing the same colour trousers on that day? (No, Black!=Grey)
Were Boris and Tony wearing the same colour trousers on that day? (Unknown - we're trying to compare with NULL, and we're following the SQL rules)
Were Boris and Tony not wearing the same colour trousers on that day? (Unknown - we're again comparing to NULL, and we're following SQL rules)
Were Chris and Tony wearing the same colour trousers on that day? (Unknown)
Note, that you're already aware of specific mechanisms (e.g. IS NULL
) to force the outcomes you want, if you've designed your database to never use NULL as a marker for missing information.
But in SQL, NULL has been given two roles (at least) - to mark inapplicable information (maybe we have complete information in the database, and Chris and Tony didn't turn up for work that day, or did but weren't wearing trousers), and to mark missing information (Chris did turn up that day, we just don't have the information recorded in the database at this time)
If you're using NULL
purely as a marker of inapplicable information, I assume you're avoiding such constructs as outer joins.
I find it interesting that you've brought up NaN
in comments to other answers, without seeing that NaN
and (SQL) NULL
have a lot in common. The biggest difference between them is that NULL
is intended for use across the system, no matter what data type is involved.
You're biggest issue seems to be that you've decided that NULL has a single meaning across all programming languages, and you seem to feel that SQL has broken that meaning. In fact, null in different languages frequently has subtly different meanings. In some languages, it's a synonym for 0. In others, not, so the comparison 0==null
will succeed in some, and fail in others. You mentioned VB, but VB (assuming you're talking .NET versions) does not have null. It has Nothing
, which again is subtly different (it's the equivalent in most respects of the C# construct default(T)
).