What are the pros and cons of using NULL values in SQL as opposed to default values?
Null values are not ... values!
Null means 'has no value'. Beyond the database aspect, one important property of non-valued variables or fields is that you cannot meaningfully compare them with '=' (or '>', '<').
Writing something like (VB):
If myFirstValue = mySecondValue Then
does not evaluate to either True or False if one or both of the variables are non-valued; the result of the comparison is itself Null. You will have to use a workaround such as:
If (IsNull(myFirstValue) And IsNull(mySecondValue)) Or myFirstValue = mySecondValue Then
The 'usual' code used in such circumstances,
If Nz(myFirstValue, defaultValue) = Nz(mySecondValue, defaultValue) Then
is not strictly correct, as a non-valued variable will be considered 'equal' to one that really holds 'defaultValue' (usually a zero-length string).
In spite of this unpleasant behaviour, never, never, never set your default values to a zero-length string (or 0) without a good reason, and making value comparisons easier in code is not a good reason.
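The same behaviour shows up on the SQL side. Here is a minimal sketch using Python's built-in sqlite3 module (my choice for illustration, not something from the answer above; the helper name `scalar` is made up). It shows that '=' against NULL gives NULL, that a null-safe comparison gives a real answer, and that the Nz-style substitution (COALESCE in SQL) makes a missing value look equal to a genuine empty string. Note that using IS as a general null-safe equality operator is SQLite behaviour; other products spell this differently, e.g. IS NOT DISTINCT FROM in standard SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def scalar(sql, params=()):
    """Run a one-value query and return that value (None stands for SQL NULL)."""
    return conn.execute(sql, params).fetchone()[0]

# '=' with a NULL operand yields NULL (None in Python), neither true nor false.
null_equals_null = scalar("SELECT NULL = NULL")
null_equals_text = scalar("SELECT NULL = 'abc'")

# SQLite's IS operator is a null-safe equality test: it always returns 0 or 1.
null_is_null = scalar("SELECT NULL IS NULL")
null_is_text = scalar("SELECT NULL IS 'abc'")

# Substituting a default before comparing (the Nz pattern) makes a missing
# value compare equal to a genuine zero-length string.
nz_style = scalar("SELECT COALESCE(?, '') = COALESCE(?, '')", (None, ""))

print(null_equals_null, null_equals_text, null_is_null, null_is_text, nz_style)
# prints: None None 1 0 1
```

The last result is the trap described above: after the substitution, "no value" and "empty string" can no longer be told apart.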
A NULL in a database is a system marker indicating that a value is not present, as opposed to a space, a zero, or any other default value. How much storage it takes depends on the DBMS. A field containing NULL means that the content of that cell is unknown at the time of looking at it. A column that allows NULLs also allows rows to be inserted with no value at all in that column. There are several pros and cons of using NULL values as opposed to default values:
Pros
NULL is not tied to a particular data type, so it can be stored in any nullable column of any table. Default values, on the other hand, must have a data type that matches their column, and a default value in one column might look the same as one in another column while being of a different type.
NULL is often used in schemas where a value is optional. It is a convenient method for omitting data entry for unknown fields without having to implement additional rules, like storing negative values in an integer field to represent omitted data.
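In many DBMS implementations a NULL is recorded as a single bit in a per-row bitmap, so NULLs may be useful when optimising storage. This can be more compact than storing a default value (e.g. a character's 8 bits or a small integer's 16 bits), although the actual saving depends on the engine and on whether the column is fixed-length or variable-length.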
Your system requirements may change over time, and the types of your default values with them, but NULL is always NULL, so there is no stored default to update.
Declaring a column NOT NULL can also help with validation, in the sense that the column then requires a value to be inserted. A default value on its own does not enforce this.
Cons
NULL values are easily confused with empty character strings, which also show up as blank to the user when selected. In this sense, default values are less confusing and are the safer option, unless the default value is itself the empty string.
Allowing NULL values in the database may cost the designer extra time and work, because they make the database logic more complicated, especially when there are a lot of comparisons against NULL.
Source: Pro and cons
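To make a few of the points in that list concrete, here is a small sketch with Python's sqlite3 module. The `person` table and its columns are invented for the example, and other DBMS products may differ in details. It shows that aggregates skip NULLs whereas a stored default of 0 gets averaged in, that an empty string and NULL match different filters, and that NOT NULL rejects a row with the required column missing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT NOT NULL, age INTEGER, nickname TEXT)")
conn.executemany(
    "INSERT INTO person (name, age, nickname) VALUES (?, ?, ?)",
    [("Ann", 30, ""), ("Bob", 40, None), ("Cid", None, "C")],
)

# Aggregates skip NULLs: the average is taken over the two known ages only.
avg_with_null = conn.execute("SELECT AVG(age) FROM person").fetchone()[0]

# Had the unknown age been stored as a default of 0, it would be averaged in.
avg_with_zero = conn.execute("SELECT AVG(COALESCE(age, 0)) FROM person").fetchone()[0]

# An empty string is a value; NULL is not. The two filters match different rows.
empty_nick = [r[0] for r in conn.execute("SELECT name FROM person WHERE nickname = ''")]
null_nick = [r[0] for r in conn.execute("SELECT name FROM person WHERE nickname IS NULL")]

# NOT NULL rejects an insert that leaves out the required column.
try:
    conn.execute("INSERT INTO person (age) VALUES (25)")
    not_null_enforced = False
except sqlite3.IntegrityError:
    not_null_enforced = True

print(avg_with_null, round(avg_with_zero, 2), empty_nick, null_nick, not_null_enforced)
# prints: 35.0 23.33 ['Ann'] ['Bob'] True
```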
It depends on the situation, but it's really ultimately simple. Which one is closer to the truth?
A lot of people deal with data as though it's just data, and truth doesn't matter. However, whenever you talk to the stakeholders in the data, you find that truth always matters. Sometimes more, sometimes less, but it always matters.
A default value is useful when you may presume that if the user (or other data source) had provided a value, the value would have been the default. If this presumption does more harm than good, then NULL is better, even though dealing with NULL is a pain in SQL.
Note that there are three different ways default values can be implemented. First, in the application, before inserting new data. The database never sees the difference between a value provided by the user and a default provided by the app!
Second, by declaring a default value for the column, and omitting that column from the INSERT.
Third, by substituting the default value at retrieval time, whenever a NULL is detected. Only a few DBMS products permit this third mode to be declared in the database.
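A rough sketch of the three modes, using Python's sqlite3 module. The `customer` table, the 'unknown' default and the constant name are all made up for illustration. For the third mode I just use COALESCE in the query, which is only an approximation of a DBMS that lets you declare retrieval-time substitution.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer (
        name    TEXT NOT NULL,
        country TEXT DEFAULT 'unknown'  -- mode 2: declared column default
    )
""")

# Mode 1: the application fills in the default before inserting.
APP_DEFAULT_COUNTRY = "unknown"  # hypothetical application setting
conn.execute("INSERT INTO customer (name, country) VALUES (?, ?)",
             ("Ann", APP_DEFAULT_COUNTRY))

# Mode 2: the column is left out of the INSERT, so the declared default applies.
conn.execute("INSERT INTO customer (name) VALUES (?)", ("Bob",))

# An explicit NULL is not the same as leaving the column out: it is stored as NULL.
conn.execute("INSERT INTO customer (name, country) VALUES (?, ?)", ("Cid", None))

stored = conn.execute(
    "SELECT name, country FROM customer ORDER BY name").fetchall()

# Mode 3: keep the NULL in storage and substitute only at retrieval time.
shown = conn.execute(
    "SELECT name, COALESCE(country, 'unknown') FROM customer ORDER BY name").fetchall()

print(stored)  # [('Ann', 'unknown'), ('Bob', 'unknown'), ('Cid', None)]
print(shown)   # [('Ann', 'unknown'), ('Bob', 'unknown'), ('Cid', 'unknown')]
```

In the stored data, Ann and Bob are indistinguishable, which is the point made above about the first mode. Only the third mode keeps the fact that Cid's country was never supplied.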
In an ideal world, data is never missing. If you are developing for the real world, required data will eventually be missing. Your applications can either do something that makes sense or something that doesn't make sense when that happens.
I so appreciate all of this discussion. I am in the midst of building a data warehouse and am using the Kimball model rather strictly. There is one very vocal user, however, who hates surrogate keys and wants NULLs all over the place. I told him that it is OK to have NULLable columns for attributes of dimensions and for any dates or numbers that are used in calculations, because default values there imply incorrect data. There are, I agree, advantages to allowing NULL in certain columns, but it makes cubing a lot better and more reliable if there is a surrogate key for every foreign key to a dimension, even if that surrogate is -1 or 0 for a dummy record. SQL likes integers for joins, and if there is a missing dimension value and a dummy is provided as a surrogate key, then you will get the same number of records cubing on one dimension as you would cubing on another. However, calculations have to be done correctly and you have to account for NULL values in those. Birthday should be NULL so that age is not calculated, for example. I believe in good data governance, and making these decisions with the users forces them to think about their data in more ways than ever.
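For anyone who wants to see the row-count effect, here is a toy sketch with Python's sqlite3 module. The table names `dim_store`, `fact_null` and `fact_dummy` are invented, and a real warehouse would obviously have more going on. It compares a fact table with a NULL foreign key against one that points at a dummy dimension row with key -1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_store (store_key INTEGER PRIMARY KEY, store_name TEXT);
    CREATE TABLE fact_null  (sale_id INTEGER, store_key INTEGER, amount REAL);
    CREATE TABLE fact_dummy (sale_id INTEGER, store_key INTEGER NOT NULL, amount REAL);

    INSERT INTO dim_store  VALUES (-1, 'Unknown store'), (1, 'Downtown');
    INSERT INTO fact_null  VALUES (1, 1, 10.0), (2, NULL, 5.0);
    INSERT INTO fact_dummy VALUES (1, 1, 10.0), (2, -1, 5.0);
""")

def joined_total(fact_table):
    """Row count and total amount after an inner join to the store dimension."""
    # fact_table is one of the two fixed names above, never user input.
    return conn.execute(
        f"SELECT COUNT(*), SUM(f.amount) FROM {fact_table} AS f "
        "JOIN dim_store AS d ON d.store_key = f.store_key"
    ).fetchone()

null_fk = joined_total("fact_null")    # the NULL-keyed sale drops out of the join
dummy_fk = joined_total("fact_dummy")  # the sale survives under 'Unknown store'

print(null_fk, dummy_fk)  # (1, 10.0) (2, 15.0)
```

With the dummy row, totals by store add up to the same figure as totals by any other dimension.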
Two very good Access-oriented articles about Nulls by Allen Browne:
Aspects of working with Nulls in VBA code:
The articles are Access-oriented, but could be valuable to those using any database, particularly relative novices because of the conversational style of the writing.
To me, they are somewhat orthogonal.
Default values allow you to gracefully evolve your database schema (think adding columns) without having to modify client code. Plus, they save some typing, but relying on default values for this is IMO bad.
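A quick sketch of the schema-evolution point, using Python's sqlite3 module. The `task` table, the `priority` column and the `add_task` helper are hypothetical. The "old" insert code keeps working after a new required column is added, because the column carries a default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE task (title TEXT NOT NULL)")

def add_task(title):
    """'Old' client code: it knows nothing about columns added later."""
    conn.execute("INSERT INTO task (title) VALUES (?)", (title,))

add_task("written before the migration")

# Schema change: a new required column, made safe by its default.
conn.execute("ALTER TABLE task ADD COLUMN priority INTEGER NOT NULL DEFAULT 3")

add_task("written after the migration")  # the unchanged code still works

rows = conn.execute("SELECT title, priority FROM task ORDER BY rowid").fetchall()
print(rows)
# [('written before the migration', 3), ('written after the migration', 3)]
```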
Nulls are just that: nulls. A missing value, and a huge PITA when dealing with Three-Valued Logic.
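To show what the three-valued logic pain looks like in practice, here is a small sketch with Python's sqlite3 module (the `item` table and the `names` helper are made up). A row with a NULL colour matches neither "= 'red'" nor "<> 'red'", and a NULL inside a NOT IN list makes the whole filter match nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (name TEXT, colour TEXT)")
conn.executemany("INSERT INTO item VALUES (?, ?)",
                 [("a", "red"), ("b", "blue"), ("c", None)])

def names(where):
    """Names of the items matching a fixed WHERE clause (never user input)."""
    sql = f"SELECT name FROM item WHERE {where} ORDER BY name"
    return [r[0] for r in conn.execute(sql)]

is_red = names("colour = 'red'")
not_red = names("colour <> 'red'")           # item 'c' is in neither result
either = names("colour = 'red' OR colour <> 'red'")

# 'blue' NOT IN ('red', NULL) is TRUE AND UNKNOWN, i.e. UNKNOWN, so no row passes.
not_in_with_null = names("colour NOT IN ('red', NULL)")

print(is_red, not_red, either, not_in_with_null)
# prints: ['a'] ['b'] ['a', 'b'] []
```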