Standard use of 'Z' instead of NULL to represent missing data?

借酒劲吻你 2020-12-12 20:08

Outside of the argument of whether or not NULLs should ever be used: I am responsible for an existing database that uses NULL to mean "missing or never entered" da

8 Answers
  • 2020-12-12 20:45

    In reply to the contractor's comments:

    • Empty string <> NULL
    • Empty string requires 2 bytes storage + an offset read
    • NULL uses null bitmap = quicker
    • IDENTITY doesn't always start at 1 (why waste half your range?)

    The whole concept is flawed, as most of the other answers here point out.

  • 2020-12-12 20:48

    Nothing in principle requires nulls for correct database design. In fact there are plenty of databases designed without using null and there are plenty of very good database designers and whole development teams who design databases without using nulls. In general it's a good thing to be cautious about adding nulls to a database because they inevitably lead to incorrect or ambiguous results later on.

    I've not heard of using 'Z' as a placeholder value instead of nulls being called "standard practice", but I expect your contractor is referring to the concept of sentinel values in general, which are sometimes used in database design. However, a much more common and flexible way to avoid nulls without using "dummy" data is simply to design them out: decompose the table so that each type of fact is recorded in a table that doesn't have "extra", unspecified attributes.
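As a rough illustration of "designing the nulls out", here is a minimal sketch using SQLite through Python's standard sqlite3 module. The tables person and person_middle_initial and their columns are hypothetical names chosen for the example, not anything from the question: the optional fact lives in its own table, and a missing value is simply a missing row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Every person has a name, so this table needs no nullable column.
    CREATE TABLE person (
        person_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL
    );
    -- A middle initial is a separate fact: a row exists only when it is known.
    CREATE TABLE person_middle_initial (
        person_id      INTEGER PRIMARY KEY REFERENCES person(person_id),
        middle_initial TEXT NOT NULL
    );
    INSERT INTO person VALUES (1, 'Anna'), (2, 'Bob');
    INSERT INTO person_middle_initial VALUES (1, 'Q');
""")

# People whose middle initial is recorded: a plain inner join, no NULL, no 'Z'.
known = conn.execute("""
    SELECT p.name, m.middle_initial
    FROM person AS p
    JOIN person_middle_initial AS m USING (person_id)
    ORDER BY p.person_id
""").fetchall()

# People without a recorded middle initial: the absent row is the "missing" marker.
unknown = conn.execute("""
    SELECT p.name
    FROM person AS p
    WHERE NOT EXISTS (
        SELECT 1 FROM person_middle_initial AS m WHERE m.person_id = p.person_id
    )
""").fetchall()

print(known)    # [('Anna', 'Q')]
print(unknown)  # [('Bob',)]
```

Note that an outer join over the two tables would bring NULLs back into the query result, so this design keeps them out of the stored data rather than out of SQL altogether.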

  • 2020-12-12 20:54

    I've never heard of any widespread use of 'Z' as a substitute for NULL.

    (Incidentally, I wouldn't particularly like to work with a contractor who tells you to your face that they and other "advanced" DBAs are so much more knowledgeable and better than you.)

     +=================================+
     |  FavoriteLetters                |
     +=================================+
     |  Person      |  FavoriteLetter  |
     +--------------+------------------+
     |  'Anna'      |  'A'             |
     |  'Bob'       |  'B'             |
     |  'Claire'    |  'C'             |
     |  'Zaphod'    |  'Z'             |
     +---------------------------------+
    

    How would your contractor interpret the data from the last row?

    Probably he would choose a different "magic value" in this table to avoid collision with the real data 'Z'? Meaning you'd have to remember several magic values and also which one is used where... how is this better than having just one magic token NULL, and having to remember the three-valued logic rules (and pitfalls) that go with it? NULL at least is standardized, unlike your contractor's 'Z'.
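The collision can be reproduced with a small sketch, again assuming SQLite via Python's sqlite3 module. The table mirrors FavoriteLetters above. The two "Dave" rows are hypothetical additions that store the same missing answer once as the 'Z' placeholder and once as NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE favorite_letters (person TEXT PRIMARY KEY, favorite_letter TEXT)")
rows = [("Anna", "A"), ("Bob", "B"), ("Claire", "C"), ("Zaphod", "Z")]
conn.executemany("INSERT INTO favorite_letters VALUES (?, ?)", rows)

# Hypothetical extra rows: Dave never answered. Store that once as 'Z', once as NULL.
conn.execute("INSERT INTO favorite_letters VALUES ('Dave (sentinel)', 'Z')")
conn.execute("INSERT INTO favorite_letters VALUES ('Dave (null)', NULL)")

# "Who likes Z?" now also returns the placeholder row; the NULL row stays out.
likes_z = [r[0] for r in conn.execute(
    "SELECT person FROM favorite_letters WHERE favorite_letter = 'Z' ORDER BY person")]

# "Whose answer is missing?" can only be answered for the NULL row.
missing = [r[0] for r in conn.execute(
    "SELECT person FROM favorite_letters WHERE favorite_letter IS NULL")]

print(likes_z)  # ['Dave (sentinel)', 'Zaphod']
print(missing)  # ['Dave (null)']
```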

    I don't particularly like NULL either, but mindlessly substituting it with an actual value (or worse, with several actual values) everywhere is almost definitely worse than NULL.

    Let me repeat my above comment here for better visibility: If you want to read something serious and well-grounded by people who are against NULL, I would recommend the short article "How to handle missing information without using NULLs" (links to a PDF from The Third Manifesto homepage).

  • 2020-12-12 20:59

    This is easily one of the weirdest opinions I've ever heard. Using a magic value to represent "no data" rather than NULL means that every piece of code that you have will have to post-process the results to account for or discard the "no-data"/"Z" values.

    NULL is special because of the way that the database handles it in queries. For instance, take these two simple queries:

    select * from mytable where name = 'bob';
    select * from mytable where name != 'bob';
    

    If name is ever NULL, it obviously won't show up in the first query's results. More importantly, neither will it show up in the second query's results. NULL doesn't match anything other than an explicit search for NULL, as in:

    select * from mytable where name is NULL;
    

    And what happens when the data could have Z as a valid value? Let's say you're storing someone's middle initial. Would Zachary Z Zonkas be lumped in with those people with no middle initial? Or would your contractor come up with yet another magic value to handle this?
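The behaviour of the three queries above can be checked with a short script. This is a sketch that assumes SQLite through Python's sqlite3 module. The id column, the sample rows and the ids helper are made up for the demonstration. The last step swaps NULL for the 'Z' placeholder to show how it leaks into ordinary comparisons.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO mytable (id, name) VALUES (?, ?)",
                 [(1, "bob"), (2, "alice"), (3, None)])  # None is stored as NULL

def ids(where):
    """Return the ids of the rows matching the given WHERE clause (demo helper)."""
    return [r[0] for r in conn.execute(f"SELECT id FROM mytable WHERE {where} ORDER BY id")]

equal = ids("name = 'bob'")
not_equal = ids("name != 'bob'")
is_null = ids("name IS NULL")

print(equal)      # [1]
print(not_equal)  # [2]  (row 3 is absent: NULL != 'bob' evaluates to unknown, not true)
print(is_null)    # [3]

# Swap NULL for the 'Z' placeholder: it now matches != and must be filtered by hand.
conn.execute("UPDATE mytable SET name = 'Z' WHERE name IS NULL")
not_equal_with_sentinel = ids("name != 'bob'")
print(not_equal_with_sentinel)  # [2, 3]
```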

    Avoid magic values that require you to implement database features in code that the database is already fully capable of handling. This is a solved and well understood problem, and it may just be that your contractor never really grokked the notion of NULL and therefore avoids using it.

  • 2020-12-12 20:59

    If the domain allows missing values, then using NULL to represent 'undefined' is perfectly OK (that's what it is there for). The only downside is that code that consumes the data has to be written to check for NULLs. This is the way I've always done it.

    I have never heard of (or seen in practice) the use of 'Z' to represent missing data. As to "the contractor cites this as 'standard practice' among DBAs", can he provide some evidence of that assertion? As @Dems mentioned, you also need to document that 'Z' doesn't mean 'Z': what about a MiddleInitial column?

    Like Aaron Alton and many others, I believe that NULL values are an integral part of database design, and should be used where appropriate.

  • 2020-12-12 21:03

    Sack your contractor.

    Okay, seriously, this isn't standard practice. You can see that simply because every RDBMS I have ever worked with implements NULL, has logic for NULL, takes account of NULL in foreign keys, treats NULL differently in COUNT, and so on.
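To make the COUNT point concrete, here is a small sketch, assuming SQLite via Python's sqlite3 module and a hypothetical people table. COUNT(column) ignores NULLs, whereas a 'Z' placeholder is counted like any other value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT NOT NULL, middle_initial TEXT)")
conn.executemany("INSERT INTO people VALUES (?, ?)",
                 [("Anna", "Q"), ("Bob", None), ("Claire", None)])

# COUNT(*) counts rows; COUNT(column) skips rows where the column is NULL.
total, with_initial = conn.execute(
    "SELECT COUNT(*), COUNT(middle_initial) FROM people").fetchone()

# With a 'Z' placeholder the same aggregate silently counts the placeholders too.
conn.execute("UPDATE people SET middle_initial = 'Z' WHERE middle_initial IS NULL")
with_initial_sentinel = conn.execute(
    "SELECT COUNT(middle_initial) FROM people").fetchone()[0]

print(total, with_initial, with_initial_sentinel)  # 3 1 3
```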

    I would actually contend that using 'Z' or any other placeholder is worse. You still require code to check for 'Z'. But you also need to document that 'Z' doesn't mean 'Z', it means something else. And you have to ensure that such documentation is read. And then what happens if 'Z' ever becomes a valid piece of data? (Such as a field for an initial?)

    At a basic level, even without debating the validity of NULL vs 'Z', I would insist that the contractor conform to the standard practices that exist within your company, not his. Instituting his standard practice in an environment with an alternative standard practice will cause confusion, maintenance overhead, misunderstanding, and in the end increased costs and mistakes.


    EDIT

    There are cases where using an alternative to NULL is valid in my opinion. But only where doing so reduces code, rather than creating special cases which require accounting for.

    I've used that for date bound data, for example. If data is valid between a start-date and an end-date, code can be simplified by not having NULL values. Instead a NULL start-date could be replaced with '01 Jan 1900' and a NULL end-date could be replaced with '31 Dec 2079'.

    This still can change behaviour from what may be expected, and so should be used with care:

    • WHERE end-date IS NULL no longer returns the data that is still valid
    • You just created your own millennium bug
    • etc.
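The date-bound case and its first caveat can be sketched as follows, assuming SQLite via Python's sqlite3 module, dates stored as ISO-8601 text so that they compare correctly as strings, and a hypothetical price table. The far-future date is the '31 Dec 2079' value mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE price (item TEXT, start_date TEXT, end_date TEXT)")
conn.executemany("INSERT INTO price VALUES (?, ?, ?)", [
    ("old",     "2010-01-01", "2015-12-31"),
    ("current", "2016-01-01", None),          # open-ended: no end date yet
])

day = "2020-06-01"

# With NULL meaning "no end date", the range check needs a special case.
with_null = [r[0] for r in conn.execute(
    "SELECT item FROM price WHERE start_date <= ? AND (end_date IS NULL OR end_date >= ?)",
    (day, day))]

# Replace NULL with the far-future date; the check collapses to a plain BETWEEN.
conn.execute("UPDATE price SET end_date = '2079-12-31' WHERE end_date IS NULL")
with_sentinel = [r[0] for r in conn.execute(
    "SELECT item FROM price WHERE ? BETWEEN start_date AND end_date", (day,))]

# The caveat: rows that are "still open" can no longer be found with IS NULL.
still_open = conn.execute("SELECT item FROM price WHERE end_date IS NULL").fetchall()

print(with_null, with_sentinel, still_open)  # ['current'] ['current'] []
```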

    This is equivalent to reforming abstractions such that all properties can always have valid values. It is markedly different from implicitly encoding specific meaning into arbitrarily chosen values.

    Still, sack the contractor.
