Options for eliminating NULLable columns from a DB model (in order to avoid SQL's three-valued logic)?

温柔的废话 2021-02-08 20:19

Some while ago, I read through the book SQL and Relational Theory by C. J. Date. The author is well known for criticising SQL's three-valued logic (3VL).

7 answers
  • 2021-02-08 21:01

    One option is to use explicit option types, analogous to Haskell's Maybe functor.

    Unfortunately a lot of existing SQL implementations have poor support for user-defined algebraic data types and even poorer support for user-defined type constructors that you really need to do this cleanly.

    This recovers a sort of "null" for only those attributes where you explicitly ask for it, but without null's silly three-valued logic. Nothing == Nothing is True, not unknown or null.
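
    The contrast can be checked directly. The following is only a sketch: the Just and Nothing classes are hypothetical Python stand-ins for Haskell's Maybe, and SQLite is used merely because it ships with Python.

```python
import sqlite3
from dataclasses import dataclass
from typing import Generic, TypeVar, Union

T = TypeVar("T")


@dataclass(frozen=True)
class Just(Generic[T]):
    """Hypothetical stand-in for Haskell's `Just x`."""
    value: T


@dataclass(frozen=True)
class Nothing:
    """Hypothetical stand-in for Haskell's `Nothing`."""


Maybe = Union[Just[T], Nothing]

# SQL's three-valued logic: NULL = NULL evaluates to NULL (unknown),
# which the sqlite3 module returns as Python's None.
conn = sqlite3.connect(":memory:")
sql_null_eq = conn.execute("SELECT NULL = NULL").fetchone()[0]
conn.close()

# The explicit option type compares with ordinary two-valued logic.
option_eq = Nothing() == Nothing()
just_eq = Just(1) == Just(1)
mixed_eq = Just(1) == Nothing()

print(sql_null_eq, option_eq, just_eq, mixed_eq)
```

    The SQL comparison yields neither true nor false, while every comparison between option values is a plain boolean.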

    Support for user-defined algebraic types also helps when information can be missing for more than one reason. For example, a database equivalent of the following Haskell type would be a good solution for the obvious application:

    data EmploymentStatus = Employed EmployerID | Unemployed | Unknown
    

    (Of course, a database supporting this would also need to support the more-complicated-than-usual foreign key constraint that comes with it.)
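
    For readers who do not know Haskell, here is a rough Python rendering of the same type. It is an illustration only: the class names mirror the Haskell constructors, describe is a hypothetical helper, and none of this is something an SQL product offers today.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Employed:
    employer_id: int  # would be a foreign key in a database setting


@dataclass(frozen=True)
class Unemployed:
    pass


@dataclass(frozen=True)
class Unknown:
    pass


# Exactly one of three cases; only Employed carries an employer.
EmploymentStatus = Union[Employed, Unemployed, Unknown]


def describe(status: EmploymentStatus) -> str:
    """Hypothetical helper that handles every case explicitly."""
    if isinstance(status, Employed):
        return f"employed by employer {status.employer_id}"
    if isinstance(status, Unemployed):
        return "unemployed"
    return "employment status unknown"


print(describe(Employed(7)))
print(describe(Unknown()))
```

    Each reason for a missing employer is its own value, and comparing two such values always gives a definite answer.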

    Short of this, I agree with APC's and onedaywhen's answers about 6NF.

  • 2021-02-08 21:06

    You can eliminate null in the output as well by using COALESCE.

    SELECT personid  /*primary key, will never be null here*/
           , COALESCE(name, 'no name') as name
           , COALESCE(CAST(birthdate AS varchar(10)), 'no date') as birthdate  /*cast so both arguments share a type*/
    FROM people
    

    COALESCE is part of standard SQL and is supported by most databases. Many also offer a two-argument equivalent such as
    IFNULL(arg1, arg2) (MySQL, SQLite), ISNULL (SQL Server) or NVL (Oracle) that does the same for exactly two arguments.
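
    A small runnable sketch of both functions, using SQLite through Python. The table contents are made up for illustration, and birthdate is stored as text here, so no cast is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (personid INTEGER PRIMARY KEY, name TEXT, birthdate TEXT)"
)
# Illustrative rows: person 2 has neither a name nor a birthdate.
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [(1, "Ann", "1991-05-20"), (2, None, None)],
)

rows = conn.execute(
    """
    SELECT personid,
           COALESCE(name, 'no name')    AS name,
           IFNULL(birthdate, 'no date') AS birthdate
      FROM people
     ORDER BY personid
    """
).fetchall()
conn.close()

print(rows)
```

    Note that this only hides nulls in the output; the columns themselves remain nullable.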

  • 2021-02-08 21:08

    I saw Date's colleague Hugh Darwen discuss this issue in an excellent presentation "How To Handle Missing Information Without Using NULL", which is available on the Third Manifesto website.

    His solution is a variant on your second approach. It's sixth normal form, with tables to hold both Date of Birth and identifiers where it is unknown:

    #  +-----------------------------+ 1    0..1 +----------------------------+
    #  |         People'             | <-------> |         DatesOfBirth       |
    #  +------------+----------------+           +------------+---------------+
    #  |  PersonID  |  Name          |           |  PersonID  |  DateOfBirth  |
    #  +============+----------------+           +============+---------------+
    #  |  1         |  Banana Man    |           |  2         |  20-MAY-1991  |
    #  |  2         |  Satsuma Girl  |           +------------+---------------+
    #  +------------+----------------+
    #                                  1    0..1 +------------+
    #                                  <-------> | DobUnknown |
    #                                            +------------+
    #                                            |  PersonID  |
    #                                            +============+
    #                                            | 1          |
    #                                            +------------+
    

    Selecting from People then requires joining all three tables, including boilerplate to indicate the unknown Dates Of Birth.
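
    As a rough sketch of what that looks like in practice, here is the three-table design in SQLite, driven from Python. The column types, the ISO date format and the 'unknown' marker text are my assumptions, not something taken from the presentation, and SQLite does not enforce the foreign keys unless asked to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE People (
        PersonID INTEGER PRIMARY KEY,
        Name     TEXT NOT NULL
    );
    CREATE TABLE DatesOfBirth (
        PersonID    INTEGER PRIMARY KEY REFERENCES People (PersonID),
        DateOfBirth TEXT NOT NULL
    );
    CREATE TABLE DobUnknown (
        PersonID INTEGER PRIMARY KEY REFERENCES People (PersonID)
    );
    INSERT INTO People VALUES (1, 'Banana Man'), (2, 'Satsuma Girl');
    INSERT INTO DatesOfBirth VALUES (2, '1991-05-20');
    INSERT INTO DobUnknown VALUES (1);
    """
)

# Inner joins only; the second branch is the boilerplate that
# spells out the unknown dates of birth instead of producing NULL.
rows = conn.execute(
    """
    SELECT p.PersonID, p.Name, d.DateOfBirth AS DobText
      FROM People AS p
           INNER JOIN DatesOfBirth AS d ON d.PersonID = p.PersonID
    UNION
    SELECT p.PersonID, p.Name, 'unknown' AS DobText
      FROM People AS p
           INNER JOIN DobUnknown AS u ON u.PersonID = p.PersonID
     ORDER BY 1
    """
).fetchall()
conn.close()

print(rows)
```

    Every person appears exactly once and no column in the result is null.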

    Of course, this is somewhat theoretical. The state of SQL these days is still not sufficiently advanced to handle all this. Hugh's presentation covers these shortcomings. One thing he mentions, that SQL lacks multiple assignment, is not entirely correct: some flavours of SQL do support it, for instance through Oracle's INSERT ALL syntax.

  • Option 3: Onus on the record writer:

    CREATE TABLE Person
    (
      PersonId int PRIMARY KEY IDENTITY(1,1),
      Name nvarchar(100) NOT NULL,
      DateOfBirth datetime NOT NULL
    )
    

    Why contort a model to allow null representation when your goal is to eliminate them?
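
    To show where the onus ends up, here is a minimal sketch of the same table in SQLite, driven from Python. The column types are adapted to SQLite, and the two sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Person (
        PersonId    INTEGER PRIMARY KEY AUTOINCREMENT,
        Name        TEXT NOT NULL,
        DateOfBirth TEXT NOT NULL
    )
    """
)

# A complete record is accepted.
conn.execute(
    "INSERT INTO Person (Name, DateOfBirth) VALUES (?, ?)",
    ("Ann", "1991-05-20"),
)

# A record without a date of birth is rejected by the constraint,
# so the writer has to find the value or not write the row at all.
try:
    conn.execute(
        "INSERT INTO Person (Name, DateOfBirth) VALUES (?, ?)",
        ("Bob", None),
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM Person").fetchone()[0]
conn.close()

print(rejected, row_count)
```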

  • 2021-02-08 21:15

    I recommend you go for your option 2. I'm fairly certain Chris Date would too, because essentially what you are doing is fully normalizing to 6NF, the highest possible normal form, which Date was jointly responsible for introducing. I second the recommendation of Darwen's paper on handling missing information.

    "Since OUTER JOINs won't be allowed (because they would introduce NULL into the result set), all the necessary data could possibly no longer be fetched with just a single query as before."

    …this is not the case, but I agree the issue of outer join is not explicitly mentioned in the Darwen paper; it was the one thing that left me wanting. The explicit answer may be found in another of Date's books…

    First, note that Date and Darwen's own truly relational language, Tutorial D, has but one join type: the natural join. The justification is that only one join type is actually needed.

    The Date book I alluded to is the excellent SQL and Relational Theory: How to Write Accurate SQL Code:

    4.6: A Remark on Outer Join: "Relationally speaking, [outer join is] a kind of shotgun marriage: It forces tables into a kind of union—yes, I do mean union, not join—even when the tables in question fail to conform to the usual requirements for union... It does this, in effect, by padding one or both of the tables with nulls before doing the union, thereby making them conform to those usual requirements after all. But there's no reason why that padding shouldn't be done with proper values instead of nulls..."

    Using your example and default value '1900-01-01' as 'padding', the alternative to outer join could look like this:

    SELECT p.PersonID, p.Name, b.DateOfBirth
      FROM Person AS p
           INNER JOIN BirthDate AS b
              ON p.PersonID = b.PersonID
    UNION
    SELECT p.PersonID, p.Name, '1900-01-01' AS DateOfBirth
      FROM Person AS p
     WHERE NOT EXISTS (
                       SELECT * 
                         FROM BirthDate AS b
                        WHERE p.PersonID = b.PersonID
                      );
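
    The query can be tried out in SQLite via Python. This is just a sketch: the sample rows and the text-typed date column are assumptions, and the only change to the query is an ORDER BY so the output is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Person (
        PersonID INTEGER PRIMARY KEY,
        Name     TEXT NOT NULL
    );
    CREATE TABLE BirthDate (
        PersonID    INTEGER PRIMARY KEY REFERENCES Person (PersonID),
        DateOfBirth TEXT NOT NULL
    );
    INSERT INTO Person VALUES (1, 'Banana Man'), (2, 'Satsuma Girl');
    INSERT INTO BirthDate VALUES (2, '1991-05-20');
    """
)

rows = conn.execute(
    """
    SELECT p.PersonID, p.Name, b.DateOfBirth
      FROM Person AS p
           INNER JOIN BirthDate AS b
              ON p.PersonID = b.PersonID
    UNION
    SELECT p.PersonID, p.Name, '1900-01-01' AS DateOfBirth
      FROM Person AS p
     WHERE NOT EXISTS (
                       SELECT *
                         FROM BirthDate AS b
                        WHERE p.PersonID = b.PersonID
                      )
     ORDER BY 1
    """
).fetchall()
conn.close()

print(rows)
```

    The person without a BirthDate row comes back padded with the default value rather than with NULL.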
    

    Darwen's paper proposes two explicit tables, say BirthDate and BirthDateUnknown, but the SQL would not be much different, e.g. a semi join to BirthDateUnknown in place of the semi difference to BirthDate above.

    Note the above uses UNION and INNER JOIN only because Standard SQL-92 NATURAL JOIN and UNION CORRESPONDING are not widely implemented in real-life SQL products (can't find a citation but IIRC Darwen was largely responsible for the latter two making it into the Standard).

    Further note the above syntax looks long-winded only because SQL in general is long-winded. In pure relational algebra it is more like (pseudo code):

    Person JOIN BirthDate UNION Person NOT MATCHING BirthDate ADD '1900-01-01' AS DateOfBirth;
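
    To make the pseudo code concrete, here is a toy Python sketch that treats a relation as a list of dicts. The function names natural_join, not_matching, extend and union are mine, chosen to mirror the operators above; this is not Tutorial D and not how a real DBMS implements them.

```python
def _matches(a, b):
    """True when two tuples agree on all attributes they share."""
    return all(a[k] == b[k] for k in a.keys() & b.keys())


def natural_join(r, s):
    """Natural join: combine every pair of matching tuples."""
    return [{**a, **b} for a in r for b in s if _matches(a, b)]


def not_matching(r, s):
    """Semi-difference: tuples of r that match no tuple of s."""
    return [a for a in r if not any(_matches(a, b) for b in s)]


def extend(r, **attrs):
    """Add constant attributes to every tuple (the 'padding')."""
    return [{**a, **attrs} for a in r]


def union(r, s):
    """Set union; duplicate tuples are dropped."""
    seen, out = set(), []
    for t in r + s:
        key = tuple(sorted(t.items()))
        if key not in seen:
            seen.add(key)
            out.append(t)
    return out


# Illustrative data only.
person = [
    {"PersonID": 1, "Name": "Banana Man"},
    {"PersonID": 2, "Name": "Satsuma Girl"},
]
birth_date = [{"PersonID": 2, "DateOfBirth": "1991-05-20"}]

result = union(
    natural_join(person, birth_date),
    extend(not_matching(person, birth_date), DateOfBirth="1900-01-01"),
)
result = sorted(result, key=lambda t: t["PersonID"])
print(result)
```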
    
  • 2021-02-08 21:19

    I haven't read it, but there's an article called How To Handle Missing Information Using S-by-C on the Third Manifesto website that's run by Hugh Darwen and C.J. Date. This isn't written by C.J. Date, but I'd assume that since it's one of the articles on that website it's probably similar to his opinions.
