Some while ago, I've been reading through the book SQL and Relational Theory by C. J. Date. The author is well-known for criticising SQL's three-valued logic (3VL).
I recommend you go for your option 2. I'm fairly certain Chris Date would too, because what you are doing is essentially fully normalizing to 6NF, the highest possible normal form, which Date was jointly responsible for introducing. I second the recommendation of Darwen's paper on handling missing information.
Since OUTER JOINs won't be allowed (because they would introduce NULL into the result set), all the necessary data could possibly no longer be fetched with just a single query as before.
…this is not the case, but I agree the issue of outer join is not explicitly mentioned in the Darwen paper; it was the one thing that left me wanting. The explicit answer may be found in another of Date's books…
First, note that Date and Darwen's own truly relational language, Tutorial D, has only one join type, the natural join. The justification is that only one join type is actually needed.
The Date book I alluded to is the excellent SQL and Relational Theory: How to Write Accurate SQL Code:
4.6: A Remark on Outer Join: "Relationally speaking, [outer join is] a kind of shotgun marriage: It forces tables into a kind of union—yes, I do mean union, not join—even when the tables in question fail to conform to the usual requirements for union... It does this, in effect, by padding one or both of the tables with nulls before doing the union, thereby making them conform to those usual requirements after all. But there's no reason why that padding shouldn't be done with proper values instead of nulls..."
Using your example and default value '1900-01-01' as 'padding', the alternative to outer join could look like this:
SELECT p.PersonID, p.Name, b.DateOfBirth
  FROM Person AS p
       INNER JOIN BirthDate AS b
          ON p.PersonID = b.PersonID
UNION
SELECT p.PersonID, p.Name, '1900-01-01' AS DateOfBirth
  FROM Person AS p
 WHERE NOT EXISTS (
           SELECT *
             FROM BirthDate AS b
            WHERE p.PersonID = b.PersonID
       );
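As a rough check that this query returns the same rows an outer join with a substituted default would, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows and names (Alice, Bob, Carol) are made up for illustration, and storing dates as TEXT is an assumption that suits SQLite only; the LEFT OUTER JOIN with COALESCE is my own comparison query, not something from Date's book.

```python
import sqlite3

# In-memory database with hypothetical sample data; Bob has no known birth date.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Person (
        PersonID INTEGER PRIMARY KEY,
        Name     TEXT NOT NULL
    );
    CREATE TABLE BirthDate (
        PersonID    INTEGER PRIMARY KEY REFERENCES Person (PersonID),
        DateOfBirth TEXT NOT NULL
    );
    INSERT INTO Person VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
    INSERT INTO BirthDate VALUES (1, '1980-05-17'), (3, '1975-11-02');
""")

# The null-free alternative from the answer: inner join UNION padded non-matches.
union_query = """
    SELECT p.PersonID, p.Name, b.DateOfBirth
      FROM Person AS p
           INNER JOIN BirthDate AS b
              ON p.PersonID = b.PersonID
    UNION
    SELECT p.PersonID, p.Name, '1900-01-01' AS DateOfBirth
      FROM Person AS p
     WHERE NOT EXISTS (
               SELECT *
                 FROM BirthDate AS b
                WHERE p.PersonID = b.PersonID
           )
"""

# Comparison only: the outer join the answer avoids, with nulls replaced afterwards.
outer_join_query = """
    SELECT p.PersonID, p.Name, COALESCE(b.DateOfBirth, '1900-01-01')
      FROM Person AS p
           LEFT OUTER JOIN BirthDate AS b
             ON p.PersonID = b.PersonID
"""

union_rows = sorted(conn.execute(union_query).fetchall())
outer_rows = sorted(conn.execute(outer_join_query).fetchall())
print(union_rows == outer_rows)
```

Both queries fetch everything in a single statement, which is the point: dropping outer join does not force a second round trip.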
Darwen's paper proposes two explicit tables, say BirthDate and BirthDateKnown, but the SQL would not be much different, e.g. a semi join to BirthDateKnown in place of the semi difference to BirthDate above.
Note the above uses UNION and INNER JOIN only because Standard SQL-92 NATURAL JOIN and UNION CORRESPONDING are not widely implemented in real-life SQL products (can't find a citation but IIRC Darwen was largely responsible for the latter two making it into the Standard).
Further note the above syntax looks long-winded only because SQL in general is long-winded. In pure relational algebra it is more like (pseudo code):
Person JOIN BirthDate UNION Person NOT MATCHING BirthDate ADD '1900-01-01' AS DateOfBirth;
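To make the pseudo code a little more concrete, here is a toy sketch of the same expression in Python, modelling a relation as a set of rows and a row as a frozenset of (attribute, value) pairs. The helper names (row, natural_join, not_matching, extend) and the sample data are my own inventions for illustration; this is not Tutorial D and makes no attempt at type checking.

```python
def row(**attrs):
    """Build one tuple of a relation as a hashable set of (attribute, value) pairs."""
    return frozenset(attrs.items())


def _matches(a, b):
    """True if two rows (as dicts) agree on every attribute they have in common."""
    return all(a[k] == b[k] for k in a.keys() & b.keys())


def natural_join(r, s):
    """Natural join: combine every pair of rows that agree on the common attributes."""
    return {
        frozenset({**dict(a), **dict(b)}.items())
        for a in r
        for b in s
        if _matches(dict(a), dict(b))
    }


def not_matching(r, s):
    """Semidifference: rows of r that have no matching row in s."""
    return {a for a in r if not any(_matches(dict(a), dict(b)) for b in s)}


def extend(r, **new_attrs):
    """Add the given constant-valued attributes to every row of r."""
    return {frozenset({**dict(a), **new_attrs}.items()) for a in r}


# Hypothetical sample data: Bob's birth date is not recorded.
Person = {
    row(PersonID=1, Name="Alice"),
    row(PersonID=2, Name="Bob"),
    row(PersonID=3, Name="Carol"),
}
BirthDate = {
    row(PersonID=1, DateOfBirth="1980-05-17"),
    row(PersonID=3, DateOfBirth="1975-11-02"),
}

# Person JOIN BirthDate UNION Person NOT MATCHING BirthDate ADD '1900-01-01' AS DateOfBirth
result = natural_join(Person, BirthDate) | extend(
    not_matching(Person, BirthDate), DateOfBirth="1900-01-01"
)

for r in sorted(result, key=lambda r: dict(r)["PersonID"]):
    print(dict(r))
```

Because relations are sets, the union needs no duplicate elimination step and no null ever appears: every row in the result has a value for every attribute.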