Relational data modeling for sub types

不知归路 2021-02-10 12:43

I am learning the Relational Model and data modeling.
And I have some confusion in my mind regarding sub types.

I know that data modeling is an iterative process and

3 answers
  • 2021-02-10 13:15

    Preliminary

    Good question, very thoughtful for a learner. I think what you are really after is a discussion, in order to obtain clarity, and this is a Data Modelling exercise.

    • I understand your progression up to and including 3.3. How do you get to 3.4 (after the step-wise progression to 3.3)? To me, "Combine all the above" does not equal "Generic".

    • Rather than following your progression, and erecting a model for each step, let me respond with a TRD for the relevant steps, per your discussion.

    • TRD: only Tables (which are identified by Keys) and Relationships are relevant at this stage. I think you are well aware of the Attributes, if any, and which Keys they would be deployed with. After you achieve a stable TRD, you can expand it to a full DM.

    • After erecting a model that is a progression from the previous one, if evaluation makes it clear that it loses information, it can be safely discarded. There is value in contemplating such models, so the step is not incorrect. But continued discussion of it is a waste. I believe I demonstrated that in the previous question.

    Consider this set of Table Relation Diagrams.

    1.x

    From my perspective, A First would be the first reasonable TRD that is worth contemplating.

    • I don't see how or why Proton/Neutron/Electron are Independent tables. They do not exist on their own; their weights, etc., are fixed. They exist only in the context of an Atom.

    • Since every Atom comprises at least one Proton/Neutron/Electron, the Proton/Neutron/Electron columns can be deployed in Atom. Not drawn. Later.

    2.x

    Your progression is fine, except for one glaring error.

    common attributes about particle composition go in ParticleComposition, while special attributes about particle composition go in special tables.

    No. Common attributes about particle go in Particle. Attributes that are specific to the relationship (ie. not common) go in ParticleComposition. And then there are no "special attributes about particle composition", no "special tables".
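
    The placement rule above can be sketched with Python's built-in sqlite3 module. This is only an illustration under my own assumptions: the column names (Weight, ParentName, ChildName, Quantity) and the sample rows are hypothetical and do not come from the question's model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Particle (
    Name   TEXT NOT NULL PRIMARY KEY,  -- logical key
    Weight REAL NOT NULL               -- hypothetical attribute common to every particle
);
CREATE TABLE ParticleComposition (
    ParentName TEXT    NOT NULL REFERENCES Particle (Name),
    ChildName  TEXT    NOT NULL REFERENCES Particle (Name),
    Quantity   INTEGER NOT NULL CHECK (Quantity > 0),  -- describes the relationship only
    PRIMARY KEY (ParentName, ChildName)
);
""")

conn.executemany(
    "INSERT INTO Particle VALUES (?, ?)",
    [("Water", 18.015), ("Hydrogen", 1.008), ("Oxygen", 15.999)],
)
conn.executemany(
    "INSERT INTO ParticleComposition VALUES (?, ?, ?)",
    [("Water", "Hydrogen", 2), ("Water", "Oxygen", 1)],
)

# Quantity belongs to the (parent, child) pair, not to either particle alone.
rows = conn.execute(
    "SELECT ChildName, Quantity FROM ParticleComposition "
    "WHERE ParentName = 'Water' ORDER BY ChildName"
).fetchall()
print(rows)
```

    Weight is stored once per Particle, while Quantity only makes sense for a given parent/child pair, so it sits in the relationship table.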

    3.x

    Consider B Subtype. Your [3.1] is mostly correct, except for:

    • I don't see how Particle has children such as Proton/Neutron/Electron. Only an Atom has that.

    • I don't see how Particles are related to other Particles (ie. what is that ?). For the data discussed, a Molecule is made up of Atoms; An Atom is made up of Proton/Neutron/Electron; and a Particle is either a Molecule xor an Atom (Exclusive Subtype).

    • Please correct me if that is not correct.

    • Refer to Subtype Document for full details on that subject.
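
    As a rough illustration of an Exclusive Subtype, here is a minimal sketch using Python's sqlite3 module. It uses one common technique (carrying the discriminator into each subtype table and constraining it there); the Subtype Document may well declare exclusivity differently. The column name ParticleType and its values are my own assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Particle (
    Name         TEXT NOT NULL PRIMARY KEY,
    ParticleType TEXT NOT NULL CHECK (ParticleType IN ('Atom', 'Molecule')),
    UNIQUE (Name, ParticleType)        -- target for the subtype foreign keys
);
CREATE TABLE Atom (
    Name         TEXT NOT NULL PRIMARY KEY,
    ParticleType TEXT NOT NULL CHECK (ParticleType = 'Atom'),
    FOREIGN KEY (Name, ParticleType) REFERENCES Particle (Name, ParticleType)
);
CREATE TABLE Molecule (
    Name         TEXT NOT NULL PRIMARY KEY,
    ParticleType TEXT NOT NULL CHECK (ParticleType = 'Molecule'),
    FOREIGN KEY (Name, ParticleType) REFERENCES Particle (Name, ParticleType)
);
""")

conn.execute("INSERT INTO Particle VALUES ('Hydrogen', 'Atom')")
conn.execute("INSERT INTO Atom VALUES ('Hydrogen', 'Atom')")

# Hydrogen is already an Atom, so it cannot also be recorded as a Molecule.
try:
    conn.execute("INSERT INTO Molecule VALUES ('Hydrogen', 'Molecule')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)
```

    A Particle row names exactly one type, and each subtype table accepts only rows whose type matches, which gives the xor.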

    That can be C Reduced, as you have stated. This holds the notion that Proton/Neutron/Electron information is fixed per Atom: that there is one entry for each. Eg. each shell/energy level is not differentiated; zero is acceptable for Neutrons (instead of Null).

    • I have discussed the great value of Predicates previously. The main point here is that the model identifies the Predicates, and the Predicates verify the model; it is a great feedback loop. I have given the Predicates so that you can evaluate them for yourself and check the validity of the model.
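
    A reduced Atom along the lines of C Reduced could look like the following sqlite3 sketch. The column names (NumProton, NumNeutron, NumElectron) and the CHECK bounds are my assumptions, since the diagram itself is not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Atom (
    Name        TEXT    NOT NULL PRIMARY KEY,
    NumProton   INTEGER NOT NULL CHECK (NumProton   >= 1),
    NumNeutron  INTEGER NOT NULL DEFAULT 0 CHECK (NumNeutron >= 0),  -- zero, never NULL
    NumElectron INTEGER NOT NULL CHECK (NumElectron >= 0)
);
""")

# No neutron count is supplied, so the default of zero is stored.
conn.execute(
    "INSERT INTO Atom (Name, NumProton, NumElectron) VALUES ('Hydrogen', 1, 1)"
)
hydrogen = conn.execute(
    "SELECT NumProton, NumNeutron, NumElectron FROM Atom WHERE Name = 'Hydrogen'"
).fetchone()
print(hydrogen)
```

    There is exactly one entry for each of the three counts per Atom, and shells are not differentiated, which is what the reduction gives up.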

    3.3

    If it were fully D Normalised: the Atom always has at least a Proton entry; the Neutron entry is optional; and each Shell/energy level is differentiated.

    • Note the difference in the Predicates.

    • Note that although Reduction is a valid technique, it does not equate to Normalisation.
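
    For contrast with the reduced form, here is a hedged sqlite3 sketch of a normalised layout: the neutron entry is an optional row and each shell is its own row. The table names AtomNeutron and AtomShell, the column names, and the choice to carry the proton count in Atom are all my assumptions, not a transcription of D Normalised.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Atom (
    Name      TEXT    NOT NULL PRIMARY KEY,
    NumProton INTEGER NOT NULL CHECK (NumProton >= 1)    -- always present
);
CREATE TABLE AtomNeutron (                                -- optional: zero or one row per Atom
    Name       TEXT    NOT NULL PRIMARY KEY REFERENCES Atom (Name),
    NumNeutron INTEGER NOT NULL CHECK (NumNeutron >= 1)
);
CREATE TABLE AtomShell (                                  -- one row per shell / energy level
    Name        TEXT    NOT NULL REFERENCES Atom (Name),
    ShellNo     INTEGER NOT NULL CHECK (ShellNo >= 1),
    NumElectron INTEGER NOT NULL CHECK (NumElectron >= 1),
    PRIMARY KEY (Name, ShellNo)
);
""")

conn.executemany("INSERT INTO Atom VALUES (?, ?)", [("Hydrogen", 1), ("Carbon", 6)])
conn.execute("INSERT INTO AtomNeutron VALUES ('Carbon', 6)")
conn.executemany(
    "INSERT INTO AtomShell VALUES (?, ?, ?)",
    [("Hydrogen", 1, 1), ("Carbon", 1, 2), ("Carbon", 2, 4)],
)

carbon_electrons = conn.execute(
    "SELECT SUM(NumElectron) FROM AtomShell WHERE Name = 'Carbon'"
).fetchone()[0]
hydrogen_neutron_rows = conn.execute(
    "SELECT COUNT(*) FROM AtomNeutron WHERE Name = 'Hydrogen'"
).fetchone()[0]
print(carbon_electrons, hydrogen_neutron_rows)
```

    The absence of neutrons is expressed by the absence of a row rather than by a zero, and the per-shell detail that the reduced form discards is retained.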

    3.4

    That appears to be the sum total of everything, laid out flat, or a flattened view (derived relations, a perspective, the result set). As such it is fine, for understanding. But if you proposed that as a set of tables, then it is horribly incorrect, due to various Normalisation errors. Which, if corrected, would progress to [3.3] and my [D Normalised].

    Question

    Does any of the above designs break the rules of Relational Model?

    All of them except [3.3] break a number of rules. Mostly they are in the category of Normalisation errors. There would be associated identification errors, if you were to have given a full model, or CREATE TABLE statements.

    But that does not matter if the context is a data modelling exercise, for understanding. If this exercise were serious, then the paragraph above stands.

    This section is presented in accordance with the SO guidelines, specifically: correct misinformation whenever you see it. I did comment on the subject post, but my comments keep disappearing. Thus I have placed it here.

    Erwin Smout:
    When cut down to its bare essence, the relational model of data has no more than exactly one "rule" : all information in the database must be represented as values of attributes in tuples in relations.

    That is one of the rules, yes, but the enclosing statement is patently false.

    First, there are many essential or first-order rules in the Relational Model. From memory, I would say about forty.

    Second, there are many second-order rules, ones that are logically implied by the first-order rules.

    • People who have technical qualifications and experience, who can understand the RM, and who follow the spirit and intent, follow all of them.

    • Others may not recognise some of the first-order rules, or most of the implied rules.

    • And there are, as evidenced from the books that allege to be about the RM, yet others, people who actively subvert and diminish the RM. They ignore the second-order rules, and worse, they use pharisaic "logic" to undermine the first-order rules.

    • Here, Erwin, who is well-known for his efforts regarding the RM on comp.databases.theory and TTM, reduces the RM to one pithy rule, and thus undermines the full set of rules, and the RM itself. He does so specifically in answer to your question, which, if not for my response, would lead readers to believe that the RM is what he makes it out to be: just one rule, that everything, relational as well as non-relational, "satisfies".

    • The Relational Model is freely available, you can read it for yourself. Let me know if you would like a copy. The caveat is, the terminology is out-dated, and needs to be explained.

    Third, even if one were to boil it down to one rule (impossible, too reductionist) or the most important rule (possible, but demeaning), that rule would not be it. That is one of the forty or so first-order rules, but certainly not ranked close to the top.

    • However, I grant that other people may have a different ranking, to suit their own purposes.

    • What people who understand the RM do discuss, as the main difference (not rule) between the RM and its predecessors, is this:

      • It was the first to have a complete mathematical definition (which forms its basis, and everything in it flows from that).

      • Whereas the predecessors related records using physical Record IDs, the RM demands (a) Logical Keys, made up from the data, and (b) relating rows (not records) by those Logical Keys.

    It must be mentioned, that is the basis upon which systems that are characterised by Record IDs in every file, declared as "primary key", are completely non-relational, a regression back to pre-1970 ISAM Record Filing Systems, the very thing that the RM made obsolete. Notice also, how those primitive systems can be made to appear "relational", because by schizophrenic "logic", they "satisfy" the one quoted rule. Honest logic destroys such nonsense.

    Such Record ID based systems have become the norm in the lower end of the industry precisely due to misinformation. Hence my willingness to correct it.
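
    The difference between a Record ID and a Logical Key can be demonstrated with a small sqlite3 sketch. The table and column names (AtomRecord, AtomId, Atom, Name) are hypothetical, chosen only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE AtomRecord (
    AtomId INTEGER PRIMARY KEY,        -- Record ID: says nothing about the data
    Name   TEXT NOT NULL
);
CREATE TABLE Atom (
    Name TEXT NOT NULL PRIMARY KEY     -- Logical Key: made up from the data
);
""")

# The Record ID table happily stores the same fact twice.
conn.execute("INSERT INTO AtomRecord (Name) VALUES ('Hydrogen')")
conn.execute("INSERT INTO AtomRecord (Name) VALUES ('Hydrogen')")
duplicates = conn.execute(
    "SELECT COUNT(*) FROM AtomRecord WHERE Name = 'Hydrogen'"
).fetchone()[0]

# The Logical Key table refuses the second, identical row.
conn.execute("INSERT INTO Atom VALUES ('Hydrogen')")
try:
    conn.execute("INSERT INTO Atom VALUES ('Hydrogen')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(duplicates, rejected)
```

    With only a Record ID declared as the key, row uniqueness is not enforced on anything the user can see; with a Logical Key, the duplicate is refused.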

    End misinformation correction section.

    Which approach is the best?

    Formal Data Modelling, including Relational Normalisation. The method, the science, the principle, not the fragments of NF definitions.

    I do not perceive the proposals to be different approaches, rather it is laying out all your thoughts in one single modelling exercise. And the point where the model starts to take a serious, feasible shape is [3.3].

    Does it depend on how we think about the data?

    Of course. Your marriage will succeed or fail based on the perception you have about your wife, because that perception is the seat of all your actions. The model will succeed or fail based on your perception of the data.

    One of the great things about the Relational Model is that it teaches us to view (perceive, think about, model) the data, as data, and nothing but data. For one thing, that forms the Logical Key concept.

    Does it depend on the requirements?

    The first answer is, no, it should not depend on requirements. It should consider the data, the scope of which is limited to the enterprise (requirement, yes, but not the functional requirement), and only the data.

    And of course, for reasons that I have detailed elsewhere, the data model should match the real world; it should not be limited to the functional need against the data.

    The massive error, the common reason for failure in the OO/ORM model, is that it perceives the data through the tiny lens of the OO/ORM model. It fails to separate Data vs Process, and it treats data as a mere "persistence" slave of the objects. There are many other errors in that model, which I will not enumerate here; the point is, they start from the position of the requirement, and ignore the data.

    The second answer is, a project does not get commissioned until the requirement is set; the reality is that funds are requirement-based. So the mature project leader makes sure that the requirement contains enough justification to analyse and model the data, as data, separate to the functions.

    If it depends on the requirements, shall we choose the simplest design at first and then make it more generic to accommodate new requirements?

    You could, but that will cost an awful lot. The mature sequence is to get the data model right, as early as possible.

    If the data model matches the real world, when changes and additions come up, it is easy to extend. Conversely, if the data model was the minimum for the functional requirement, or if it does not match the real world, then changes will be difficult and costly.

    Although the resulting data models share a lot of similarities, the initial design may influence the naming of the tables/columns, and the domains of the keys are different.

    Of course.

    If we choose to use one table for each type of things, we could choose incompatible keys for Atom and Molecule, such as atom weight for Atom and molecule name for Molecule.

    That would be a horrible error. Never place something in a container that does not match the label. Never place two different things in one container (which has one label). The correct method would be to use a common identifier Name (which is Atom- or Molecule- or Particle-name), and to use Subtypes.

    If we choose to use the generic approach, we may choose a common key for all particles.

    Only if there is one. And if there is not, that stands as a sign that the entities are not the same, that a generic model cannot be used.

    Changing Keys may have greater impact on the system, so it may not be easy to evolve from a simple design to a generic one.

    Well, the idea is to choose data items that are stable (not static) to form the key. And yes, Key design is an important aspect of the modelling exercise. If you follow the Relational Model, the Keys form the logical structure of the database. Domain is very important (I think you realise that). And yes, it is costly to change.

    Which brings us back to the main point. That is precisely the reason why the Keys have to be modelled and chosen correctly, for each table, as well as for all its children.

    Update 1 & 2

    I noticed your two Updates just now. This is not a full response, just a very short one for now.

    • Up to now, I understood Particle to be the set of Atoms plus the set of Molecules. That is what I modelled in D Normalised. Both have a Name, a common Key. It is subtyped.

    But now, given your hierarchy diagrams, and sample data (thank you), I realise that what I thought you meant, and what you meant, are two different things. Consider the Updated TRD & Hierarchy:

    1. Your Particle is the set of Molecules plus the set of Atoms plus the set of subatomic particles.

      • That is incorrect.

      • There is an hierarchy, yes, but thus far, it exists in the sequence of tables, not as an hierarchy within one table.

      • Stated otherwise, the two sets (Atoms, Molecules) are discrete, each has their own set of components, which are different. There is no set that includes everything (except the theoretical universal set).

      The updated Table Relation model is E Normalised • Update 2. The Subtypes have been removed, along with Particle. It supplies all the requirements stated in Update 2. Note the updated Predicates.
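
      A hierarchy that lives in the sequence of tables, rather than inside one table, can be sketched as follows with sqlite3. This is an illustration under my own assumptions, not a transcription of E Normalised: the table MoleculeAtom and the columns NumProton and Quantity are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Atom (
    Name      TEXT    NOT NULL PRIMARY KEY,
    NumProton INTEGER NOT NULL CHECK (NumProton >= 1)
);
CREATE TABLE Molecule (
    Name TEXT NOT NULL PRIMARY KEY
);
CREATE TABLE MoleculeAtom (            -- a Molecule is made up of Atoms
    MoleculeName TEXT    NOT NULL REFERENCES Molecule (Name),
    AtomName     TEXT    NOT NULL REFERENCES Atom (Name),
    Quantity     INTEGER NOT NULL CHECK (Quantity >= 1),
    PRIMARY KEY (MoleculeName, AtomName)
);
""")

conn.executemany("INSERT INTO Atom VALUES (?, ?)", [("Hydrogen", 1), ("Oxygen", 8)])
conn.execute("INSERT INTO Molecule VALUES ('Water')")
conn.executemany(
    "INSERT INTO MoleculeAtom VALUES (?, ?, ?)",
    [("Water", "Hydrogen", 2), ("Water", "Oxygen", 1)],
)

# Walking Molecule -> MoleculeAtom -> Atom traverses the hierarchy.
water_protons = conn.execute(
    "SELECT SUM(ma.Quantity * a.NumProton) "
    "FROM MoleculeAtom ma JOIN Atom a ON a.Name = ma.AtomName "
    "WHERE ma.MoleculeName = 'Water'"
).fetchone()[0]
print(water_protons)
```

      Atoms and Molecules remain two discrete sets with their own keys; no all-encompassing Particle table is needed to answer questions that span the levels.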

    2. Your hierarchy diagram is incorrect.

      • Your error is that you have combined the hierarchy of Classifiers (the structure, the container) with the data (the instances of Classifiers; the content). You can't do that. You need two separate diagrams, one for the container, and a second for the content.

      • This is a typical error of the OO/ORM mindset: failure to observe the Scientific Principle to separate Data vs Process. It is the exact same error detailed in my Response to Hidders, in the previous question. The result is complex objects that never actually work.

      • So your hierarchy diagram is illegal, it is two completely different diagrams combined into one.

      F Hierarchy (Classification) depicts that, and only that.

      G Hierarchy (Sample Data) illustrates that, and only that.

      There is a difference in style between the way you depict hierarchies (Organisation Chart) and the way I depict them (Explorer). One ends up being very wide, the other is more compact. I think you can figure it out.

    3. You had some clarity at the end of the previous question. The novel notion of Type in that poisonous book has got you completely confused. This problem, these issues, have nothing to do with Type.

    More words are called for, I will respond more fully as time permits.

  • 2021-02-10 13:36

    When cut down to its bare essence, the relational model of data has no more than exactly one "rule" : all information in the database must be represented as values of attributes in tuples in relations.

    All of your "alternatives" potentially comply with that rule, provided :
    - that each attribute has an associated data type,
    - and that each tuple in each relation in the database will always have a value for each of its attributes,
    - and that value is a value that is a member of the data type associated with that attribute.
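
    The three provisos can be approximated in SQLite, which is loosely typed by default, with NOT NULL plus typeof() checks. This is a minimal sketch via Python's sqlite3 module; the table and column names (AtomContainsParticle, AtomId, ParticleType, Qty) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE AtomContainsParticle (
    AtomId       TEXT    NOT NULL CHECK (typeof(AtomId) = 'text'),
    ParticleType TEXT    NOT NULL CHECK (typeof(ParticleType) = 'text'),
    Qty          INTEGER NOT NULL CHECK (typeof(Qty) = 'integer' AND Qty >= 0),
    PRIMARY KEY (AtomId, ParticleType)
)
""")

INSERT = "INSERT INTO AtomContainsParticle VALUES (?, ?, ?)"


def accepted(row):
    """Return True if the row satisfies every declared constraint."""
    try:
        conn.execute(INSERT, row)
        return True
    except sqlite3.IntegrityError:
        return False


ok_row = accepted(("H", "proton", 1))            # every attribute has a value of its type
missing_value = accepted(("H", "electron", None))  # no value for Qty
wrong_type = accepted(("H", "neutron", "many"))  # value is not a member of the type
print(ok_row, missing_value, wrong_type)
```

    Only the first row is stored; the other two violate the second and third provisos respectively.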

    EDIT : you have also failed to provide any detail of what the precise nature is of the facts you want to make a record of in your system.

    EDIT 2 : first comment by Walter M. still applies. Your facts seem to state things at different levels (which in the typical use cases will be notably distinct) :

    "6. A Hydrogen atom is composed of one proton and one electron"

    After a small rewrite to eliminate the 'AND' therein :

    "6. A <atom_id> atom contains <qty> <subatomicparticletype>"

    This one looks like something that would go into your database (if your use case is as typical/mundane as could be supposed) :

    A 'H' atom contains 1 proton
    A 'H' atom contains 1 electron
    A 'H' atom contains 0 neutrons

    (Note how eliminating the 'AND' involved splitting the conjunction into "atomic" parts (pun intended).)

    From this one, we can start wondering what to do about the <subatomicparticletype>. If your use case is such that the existence of proton/neutron/electron is just a given, and it will never change, then you can simply use a data type for it, and modeling it will not involve more than identifying the type so that your model's readers will know the intended value set. If, however your use case is such that, say, you are experimenting to try and find a completely new model of chemistry, in which there could also be foobarons alongside protons [or their existence could be removed again for the sake of experimentation], then you'd have to include a table that says "<subatomicparticletype> is part of my model of atoms".

    Furthermore, you'd then also have to include a rule in your model that any <subatomicparticletype> that is claimed to be part of an atom, must indeed be one that is part of your model of atoms. In SQLspeak : you'd need an FK from your ATOM_CONTAINS_PARTICLE table to this EXISTING_PARTICLES table.
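
    Both options can be sketched side by side with Python's sqlite3 module. ATOM_CONTAINS_PARTICLE and EXISTING_PARTICLES are the names used above; the _FIXED table, the column names, and the use of a CHECK constraint to stand in for a data type (SQLite has no user-defined types) are my assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Option 1: the particle types are a given, so a fixed value set suffices.
CREATE TABLE ATOM_CONTAINS_PARTICLE_FIXED (
    ATOM_ID       TEXT    NOT NULL,
    PARTICLE_TYPE TEXT    NOT NULL
        CHECK (PARTICLE_TYPE IN ('proton', 'neutron', 'electron')),
    QTY           INTEGER NOT NULL CHECK (QTY >= 0),
    PRIMARY KEY (ATOM_ID, PARTICLE_TYPE)
);
-- Option 2: the particle types are themselves data, so they get a table and an FK.
CREATE TABLE EXISTING_PARTICLES (
    PARTICLE_TYPE TEXT NOT NULL PRIMARY KEY
);
CREATE TABLE ATOM_CONTAINS_PARTICLE (
    ATOM_ID       TEXT    NOT NULL,
    PARTICLE_TYPE TEXT    NOT NULL REFERENCES EXISTING_PARTICLES (PARTICLE_TYPE),
    QTY           INTEGER NOT NULL CHECK (QTY >= 0),
    PRIMARY KEY (ATOM_ID, PARTICLE_TYPE)
);
INSERT INTO EXISTING_PARTICLES VALUES ('proton'), ('neutron'), ('electron');
""")


def try_insert(table, row):
    """Return True if the row was stored, False if a constraint refused it."""
    try:
        conn.execute(f"INSERT INTO {table} VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


fixed_rejects_foobaron = not try_insert("ATOM_CONTAINS_PARTICLE_FIXED", ("H", "foobaron", 1))
fk_rejects_foobaron = not try_insert("ATOM_CONTAINS_PARTICLE", ("H", "foobaron", 1))

# Under option 2, the experimenter can add foobarons to the model as data.
conn.execute("INSERT INTO EXISTING_PARTICLES VALUES ('foobaron')")
fk_accepts_after_declaring = try_insert("ATOM_CONTAINS_PARTICLE", ("H", "foobaron", 1))
print(fixed_rejects_foobaron, fk_rejects_foobaron, fk_accepts_after_declaring)
```

    With the fixed value set, admitting foobarons means changing the schema; with the FK, it means inserting one row.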

    In a sense, the declaration of this rule is like your

    "2. Atoms are composed of protons, neutrons, and electrons."

    But note that you won't have a table in your own database that says this kind of thing. Instead, by declaring the FK to the system, this particular statement will be made in the catalog.

    You need to make a proper distinction between the type of statements that directly state things that are within the domain of interest (those end up being entities/classes/... in your conceptual models and most likely tables in your database) and the type of statements that state things about the domain of interest (like your FK rule).

    (In use cases where the domain itself is highly abstract, the line between the two may be extremely thin or even nonexistent.)

  • 2021-02-10 13:38

    I like Fowler's treatment of Class Table Inheritance and Single Table Inheritance. You have touched on both of these designs here. Each of them has its uses and its drawbacks. You can use these as search terms and you'll get a lot to read. Some of it is worthwhile. There are even a couple of tags here in SO with those names.
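
    For readers who have not met the two patterns, here is a rough sqlite3 sketch of each applied to the particle example. The table names with STI/CTI suffixes and the columns AtomicNumber and Formula are hypothetical, chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Single Table Inheritance: one table, a discriminator, nullable subtype columns.
CREATE TABLE ParticleSTI (
    Name         TEXT NOT NULL PRIMARY KEY,
    ParticleType TEXT NOT NULL CHECK (ParticleType IN ('Atom', 'Molecule')),
    AtomicNumber INTEGER,              -- meaningful for atoms only
    Formula      TEXT                  -- meaningful for molecules only
);
-- Class Table Inheritance: one table per class, joined on the shared key.
CREATE TABLE ParticleCTI (
    Name TEXT NOT NULL PRIMARY KEY
);
CREATE TABLE AtomCTI (
    Name         TEXT    NOT NULL PRIMARY KEY REFERENCES ParticleCTI (Name),
    AtomicNumber INTEGER NOT NULL
);
CREATE TABLE MoleculeCTI (
    Name    TEXT NOT NULL PRIMARY KEY REFERENCES ParticleCTI (Name),
    Formula TEXT NOT NULL
);
INSERT INTO ParticleSTI VALUES ('Oxygen', 'Atom', 8, NULL), ('Water', 'Molecule', NULL, 'H2O');
INSERT INTO ParticleCTI VALUES ('Oxygen'), ('Water');
INSERT INTO AtomCTI VALUES ('Oxygen', 8);
INSERT INTO MoleculeCTI VALUES ('Water', 'H2O');
""")

# Single table: every row leaves the other subtype's columns empty.
sti_rows_with_nulls = conn.execute(
    "SELECT COUNT(*) FROM ParticleSTI WHERE AtomicNumber IS NULL OR Formula IS NULL"
).fetchone()[0]

# Class table: reading a full atom needs a join, but no column is ever NULL.
cti_atoms = conn.execute(
    "SELECT p.Name, a.AtomicNumber FROM ParticleCTI p JOIN AtomCTI a ON a.Name = p.Name"
).fetchall()
print(sti_rows_with_nulls, cti_atoms)
```

    The trade-off shows up directly: the single table avoids joins at the cost of NULL columns, while the class tables keep every column mandatory at the cost of a join.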

    I'm not sure about today, but subtypes were often omitted from Database 101 courses back some 20 years ago, and it was something that every database builder ran into as soon as they got into "the real world".
