Identifying Transitive Dependencies

Posted by 泄露秘密 on 2019-11-29 12:52:57

From your question it seems that you do not have a clear understanding of the basics.

Application relationships and situations

First you have to take what you were told about your application (including a priori rules) and identify the application relationships. Each gets a base table (aka relation). Such an application relationship is characterized by a row membership criterion (aka predicate) (aka meaning). Eg suppose criterion "student [student_id] takes course [course_title]" has table TAKES. The parameters of the criterion are the columns of its table. We can use a table name with columns (like an SQL declaration) as a shorthand for the criterion. Eg TAKES(student_id,course_title). A criterion plus a row makes a statement (proposition). Eg row (17,'CS101') gives "student 17 takes course 'CS101'" ie TAKES(17,'CS101'). Rows that give a true statement go in the table and rows that make a false one stay out.

If we can split a criterion into two that are ANDed together then we only need the tables with the new criteria. This is because JOIN is defined so that the JOIN of two tables containing the rows making their criteria true returns the rows that make the AND of their criteria true. So we can JOIN the two tables to get back the original. (This is what normalization is doing by decomposing tables into components.)

-- student with id [si] has name [sn] and address [sa] and major [sm]
    and takes course [ci] with title [ct]
        from instructor with id [ii] and name [in] and office [io]
        with grade [scg]
T(si,sn,sa,sm,ci,ct,ii,in,io,scg)

-- student with id [si] has name [sn] and address [sa] and major [sm]
    and takes course [ci] with grade [scg]
SG(si,sn,sa,sm,ci,scg)

--  course [ci] with title [ct]
        is taught by instructor with id [ii] and name [in] and office [io]
CI(ci,ct,ii,in,io)

-- T(si,sn,sa,sm,ci,ct,ii,in,io,scg) IFF
    SG(si,sn,sa,sm,ci,scg) AND CI(ci,ct,ii,in,io)
-- T = SG JOIN CI
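As a rough illustration of "T = SG JOIN CI", here is a minimal Python sketch of a natural join over rows held as dicts. The helper name natural_join and the sample rows are made up for this example, and the tables are trimmed to a few of the columns above to keep it short.

```python
def natural_join(r, s):
    """Natural join of two relations given as lists of dicts (column -> value)."""
    joined = []
    for a in r:
        for b in s:
            shared = a.keys() & b.keys()
            # Two rows combine only when they agree on every shared column.
            if all(a[c] == b[c] for c in shared):
                row = {**a, **b}
                if row not in joined:  # a relation holds no duplicate rows
                    joined.append(row)
    return joined

# Hypothetical sample data (not from the question).
SG = [
    {"si": 17, "sn": "Ann", "ci": "CS101", "scg": "A"},
    {"si": 18, "sn": "Bob", "ci": "CS101", "scg": "B"},
    {"si": 17, "sn": "Ann", "ci": "MA201", "scg": "C"},
]
CI = [
    {"ci": "CS101", "ct": "Intro to CS", "ii": 5},
    {"ci": "MA201", "ct": "Calculus", "ii": 9},
]

# Each row of T makes "SG criterion AND CI criterion" true.
T = natural_join(SG, CI)
for row in T:
    print(row)
```

Every row of T satisfies both criteria at once, which is why storing only SG and CI loses nothing.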

Together the application relationships and situations determine both the rules and the FDs (and other constraints)! These are just things that are true of every application situation or every database state (ie the values of one or more base tables), which are a function of the criteria and the possible application situations. Then we normalize to reduce redundancy.

The only time a rule can tell you something you don't already know from the (putative) criteria and (putative) situations is when you don't really understand the criteria or what situations can turn up, and the a priori rules are clarifying something about that. A person giving you rules is already using application relationships that they assume you understand, and they can only have determined that a rule holds by using them and all the application situations that can arise (albeit informally)!

(Sadly many presentations of information modeling don't even mention application relationships. Eg: If someone says "there is an X:Y relationship" then they must already have in mind a particular binary application relationship between entities; knowing it and what application situations can arise, they are reporting that it has a certain cardinality in a certain direction. This will correspond to some application relationship and table using column sets that identify entities.)

(Check out Object-Role Modeling or Nijssen's presentations of his NIAM.)

FDs, CKs and normalization

Given the criterion for putting rows into or leaving them out of a table and all possible situations that can arise, only some values (sets of rows) can ever be in that table.

For every subset of columns you need to decide which other columns can only have one value for a given subrow value for those columns. When it can only have one we say that the subset of columns functionally determines that column. But every superset of that subset will also functionally determine it, so that cuts down on cases. Conversely, if a given set does not determine a column then no subset of the set does. Also, you may think in terms of column sets being unique; then all other columns are functionally dependent on that set. Such a set is called a superkey.
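A small Python sketch may make the "only one value for a given subrow value" test concrete. The function name violates_fd and the sample rows are invented for illustration. Keep in mind that sample data can only refute an FD: whether an FD holds depends on every situation that can arise, not on the rows you happen to have.

```python
def violates_fd(rows, lhs, rhs):
    """Return True if two rows agree on the lhs columns but differ on the rhs columns."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        val = tuple(row[c] for c in rhs)
        # Remember the first rhs value seen for this lhs subrow; any other value refutes the FD.
        if seen.setdefault(key, val) != val:
            return True
    return False

# Hypothetical rows: course id, instructor id, instructor name.
courses = [
    {"ci": "CS101", "ii": 5, "in": "Kim"},
    {"ci": "CS102", "ii": 5, "in": "Kim"},
    {"ci": "MA201", "ii": 9, "in": "Lee"},
]

print(violates_fd(courses, ["ci"], ["ii"]))        # False: these rows do not refute {ci} -> {ii}
print(violates_fd(courses, ["ii"], ["ci"]))        # True: instructor 5 appears with two courses
print(violates_fd(courses, ["ci", "in"], ["ii"]))  # False: a superset of {ci} is not refuted either
```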

Only after you have determined the FDs can you determine the candidate keys! A CK is a superkey that contains no smaller superkey. (The presence of CKs and superkeys is also a constraint.) We can pick a CK as primary key.
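One way to go from FDs to CKs is to compute attribute closures and keep the superkeys that contain no smaller superkey. The Python sketch below does this by brute force, which is fine for a handful of columns. The function names are mine, and the FD list is an assumption about the example table T, not something given in the question.

```python
from itertools import combinations

def closure(attrs, fds):
    """All attributes functionally determined by attrs, given fds as (lhs_set, rhs_set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(all_attrs, fds):
    """Superkeys that contain no smaller superkey, found by trying smaller sets first."""
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            s = set(combo)
            if any(k <= s for k in keys):
                continue  # contains an already-found CK, so it is not minimal
            if closure(s, fds) == all_attrs:
                keys.append(s)
    return keys

# Assumed FDs for T(si,sn,sa,sm,ci,ct,ii,in,io,scg); adjust to your own rules.
attrs = set("si sn sa sm ci ct ii in io scg".split())
fds = [
    ({"si"}, {"sn", "sa", "sm"}),
    ({"ci"}, {"ct", "ii"}),
    ({"ii"}, {"in", "io"}),
    ({"si", "ci"}, {"scg"}),
]

print(candidate_keys(attrs, fds))  # a single CK made of si and ci
```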

You wrote: "A partial dependency relies on either one of the attributes from the Primary key."

Don't use "involve" or "relies on" to give a definition. Say, "when" or "if and only if".

Read a definition. A FD is partial if and only if using a proper subset of the determinant gives a FD with the same determined column; otherwise it is full. Note that this does not involve CKs. A relation is in 2NF when all non-prime attributes are fully functionally dependent on every CK.
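The partial/full definition can be checked mechanically once you have the FDs. Here is a minimal Python sketch; is_partial and closure are illustrative names, and the FD list is again an assumption about the example table T.

```python
from itertools import combinations

def closure(attrs, fds):
    """All attributes functionally determined by attrs, given fds as (lhs_set, rhs_set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_partial(lhs, attr, fds):
    """Given that lhs -> attr holds, True if some proper subset of lhs also determines attr."""
    lhs = set(lhs)
    for size in range(len(lhs)):  # sizes 0 .. len(lhs)-1, ie proper subsets only
        for combo in combinations(sorted(lhs), size):
            if attr in closure(combo, fds):
                return True
    return False

# Assumed FDs for the example table T.
fds = [
    ({"si"}, {"sn", "sa", "sm"}),
    ({"ci"}, {"ct", "ii"}),
    ({"ii"}, {"in", "io"}),
    ({"si", "ci"}, {"scg"}),
]

print(is_partial({"si", "ci"}, "sn", fds))   # True: si alone determines sn
print(is_partial({"si", "ci"}, "scg", fds))  # False: the grade needs both si and ci
```

With these assumed FDs, sn is partially dependent on the CK {si, ci}, so T is not in 2NF.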

You wrote: "A transitive dependency involves two or more non-key attributes in a functional dependence where one of the non-key attributes is dependent on a key attribute (from my PK)."

Read a definition. S -> T is transitive when there is an X where S -> X and X -> T and not(X -> S). Note that this does not involve CKs. A relation is in 3NF when it is in 2NF and all non-prime attributes are non-transitively dependent on every CK.
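The transitive test can be sketched the same way: search for an intermediate set X. In the Python below, transitive_via is an illustrative name and the FDs describe a hypothetical Course table. One assumption beyond the wording above: the sketch also requires T not to be in X, to exclude the trivial choice X = T.

```python
from itertools import combinations

def closure(attrs, fds):
    """All attributes functionally determined by attrs, given fds as (lhs_set, rhs_set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def transitive_via(s, t, fds):
    """Return some X with S -> X, X -> T, not (X -> S) and T not in X, or None if there is none."""
    s = set(s)
    # Every X with S -> X is a subset of the closure of S; leave T out (assumption noted above).
    pool = sorted(closure(s, fds) - {t})
    for size in range(1, len(pool) + 1):
        for combo in combinations(pool, size):
            x = set(combo)
            cx = closure(x, fds)
            if t in cx and not s <= cx:
                return x
    return None

# Assumed FDs for a Course table (ci, ct, ii, in, io).
fds = [
    ({"ci"}, {"ct", "ii"}),
    ({"ii"}, {"in", "io"}),
]

print(transitive_via({"ci"}, "in", fds))  # {'ii'}: ci -> ii, ii -> in, and ii does not determine ci
print(transitive_via({"ci"}, "ct", fds))  # None: ct depends on ci non-transitively
```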

I am inferring a functional dependency that was not listed in your business rules. Namely that instructor ID determines instructor name.

If this is true, and if you have both instructor ID and instructor name in the Course table, then that table is not in 3NF, because Instructor Name depends on Course ID transitively, via Instructor ID.

Why is this harmful? Because duplicating the instructor name in every course an instructor teaches makes updating that name difficult, and makes it possible to update it inconsistently. An inconsistent instructor name is just another bug you have to watch out for, and 3NF obviates the problem. The same argument could be made for instructor office.
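To see the update problem concretely, here is a small Python sketch with made-up data. The table layouts and names are hypothetical: a Course table that repeats the instructor name, then the same data split into Course and Instructor.

```python
# Not in 3NF: the instructor name is repeated in every course row.
course = [
    {"ci": "CS101", "ii": 5, "in": "Kim"},
    {"ci": "CS102", "ii": 5, "in": "Kim"},
]

# An update that touches only one row leaves instructor 5 with two names.
course[0]["in"] = "Kim-Lee"
names_for_5 = {row["in"] for row in course if row["ii"] == 5}
print(names_for_5)  # two different names for the same instructor

# Decomposed: the name is stored once, keyed by instructor id.
course_3nf = [
    {"ci": "CS101", "ii": 5},
    {"ci": "CS102", "ii": 5},
]
instructor = {5: {"in": "Kim"}}

# A single update now changes the name seen from every course.
instructor[5]["in"] = "Kim-Lee"
names = {instructor[row["ii"]]["in"] for row in course_3nf}
print(names)  # {'Kim-Lee'}
```

In the decomposed form there is no second copy of the name to forget.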
