Identifying Transitive Dependencies

So, I believe I understand full functional dependencies and partial dependencies. I'll give a brief explanation so that, if I've got something wrong, I don't end up too far down the rabbit hole.

I am working with a table that has a composite Primary Key composed of two attributes, with a total of 10 attributes in the table, in 1NF form.

In my situation, a fully functional dependency involves the dependent relying on BOTH attributes in my Primary Key. A partial dependency relies on either one of the attributes from the Primary key. A transitive dependency involves two or more non-key attributes in a functional dependence where one of the non-key attributes is dependent on a key attribute (from my PK).

Whether or not my understanding is right, pulling the transitive dependencies out of the table is racking my brain. It seems like something you would do AFTER normalization, but my assignment requires us to 'Identify all functional dependencies' before we draw the dependency diagram, after which we normalize the tables.

I will list the attributes in the table, and then the business rules provided. Parentheses identify the PK attributes:

(Student ID), Student Name, Student Address, Student Major, (Course ID), Course Title, Instructor ID, Instructor Name, Instructor Office, Student_crse_grade

Only one class is taught for each course ID. Students may take up to 4 courses. Each course may have a maximum of 25 students. Each course is taught by only one Instructor. Each student may have only one major.

From your question it seems that you do not have a clear understanding of the basics.

Application relationships and situations

First you have to take what you were told about your application (including a priori rules) and identify the application relationships. Each gets a base table (aka relation). Such an application relationship is characterized by a row membership criterion (aka predicate) (aka meaning). Eg suppose criterion "student [student_id] takes course [course_title]" has table TAKES. The parameters of the criterion are the columns of its table. We can use a table name with columns (like an SQL declaration) as a shorthand for the criterion. Eg TAKES(student_id,course_title). A criterion plus a row makes a statement (proposition). Eg row (17,'CS101') gives "student 17 takes course 'CS101'" ie TAKES(17,'CS101'). Rows that give a true statement go in the table and rows that make a false one stay out.

If we can split a criterion into two that are ANDed together then we only need the tables with the new criteria. This is because JOIN is defined so that the JOIN of two tables containing the rows making their criteria true returns the rows that make the AND of their criteria true. So we can JOIN the two tables to get back the original. (This is what normalization is doing by decomposing tables into components.)

-- student with id [si] has name [sn] and address [sa] and major [sm]
    and takes course [ci] with title [ct]
        from instructor with id [ii] and name [in] and office [io]
        with grade [scg]
T(si,sn,sa,sm,ci,ct,ii,in,io,scg)

-- student with id [si] has name [sn] and address [sa] and major [sm]
    and takes course [ci] with grade [scg]
SG(si,sn,sa,sm,ci,scg)

--  course [ci] with title [ct]
        is taught by instructor with id [ii] and name [in] and office [io]
CI(ci,ct,ii,in,io)

-- T(si,sn,sa,sm,ci,ct,ii,in,io,scg) IFF
    SG(si,sn,sa,sm,ci,scg) AND CI(ci,ct,ii,in,io)
-- T = SG JOIN CI
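To make the "JOIN gives back the AND of the criteria" point concrete, here is a minimal Python sketch. The sample rows are invented for illustration, and CI is taken over the columns named in its criterion (ci, ct, ii, in, io); the helper natural_join is a hypothetical name, not part of any library.

```python
# Hypothetical sample data; column names follow the shorthand used above.
SG_COLS = ("si", "sn", "sa", "sm", "ci", "scg")
CI_COLS = ("ci", "ct", "ii", "in", "io")

sg_rows = [
    (17, "Ann", "1 Elm St", "Math", "CS101", "A"),
    (17, "Ann", "1 Elm St", "Math", "MA201", "B"),
    (23, "Bob", "9 Oak Ave", "Physics", "CS101", "C"),
]
ci_rows = [
    ("CS101", "Intro to Computing", 5, "Dr. Lee", "Room 12"),
    ("MA201", "Linear Algebra", 8, "Dr. Kaur", "Room 30"),
]


def natural_join(cols_a, rows_a, cols_b, rows_b):
    """Return (columns, rows) of the natural join of two tables."""
    shared = [c for c in cols_a if c in cols_b]
    extra = [c for c in cols_b if c not in cols_a]
    out_cols = tuple(cols_a) + tuple(extra)
    out_rows = set()
    for ra in rows_a:
        a = dict(zip(cols_a, ra))
        for rb in rows_b:
            b = dict(zip(cols_b, rb))
            # A combined row satisfies "SG AND CI" exactly when the
            # two rows agree on every shared column (here only ci).
            if all(a[c] == b[c] for c in shared):
                merged = {**a, **b}
                out_rows.add(tuple(merged[c] for c in out_cols))
    return out_cols, out_rows


t_cols, t_rows = natural_join(SG_COLS, sg_rows, CI_COLS, ci_rows)
for row in sorted(t_rows, key=str):
    print(dict(zip(t_cols, row)))
```

Each resulting row makes the big criterion for T true, which is why storing SG and CI separately loses nothing.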

Together the application relationships and situations determine both the rules and FDs (and other constraints)! They are just things that are true of every application situation or every database state (ie values of one or more base tables), which are a function of the criteria and the possible application situations. Then we normalize to reduce redundancy.

The only time a rule can tell you something you don't already know from the (putative) criteria and (putative) situations is when you don't really understand the criteria or what situations can turn up, and the a priori rules are clarifying something about that. A person giving you rules is already using application relationships that they assume you understand, and they can only have determined that a rule holds by using them and all the application situations that can arise (albeit informally)!

(Sadly many presentations of information modeling don't even mention application relationships. Eg: If someone says "there is a X:Y relationship" then they must already have in mind a particular binary application relationship between entities; knowing it and what application situations can arise, they are reporting that it has a certain cardinality in a certain direction. This will correspond to some application relationship and table using column sets that identify entities.)

(Check out Object-Role Modeling or Nijssen's presentations of his NIAM.)

FDs, CKs and normalization

Given the criterion for putting rows into or leaving them out of a table and all possible situations that can arise, only some values (sets of rows) can ever be in that table.

For every subset of columns you need to decide which other columns can only have one value for a given subrow value for those columns. When it can only have one we say that the subset of columns functionally determines that column. But every superset of that subset will also functionally determine it, so that cuts down on cases. Conversely, if a given set does not determine a column then no subset of the set does. Also, you may think in terms of column sets being unique; then all other columns are functionally dependent on that set. Such a set is called a superkey.
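As a rough illustration of "only one value for a given subrow value", here is a small Python sketch that tests an FD against one sample table value. The rows and column names are made up, and fd_holds is a hypothetical helper. Keep in mind that a sample can only refute an FD; whether an FD really holds depends on every situation that can arise, not on the rows you happen to have.

```python
def fd_holds(rows, determinant, dependent):
    """True if no two rows agree on `determinant` but differ on `dependent`."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in determinant)
        value = tuple(row[c] for c in dependent)
        # Remember the first dependent value seen for this subrow value;
        # any later mismatch means the FD fails in this table value.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Invented sample rows, not data from the assignment.
rows = [
    {"student_id": 17, "student_name": "Ann", "course_id": "CS101", "instructor_id": 5, "grade": "A"},
    {"student_id": 17, "student_name": "Ann", "course_id": "MA201", "instructor_id": 8, "grade": "B"},
    {"student_id": 23, "student_name": "Bob", "course_id": "CS101", "instructor_id": 5, "grade": "C"},
]

print(fd_holds(rows, ["student_id"], ["student_name"]))        # True
print(fd_holds(rows, ["course_id"], ["instructor_id"]))        # True
print(fd_holds(rows, ["student_id"], ["grade"]))               # False
print(fd_holds(rows, ["student_id", "course_id"], ["grade"]))  # True
```

The last two calls show the superset point: student_id alone does not determine grade here, but the larger set {student_id, course_id} does.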

Only after you have determined the FDs can you determine the candidate keys! A CK is a superkey that contains no smaller superkey. (The presence of CKs and superkeys are also constraints.) We can pick a CK as primary key.
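Here is a sketch of going from FDs to candidate keys by brute force, using attribute closure. The FD list is an assumption made for illustration (in particular, that instructor id determines instructor name and office, and that a student id determines name and address are not stated in the business rules); closure and candidate_keys are hypothetical helper names.

```python
from itertools import combinations

ATTRS = ["si", "sn", "sa", "sm", "ci", "ct", "ii", "in", "io", "scg"]

# Assumed FDs (determinant, determined). These are guesses about the
# application, not facts taken from the assignment.
FDS = [
    ({"si"}, {"sn", "sa", "sm"}),
    ({"ci"}, {"ct", "ii"}),
    ({"ii"}, {"in", "io"}),
    ({"si", "ci"}, {"scg"}),
]


def closure(attrs, fds):
    """All attributes functionally determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(all_attrs, fds):
    """Superkeys that contain no smaller superkey, found smallest first."""
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(all_attrs, size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller superkey, so not minimal
            if closure(candidate, fds) == set(all_attrs):
                keys.append(candidate)
    return keys


print(candidate_keys(ATTRS, FDS))
```

Under these assumed FDs the only candidate key is {si, ci}, which matches the composite PK in the question. Trying every subset is fine for 10 attributes but grows exponentially, so this is only a sketch.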

"A partial dependency relies on either one of the attributes from the Primary key."

Don't use "involve" or "relies on" to give a definition. Say, "when" or "if and only if".

Read a definition. A FD is partial if and only if using a proper subset of the determinant gives a FD with the same determined column; otherwise it is full. Note that this does not involve CKs. A relation is in 2NF when all non-prime attributes are fully functionally dependent on every CK.
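The definition of a partial FD can be checked mechanically. Below is a minimal Python sketch under an assumed FD list for the question's table (the FDs themselves are guesses, and closure and is_partial are hypothetical helper names).

```python
from itertools import combinations

# Assumed FDs (determinant, determined); illustration only.
FDS = [
    ({"si"}, {"sn", "sa", "sm"}),
    ({"ci"}, {"ct", "ii"}),
    ({"ii"}, {"in", "io"}),
    ({"si", "ci"}, {"scg"}),
]


def closure(attrs, fds):
    """All attributes functionally determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_partial(determinant, attribute, fds):
    """True if some proper subset of `determinant` already determines `attribute`."""
    determinant = set(determinant)
    if attribute not in closure(determinant, fds):
        raise ValueError("determinant does not determine attribute at all")
    # Sizes 0 .. len-1 cover exactly the proper subsets.
    for size in range(len(determinant)):
        for subset in combinations(sorted(determinant), size):
            if attribute in closure(subset, fds):
                return True
    return False


print(is_partial({"si", "ci"}, "sn", FDS))   # True: {si} alone determines sn
print(is_partial({"si", "ci"}, "scg", FDS))  # False: the grade needs both
```

Notice the check never mentions keys, only the determinant and its proper subsets. 2NF then asks about this property specifically for non-prime attributes and the CKs.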

"A transitive dependency involves two or more non-key attributes in a functional dependence where one of the non-key attributes is dependent on a key attribute (from my PK)."

Read a definition. S -> T is transitive when there is an X where S -> X and X -> T and not(X -> S), with T not contained in X (otherwise taking X = T would make almost every FD count as transitive). Note that this does not involve CKs. A relation is in 3NF when it is in 2NF and all non-prime attributes are non-transitively dependent on every CK.
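A sketch of searching for the intermediate set X in that definition. Two assumptions are baked in: the FD list is a guess about the question's table, and the search only accepts an X that does not contain T, so that X -> T is not trivial. The names closure and transitive_witness are hypothetical.

```python
from itertools import combinations

# Assumed FDs (determinant, determined); illustration only.
FDS = [
    ({"si"}, {"sn", "sa", "sm"}),
    ({"ci"}, {"ct", "ii"}),
    ({"ii"}, {"in", "io"}),
    ({"si", "ci"}, {"scg"}),
]


def closure(attrs, fds):
    """All attributes functionally determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def transitive_witness(s, t, fds):
    """Return some X with S -> X, X -> T, not (X -> S), T not in X; else None."""
    s = set(s)
    reachable = closure(s, fds)
    if t not in reachable:
        raise ValueError("S does not determine T at all")
    # Any X with S -> X is a subset of the closure of S; exclude T itself.
    pool = sorted(reachable - {t})
    for size in range(1, len(pool) + 1):
        for combo in combinations(pool, size):
            x_closure = closure(combo, fds)
            if t in x_closure and not s <= x_closure:
                return set(combo)
    return None


print(transitive_witness({"ci"}, "in", FDS))        # {'ii'}
print(transitive_witness({"ci"}, "ct", FDS))        # None
print(transitive_witness({"si", "ci"}, "scg", FDS)) # None
```

With these assumed FDs, ci -> in is transitive via X = {ii}, while ci -> ct and {si, ci} -> scg are not.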

I am inferring a functional dependency that was not listed in your business rules, namely that instructor ID determines instructor name.

If this is true, and if you have both instructor ID and instructor name in the Course table, then the table is not in 3NF, because there is a transitive dependency: Course ID determines Instructor ID, which in turn determines Instructor Name.

Why is this harmful? Because duplicating the instructor name in each course an instructor teaches makes updating an instructor's name difficult, and makes it possible to do so inconsistently. An inconsistent instructor name is just another bug you have to watch out for, and 3NF obviates the problem. The same argument applies to Instructor Office.
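A tiny Python sketch of that update problem, using invented data (an instructor with id 5 teaching two hypothetical courses). It assumes, as above, that instructor ID determines instructor name.

```python
# Denormalized: the instructor name is repeated on every course row.
courses = [
    {"course_id": "CS101", "instructor_id": 5, "instructor_name": "Dr. Lee"},
    {"course_id": "CS102", "instructor_id": 5, "instructor_name": "Dr. Lee"},
]

# A rename applied to only one row leaves the data contradicting
# instructor_id -> instructor_name.
courses[0]["instructor_name"] = "Dr. Lee-Park"
names_for_5 = {c["instructor_name"] for c in courses if c["instructor_id"] == 5}
inconsistent = len(names_for_5) > 1
print(inconsistent)  # True

# Split along the transitive dependency: the name is stored once.
course_instructor = {"CS101": 5, "CS102": 5}
instructor_name = {5: "Dr. Lee"}

# One update, and every course sees the same name.
instructor_name[5] = "Dr. Lee-Park"
names_after = {instructor_name[i] for i in course_instructor.values()}
print(names_after)  # {'Dr. Lee-Park'}
```

After the split there is simply no place to store a second, conflicting name for the same instructor.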

Source: https://stackoverflow.com/questions/27393366/identifying-transitive-dependencies

Tags: sql, database, database-design, 3nf