4. Portions from Some Chapters


Normalization : Overview


Third Normal Form (3NF)

Non-PK : Non-PK Relationship (PK=Primary Key)

There should be no dependency among non-key attributes.

Non-key attributes should depend only on primary key(PK) and on no other column. That is, every nonkey attribute should be  non-transitively dependent on the primary key.

Third normal form is violated when a non-key column is a fact about another non-key column.

A table is in third normal form if the nonkey attributes that do not contribute to a description of the primary key are move into a separate table, like, for example, look-up tables.

To be in 3NF, a relation must first be in 2NF (which implies 1NF too), and additionally, all attributes that are not dependent upon the primary key must be eliminated.

In other words, a record is in second and third normal forms if every column is either (a)part of the primary key, or (b)provides a fact about the whole primary-key (and not its part, if it is a composite key).

Example:

Before Normalization

After Normalization

Rule

Key  

pNo    brCode    brCatg

p123    2134        R

r231    2001        M

k999    2134        R

Table-1

pNo   brCode

p123  2134

r231  2001

k999  2134

 

Table-2

brCode  brCatg

2134      R

2001      M

No dependency between non-key columns (brCode and brCatg) - leads to redundancy.

Split the table.

 

As seen from the example above, 3 NF practically means:

§ If a non-key column is dependent upon another non-key column, it logically belongs not to the table it is in, but to a separate table. Hence, spin it off into a separate table.

§ Relate the split table with a foreign key.


Fourth Normal Form (4NF)

Fourth Normal Form (4NF) deals with multi-valued attributes. 2NF, 3NF, BCNF are concerned with functional dependencies, while 4NF and 5 NF are concerned with multi-valued dependency.

4NF disallows more than one independent multi-valued column. That is, there can NOT be more than one (a)independent (b)multi-valued column in a table. To be in Fourth Normal Form, a relation must first be in Boyce-Codd Normal Form. Additionally, a given relation may not contain more than one multi-valued dependency.

Examples:

Before Normalization

After Normalization

Rule

 

pNo    lang       clang

p12    English   C

p12    Hindi      Pascal

p12    French   Cobol

p12    null       Basic

p12    null       Fortran

 

clang not dependent upon lang

Table-1               

pNo    lang           

p12    English        

p12    Hindi          

p12    French

 

Table-2

pNo    clang

p12    C

p12    Pascal

p12    Cobol

p12    Basic

p12    Fortran

 

 

No 2 or more columns with independent multi-valued attributes (like lang and clang here).

 

Split the table.

 

Avoids redundant nulls.

 

Before Normalization

After Normalization

 *Subject teacher is proficient in.

Teacher_ID

*Subj_ID

Class_ID

T1

S1

C1

T1

S2

C2

T1

S3

 

T2

S1

C3

T2

 

C4

T2

 

C5

 Table_A

Teacher_ID

Subj_ID

T1

S1

T1

S2

T1

S3

T2

S1

 

 

 

 Table_B

Teacher_ID

Class_ID

T1

C1

T1

C2

T2

C3

T2

C4

T2

C5

 In the example above, Subj_ID, the subjects the concerned teachers are proficient in, and Class_ID, the classes to which they are assigned, are each multi-valued attributes, and Subj_ID is independent of Class_ID. Hence, the two attributes ought not to be in the same table, and must be split into two tables, Tabe_A and Table_B, as shown above.


 

Union, Intersect, Minus

Union, Intersect and Minus are set operators that operate between two sets, that is, between result sets of two SQLs, and determine what would be the combined result set of the two.

The number of columns/expressions in the SELECT of the component queries must match, and they should belong same datatype group (char, numeric, ...).

Diagrammatically: