2. Introduction
• E-R modeling is used as a requirement
analysis tool.
• DB design consists of inconsistency,
ambiguity and redundancy.
• The refinement process of DB design –
Normalization.
3. Issues – DB Design
• Making relations very large.
• Difficult to maintain and update data as it would
involve searching many records in relation.
• Poor utilization of disk space and resources.
• Errors and inconsistencies increases.
4. Normalization
• Normalization is the process of organizing the
data in the database.
• Normalization is used to minimize the
redundancy from a relation or set of relations.
• Eliminate Insertion, Update, and Deletion
Anomalies.
• Normalization divides the larger table into
smaller and links them using relationships.
• The normal form is used to reduce redundancy
from the database table.
7. Determinant
• Attribute X can be defined as
determinant if it uniquely
determines the attribute Y in a
given relationship.
• Attribute need not be a key
attribute.
8. Functional Dependency
REPORT(Student#course# ,Cname, Room #,Marks, Grade)
• Student#course# : composite attributes
exactly determines one value of marks.
• Marks is functionally dependent on
student#course#
• Example:
• X->Y
9. Full Functional Dependency
• Marks is fully functional dependent on
student# and course#
• Cannot determine marks secured by a
student in a course if only one is known.
• Cname is not fully functionally dependent
on student#course#.
10. Partial dependency
• X and Y are attribute sets. Attribute sets Y
are partially dependent on attribute set X.
• Course# alone is enough to determine
courseName, Iname
and Room#.
12. Trivial functional dependency means that
the right-handside is a subset ( not
necessarily a proper subset) of the left-
hand side.
Trival functional dependency
For example: staffNo, sName sName
🡪
staffNo, sName staffNo
🡪
They do not provide any additional information about
possible integrity constraints on the values held by these
attributes.
We are normally more interested in nontrivial
dependencies because they represent integrity
13. FD – Example
Database to track reviews of papers submitted to an academic
conference. Prospective authors submit papers for review and
possible acceptance in the published conference proceedings.
Details of the entities
• Author information includes a unique author number, a name, a
mailing address, and a unique (optional) email address.
• Paper information includes the primary author, the paper
number, the title, the abstract, and review status (pending,
accepted,rejected)
• Reviewer information includes the reviewer number, the name,
the mailing address, and a unique (optional) email address
• A completed review includes the reviewer number, the date, the
paper number, comments to the authors, comments to the
program chairperson, and ratings (overall, originality,
15. • Levels of normalization based on the amount of
redundancy in the database.
• Various levels of normalization are:
• First Normal Form (1NF)
• Second Normal Form (2NF)
• Third Normal Form (3NF)
• Boyce-Codd Normal Form (BCNF)
• Fourth Normal Form (4NF)
• Fifth Normal Form (5NF)
• Domain Key Normal Form (DKNF)
Levels of Normalization
Redundancy
Number
of
Tables
Most databases should be 3NF or BCNF in order to avoid the database anomalies.
Complexity
16. A table is considered to be in 1NF if all the fields contain
only scalar values (as opposed to list of values).
Example (Not 1NF)
First Normal Form (1NF)
Author and AuPhone columns are not scalar
0-321-32132-1 Balloon Sleepy,
Snoopy,
Grumpy
321-321-1111,
232-234-1234,
665-235-6532
Small House 714-000-0000 $34.00
0-55-123456-9 Main Street Jones,
Smith
123-333-3333,
654-223-3455
Small House 714-000-0000 $22.95
0-123-45678-0 Ulysses Joyce 666-666-6666 Alpha Press 999-999-9999 $34.00
1-22-233700-0 Visual
Basic
Roman 444-444-4444 Big House 123-456-7890 $25.00
ISBN Title AuName AuPhone PubName PubPhone Price
17. 1. Place all items that appear in the repeating group in a new
table
2. Designate a primary key for each new table produced.
3. Duplicate in the new table the primary key of the table from
which the repeating group was extracted or vice versa.
Example (1NF)
1NF - Decomposition
0-321-32132-1 Balloon Small House 714-000-0000 $34.00
0-55-123456-9 Main Street Small House 714-000-0000 $22.95
0-123-45678-0 Ulysses Alpha Press 999-999-9999 $34.00
1-22-233700-0 Visual
Basic
Big House 123-456-7890 $25.00
ISBN Title PubName PubPhone Price
ISBN AuName AuPhone
0-123-45678-0 Joyce 666-666-6666
1-22-233700-0 Roman 444-444-4444
0-55-123456-9 Smith 654-223-3455
0-55-123456-9 Jones 123-333-3333
0-321-32132-1 Grumpy 665-235-6532
0-321-32132-1 Snoopy 232-234-1234
0-321-32132-1 Sleepy 321-321-1111
18. For a table to be in 2NF, there are two
requirements
• The database is in first normal form
• No partial dependency between key and
non-key attributes.
Note: Remember that we are dealing with
non-key attributes
Second Normal Form (2NF)
22. Example 1 (Not 2NF)
Scheme 🡪 {Title, PubId, AuId, Price,
AuAddress}
1. Key 🡪 {Title, PubId, AuId}
2. {Title, PubId, AuID} {Price}
🡪
3. {AuID} {AuAddress}
🡪
4. AuAddress does not belong to a key
5. AuAddress functionally depends on AuId
which is a subset of a key
23. Example 2 (Not 2NF)
Scheme 🡪 {City, Street, HouseNumber, HouseColor,
CityPopulation}
1. key 🡪 {City, Street, HouseNumber}
2. {City, Street, HouseNumber} {HouseColor}
🡪
3. {City} {CityPopulation}
🡪
4. CityPopulation does not belong to any key.
5. CityPopulation is functionally dependent on the
City which is a proper subset of the key
Second Normal Form (2NF)
24. 1. If a data item is fully functionally dependent on
only a part of the primary key, move that data
item and that part of the primary key to a new
table.
2. If other data items are functionally dependent on
the same part of the key, place them in the new
table also
3. Make the partial primary key copied from the
original table the primary key for the new table.
Place all items that appear in the repeating group
in a new table
2NF - Decomposition
25. Example 1 (Convert to 2NF)
Old Scheme 🡪 {Title, PubId, AuId, Price, AuAddress}
New Scheme 🡪 {Title, PubId, AuId, Price}
New Scheme 🡪 {AuId, AuAddress}
Example 2(Convert to 2NF)
Old Scheme 🡪 {City, Street, HouseNumber, HouseColor,
CityPopulation}
New Scheme 🡪 {City, Street, HouseNumber, HouseColor}
New Scheme 🡪 {City, CityPopulation}
2NF - Decomposition
26. This form dictates that all non-key attributes of
a table must be functionally dependent on a
candidate key i.e. there can be no
interdependencies among non-key
attributes.
For a table to be in 3NF, there are two
requirements
• The table should be second normal form
• No attribute is transitively dependent on the
Third Normal Form
(3NF)
27. 1. Move all items involved in
transitive dependencies to a new
entity.
2. Identify a primary key for the
new entity.
3. Place the primary key for the
new entity as a foreign key on
the original entity.
3NF - Decomposition
28. Example 1 (Not in 3NF)
Scheme {
🡪 Studio, StudioCity, CityTemp}
• Primary Key {Studio}
🡪
• {Studio} {StudioCity}
🡪
• {StudioCity} {CityTemp}
🡪
• {Studio} {CityTemp}
🡪
• Both StudioCity and CityTemp
depends on the entire key hence 2NF
• CityTemp transitively depends on Studio hence violates 3NF
Example 2 (Not in 3NF)
Scheme {BuildingID, Contractor, Fee}
🡪
1. Primary Key {BuildingID}
🡪
2. {BuildingID} {Contractor}
🡪
3. {Contractor} {Fee}
🡪
4. {BuildingID} {Fee}
🡪
5. Fee transitively depends on the BuildingID
6. Both Contractor and Fee depend on the entire key hence 2NF
Third Normal Form
(3NF)
Buildin
gID
Contract
or
Fe
e
100 Rando
lph
12
00
150 Ingers
oll
110
0
200 Rando
lph
12
00
250 Pitkin 110
0
300 Rando
lph
12
00
29. Example 1(Convert to 3NF)
Old Scheme 🡪 {Studio, StudioCity, CityTemp}
New Scheme 🡪 {Studio, StudioCity}
New Scheme 🡪 {StudioCity, CityTemp}
Example 2 (Convert to 3NF)
Old Scheme 🡪 {BuildingID, Contractor, Fee}
New Scheme 🡪 {BuildingID, Contractor}
New Scheme 🡪 {Contractor, Fee}
3NF - Decomposition
31. BCNF
•Boyce Codd normal form (BCNF)
•It is an advance version of 3NF that’s
why it is also referred as 3.5NF.
•A table complies with BCNF if it is in
3NF and for every non trivial
functional dependency of the form
X->Y, X should be the super key of the
table.
35. Conditions- Multivalued
dependency
• For a dependency A B, if for a single value
→
of A, multiple value of B exists, then the table
may have multi-valued dependency.
• Also, a table should have at-least 3 columns
for it to have a multi-valued dependency.
• And, for a relation R(A,B,C), if there is a
multi-valued dependency between, A and B,
then B and C should be independent of each
other.
40. 5NF
• R is already in 4NF and not having any
join dependency and joining should be
lossless.
• R decomposed into R1,R2 and R3.
• (R1 R2) R3 or R1 (R2 R3)
⋈ ⋈ ⋈ ⋈
should be able to reconstruct the
original table R.
44. Properties of Functional
Dependency
• Let X, Y, and Z are sets of attributes in a relation R.
There are several properties of functional
dependencies which always hold in R also known as
Armstrong Axioms.
• Reflexivity: If Y is a subset of X, then X → Y. e.g.;
Let X represents {E-ID, E-NAME} and Y represents
{E-ID}. {E-ID, E-NAME}->E-ID is true for the relation.
45. Properties of Functional
Dependency
• Augmentation: If X → Y, then XZ → YZ. e.g.; Let X
represents {E-ID}, Y represents {E-NAME} and Z
represents {E-CITY}.
• As {E-ID}->E-NAME is true for the relation, so
{E-ID,E-CITY}->{E-NAME,E-CITY} will also be true.
46. Properties of Functional
Dependency
• Transitivity: If X → Y and Y → Z, then X → Z. e.g.;
Let X represents {E-ID}, Y represents {E-CITY} and Z
represents {E-STATE}.
As {E-ID} ->{E-CITY} and {E-CITY}->{E-STATE} is true
for the relation, so { E-ID }->{E-STATE} will also be true.
• Attribute Closure: The set of attributes that are
functionally dependent on the attribute A is called
Attribute Closure of A and it can be represented as A+
.
47. Closure of a Set of Functional
Dependencies
◼ Given a set F set of functional dependencies, there are certain other
functional dependencies that are logically implied by F.
▪ For example: If A → B and B → C, then we can infer that A C
→
◼ The set of all functional dependencies logically implied by F is the
closure of F.
◼ We denote the closure of F by F+
.
◼ We can find all of F+
by applying Armstrong’s Axioms:
▪ if β α, then α
⊆ → β (reflexivity)
▪ if α → β, then γ α γ
→ β (augmentation)
▪ if α → β, and β γ, then α γ (transitivity)
→ →
◼ These rules are
▪ sound (generate only functional dependencies that actually hold) and
▪ complete (generate all functional dependencies that hold).
53. Example
◼ R = (A, B, C, G, H, I)
F = { A → B
A → C
CG → H
CG → I
B → H}
◼ some members of F+
▪ A → H
▪ by transitivity from A → B and B → H
▪ AG → I
▪ by augmenting A → C with G, to get AG → CG
and then transitivity with CG → I
▪ CG → HI
▪ by applying union CG → H and CG → I
54. Procedure for Computing F+
◼ To compute the closure of a set of functional dependencies F:
F +
= F
repeat
for each functional dependency f in F+
apply reflexivity and augmentation rules on f
add the resulting functional dependencies to F +
for each pair of functional dependencies f1and f2 in F +
if f1 and f2 can be combined using transitivity
then add the resulting functional dependency to F +
until F +
does not change any further
NOTE: We shall see an alternative procedure for this task later
55. Closure of Functional
Dependencies (Cont.)
◼ We can further simplify manual computation of F+
by
using the following additional rules.
▪ If α → β holds and α γ holds, then α
→ → β γ holds
(union)
▪ If α → β γ holds, then α → β holds and α γ holds
→
(decomposition)
▪ If X → Y and WY → Z hold, then WX → Z also
holds. This is a special case of transitivity
(Pseudotransitivity)
▪ The above rules can be inferred from Armstrong’s
axioms.
56. Closure of Attribute Sets
◼ Given a set of attributes α, define the closure of α under
F (denoted by α+
) as the set of attributes that are
functionally determined by α under F
◼ Algorithm to compute α+
, the closure of α under F
result := α;
while (changes to result) do
for each β γ
→ in F do
begin
if β ⊆ result then result := result γ
∪
end
57. CANONICAL COVER
• A canonical cover of a set of
functional dependencies F is a
simplified set of functional
dependencies that has the same
closure as the original set F.
58. Extraneous attributes
•Extraneous attributes:
An attribute of a functional
dependency is said to be extraneous
if we can remove it without changing
the closure of the set of functional
dependencies
59. Conditions
• Canonical cover: A canonical cover of a set of
functional dependencies F such that ALL the following
properties are satisfied:
• F logically implies all dependencies in Fclosure .
• Fclosure logically implies all dependencies in F.
• No functional dependency in contains an extraneous
attribute.
• Each left side of a functional dependency in is unique.
That is, there are no two dependencies and in such
that .
60. Algorithm
• repeat
• 1. Use the union rule to replace any
dependencies in α1 β
→ 1 and α1 β
→ 2 with
α1 β
→ 1 β2
• 2. Find a functional dependency with an
extraneous attribute either in or in .
• 3. If an extraneous attribute is found,
delete it from . until F does not change
61. Example
• Consider the following set F of
functional dependencies:
• F= {
A ->BC
B->C
A->B
AB->C
}
62. Step 1
• There are two functional dependencies with the same
set of attributes on the left:
A->BC
A->B
• These two can be combined to get
A->BC.
• Now, the revised set F becomes:
• F= {
A->BC
B->C
AB->C
}
63. Step 2
• There is an extraneous attribute in AB ->C
because even after removing AB ->C from
the set F, we get the same closures. This is
because B->C is already a part of F. Now, the
revised set F becomes:
• F= {
A ->BC
B ->C
}
64. Step 3
• C is an extraneous attribute in A-> BC,
also A ->B is logically implied by A->B
and B->C (by transitivity).
• F= {
A ->B
B ->C
}
65. Step 4
• After this step, F does not change
anymore. So,
Hence the required canonical cover is,
= {
A ->B
B ->C
}
Editor's Notes
#16:Having scalar values also means that all instances of a record type must contain the same number of fields.
A table not in first normal form is called un normalized
#17:1. The designated key will be the primary key of the original table concatenated with one or more data items from the new table.
For the first table the primary key is ISBN
For the second table the primary key is ISBN + Author Name
#23:Let us consider the problems with the movie studio database:
Redundancy – City Population is repeated many times
Insertion anomaly – Whenever we add a new record we have to add unnecessary information. We can not add record until we know information about the city population
Deletion anomaly – Whenever we delete a record, useful information is deleted.
Update anomaly – The City Population needs to be updated in more than one location if it changes.
#24:If there is a table with columns A,B,C,D with Primary Key (A,B) & D is dependant on A (alone) then to be 2NF, you should reduce (split) tables as:
Table with columns A,D with Primary Key (A)
Table with columns A,B,C with Primary Key (A,B)
#26:In the Book Schema Third Normal Form is violated since a non-key field is dependent on another non-key field and is transitively dependent on the primary key.
#27:If there is a table with columns A,B,C with Primary Key (A) and C is dependant on B (B 🡪 C) then to be 3NF, the tables become
Table with columns B,C with Primary Key (B)
Table with fields A,B with Primary Key ( A), and Foreign Key (B)
#28:Let us consider the problems with the movie studio database:
Redundancy – City Population is repeated many times
Insertion anomaly – Whenever we add a new record we have to add unnecessary information. We can not add record until we know information about the city population
Deletion anomaly – Whenever we delete a record, useful information is deleted.
Update anomaly – The City Population needs to be updated in more than one location if it changes.