NORMALIZATION in database management systems

Relational Database Design
Relational database design: The grouping of attributes to
form "good" relation schemas
Two levels of relation schemas:
The logical "user view" level
The storage "base relation" level
Criteria for "good" base relations:
•Discuss informal guidelines for good relational design
•Discuss formal concepts of functional dependencies
and normal forms 1NF 2NF 3NF BCNF

There are two popular approaches for designing the db
There are two popular approaches for designing the db
. Top down design
. Top down design
. Bottom up design
. Bottom up design
ER modeling technique is called Top down approach
ER modeling technique is called Top down approach
it involves
it involves
i)
i) Identifying entities and their attributes
Identifying entities and their attributes
ii)
ii) Identifying the relationship between entities
Identifying the relationship between entities
iii)
iii)Draw the ER diagram
Draw the ER diagram
iv)
iv)Mapping diagrams to the tables
Mapping diagrams to the tables

[
[
Normalization is the bottom up approach. It is step by
Normalization is the bottom up approach. It is step by
step decomposition of complex records into simple records.
step decomposition of complex records into simple records.
Normalization controls the redundancy and removes
Normalization controls the redundancy and removes
inconsistency and update anomalies
inconsistency and update anomalies
Normalization is based on the functional dependency
Normalization is based on the functional dependency
and primary key
and primary key
Normalization: The process of decomposing unsatisfactory
"bad" relations by breaking up their attributes into smaller
relations
Normal form: Condition using keys and FDs of a relation to
certify whether a relation schema is in a particular normal
form

Informal design guidelines for relation schemas
Informal design guidelines for relation schemas
1) Semantics of the relation attributes
1) Semantics of the relation attributes
2) Reducing the redundant values in tuples
2) Reducing the redundant values in tuples
3) Reducing the null values in tuples
3) Reducing the null values in tuples
4) Disallowing the possibility of generating spurious
4) Disallowing the possibility of generating spurious
tuples
tuples

Semantics of the Relation Attributes
GUIDELINE 1:
GUIDELINE 1: Informally, each tuple in a relation
Informally, each tuple in a relation
should represent one entity or relationship instance.
should represent one entity or relationship instance.
 Attributes of different entities (EMPLOYEEs,
Attributes of different entities (EMPLOYEEs,
DEPARTMENTs, PROJECTs) should not be
DEPARTMENTs, PROJECTs) should not be
mixed in the same relation
mixed in the same relation
 Only foreign keys should be used to refer to other
Only foreign keys should be used to refer to other
entities
entities
 Entity and relationship attributes should be kept
Entity and relationship attributes should be kept
apart as much as possible.
apart as much as possible.

Redundant Information in Tuples and Update
Anomalies
GUIDELINE 2:
GUIDELINE 2:
Mixing attributes of multiple entities may cause
problems
Information is stored redundantly wasting storage
Problems with update anomalies
Insertion anomalies
Deletion anomalies
Modification anomalies

Insert Anomaly: Cannot insert a project unless an
employee is assigned to .
Inversely - Cannot insert an employee unless an
he/she is assigned to a project.
Delete Anomaly: When a project is deleted, it will
result in deleting all the employees who work on that
project. Alternately, if an employee is the sole
employee on a project, deleting that employee would
result in deleting the corresponding project

 Update Anomaly:
Update Anomaly: Changing the name of project
Changing the name of project
number P1 from “Billing” to “Customer-
number P1 from “Billing” to “Customer-
Accounting” may cause this update to be made for
Accounting” may cause this update to be made for
all 100 employees working on project P1.
all 100 employees working on project P1.
 GUIDELINE 2: Design a schema that does not
suffer from the insertion, deletion and update
anomalies. If there are any present, then note them
so that applications can be made to take them into
account

If a database design is not perfect, it may contain anomalies, which
are like a bad dream for any database administrator. Managing a
database with anomalies is next to impossible.
Update anomalies − If data items are scattered and are not linked
to each other properly, then it could lead to strange situations. For
example, when we try to update one data item having its copies
scattered over several places, a few instances get updated properly
while a few others are left with old values. Such instances leave the
database in an inconsistent state.
Deletion anomalies − We tried to delete a record, but parts of it
was left undeleted because of unawareness, the data is also saved
somewhere else.
Insert anomalies − We tried to insert data in a record that does not
exist at all.

Null Values in Tuples
GUIDELINE 3: Relations should be designed such
that their tuples will have as few NULL values as
possible
Reasons for nulls:
attribute not applicable or invalid
attribute value unknown (may exist)
value known to exist, but unavailable

Spurious Tuples
GUIDELINE 4: The relations should be designed to
satisfy the lossless join condition. No spurious tuples
should be generated by doing a natural-join of any
relations.
There are two important properties of decompositions:
(a)non-additive or losslessness of the corresponding join
(b)preservation of the functional dependencies.

Functional dependency
Functional dependency
it’s a constraint between two set of attributes
it’s a constraint between two set of attributes
from the db.
from the db.
A F.D denoted by X-> Y between two sets of
A F.D denoted by X-> Y between two sets of
attributes x and y that are subsets of R specifies a
attributes x and y that are subsets of R specifies a
constraint on the possible tuples that can form a
constraint on the possible tuples that can form a
relation state r of R
relation state r of R
The constraint is that, for any two tuples t1 & t2 in r
The constraint is that, for any two tuples t1 & t2 in r
t1[x]=t2[x]
t1[y]=t2[y]

R(X,Y)
R(X,Y)
X
X Y
Y
t
t1
1 10
10 d1
d1
t
t2
2 10
10 d1
d1
There is a FD from X to Y or Y is FD on X
There is a FD from X to Y or Y is FD on X
FD=> Functional dependency or f.d
FD=> Functional dependency or f.d
X=> L.H.S
X=> L.H.S
Y=> R.H.S
Y=> R.H.S

Full Functional dependency
Partial Functional dependency
Transitive dependency
e.g., Eno,Pno-> Hours
e.g., Eno,Pno-> Hours
e.g., Eno,Pno->Ename
e.g., Eno,Pno->Ename
e.g., Eno->Dno
e.g., Eno->Dno
Dno->Dname
Dno->Dname Eno->Dname
Eno->Dname

Given a set of FDs F, we can infer additional FDs that
hold whenever the FDs in F hold
Armstrong's inference rules:
IR1. (Reflexive) If Y subset-of X, then X -> Y
IR2. (Augmentation) If X -> Y, then XZ -> YZ
(Notation: XZ stands for X U Z)
IR3. (Transitive) If X -> Y and Y -> Z, then X -> Z
Some additional inference rules that are useful:
IR4. (Decomposition) If X -> YZ, then X -> Y and X -> Z
IR5. (Union) If X -> Y and X -> Z, then X -> YZ
IR6. (Psuedotransitivity) If X -> Y and WY -> Z, then
WX -> Z

Trivial , Non trivial
Trivial − If a functional dependency (FD) X → Y
holds, where Y is a subset of X, then it is called a
trivial FD. Trivial FDs always hold.
Non-trivial − If an FD X → Y holds, where Y is not a
subset of X, then it is called a non-trivial FD.
Completely non-trivial − If an FD X → Y holds,
where x intersect Y = Φ, it is said to be a completely
non-trivial FD.

Candidate key
If a relation schema has more than one key, each is called
a candidate key. One of the candidate keys is arbitrarily
designated to be the primary key, and the others are called
secondary keys.
Prime and Non prime attribute
A Prime attribute must be a member of some candidate
key
A Nonprime attribute is not a prime attribute—that is, it is
not a member of any candidate key.

Normalization of data is a process of analyzing the given
Normalization of data is a process of analyzing the given
relation schemas based on their FD & primary keys to
relation schemas based on their FD & primary keys to
achieve the desirable properties
achieve the desirable properties
1)
1) Minimizing redundancy
Minimizing redundancy
2)
2) Minimizing the insertion , deletion and modification
Minimizing the insertion , deletion and modification
anomalies
anomalies
Normal forms
Normal forms
1NF, 2NF, 3NF, BCNF(Boyce Codd Normal Form),4NF
1NF, 2NF, 3NF, BCNF(Boyce Codd Normal Form),4NF
and 5NF
and 5NF

1NF
1NF- is based on primary key and atomic values and there
- is based on primary key and atomic values and there
must be no composite attributes, multivalued attributes and
must be no composite attributes, multivalued attributes and
relation with in relation.
relation with in relation.
Composite attribute
Composite attribute
Eno
Eno Address
Address Ename
Ename
Fname
Fname Lname
Lname
Eno
Eno Address
Address Fname
Fname Lname
Lname

Multivalued Attribute
Multivalued Attribute
Dno
Dno Dname
Dname Dlocation
Dlocation
Dno
Dno Dname
Dname Dno Dlocation
Dno Dlocation
Eno
Eno Ename
Ename Addr
Addr Pno
Pno Pname
Pname
Relation with in Relation
Relation with in Relation
Eno
Eno Ename
Ename Eno
Eno Pno
Pno Pname
Pname
Multivalued
Multivalued
Attribute
Attribute

2NF - There is no partial dependency.
2NF - There is no partial dependency.
It is based on the concept of full functional dependency and
It is based on the concept of full functional dependency and
non key attribute should be fully dependent on the key
non key attribute should be fully dependent on the key
attribute.
attribute.
A F.D X->Y if fully F.D
A F.D X->Y if fully F.D
Def: A rs R is in 2NF if every non prime attribute A in R is
Def: A rs R is in 2NF if every non prime attribute A in R is
full FD on the primary key of R
full FD on the primary key of R

Eg.
Eg.
R={
R={eno, pno
eno, pno, hours, ename, pname, plocation}
, hours, ename, pname, plocation}
Given functional dependency
FD = {{eno,pno}-> hours,
FD = {{eno,pno}-> hours,
eno->ename
eno->ename
pno->pname, plocation}
pno->pname, plocation}
R1={
R1={eno,pno
eno,pno,hours}
,hours}
R2 = {
R2 = {eno
eno,ename}
,ename}
R3={
R3={pno
pno,pname,plocation}
,pname,plocation}
now all the relations R1, R2 and R3 are in full functional
now all the relations R1, R2 and R3 are in full functional
dependency.
dependency.

3NF-
3NF-
It is based on the concept of transitive dependency
It is based on the concept of transitive dependency
Def: A rs R is in 3NF if it satisfies 2 NF and no non
Def: A rs R is in 3NF if it satisfies 2 NF and no non
prime attribute of R is transitively dependent on the
prime attribute of R is transitively dependent on the
primary key
primary key
Def: A rs R is in 3NF if, When ever a non trivial FD
Def: A rs R is in 3NF if, When ever a non trivial FD
X-> A holds in R, either
X-> A holds in R, either
a)
a) X is a super key of R (or)
X is a super key of R (or)
b)
b) A is a prime attribute of R
A is a prime attribute of R

Eg:
Eg:
R={
R={eno
eno, ename, address,dno,dname}
, ename, address,dno,dname}
F = {eno -> ename,address,dno
F = {eno -> ename,address,dno
dno -> dname}
dno -> dname}
R1={
R1={eno
eno,ename,address,dno} R2 = {
,ename,address,dno} R2 = {dno
dno,dname}
,dname}

BCNF(Boyce codd Normal Form)
BCNF(Boyce codd Normal Form)
Def:
Def:
A rs R is in BCNF if when ever a non trivial FD
A rs R is in BCNF if when ever a non trivial FD
X-> A holds in R, then X is a super key of R
X-> A holds in R, then X is a super key of R

Closure of a Set of Functional
Closure of a Set of Functional
Dependencies
Dependencies

Closure of a Set of Attributes
Closure of a Set of Attributes

Redundancy of FDs
Redundancy of FDs

Canonical Cover
Canonical Cover

Example of Computing a
Example of Computing a
Canonical Cover
Canonical Cover

3. Multivalued Dependencies and Fourth Normal
3. Multivalued Dependencies and Fourth Normal
Form (1)
Form (1)
(a) The EMP relation with two MVDs: ENAME —>> PNAME and ENAME —>> DNAME. (b)
(a) The EMP relation with two MVDs: ENAME —>> PNAME and ENAME —>> DNAME. (b)
Decomposing the EMP relation into two 4NF relations EMP_PROJECTS and EMP_DEPENDENTS.
Decomposing the EMP relation into two 4NF relations EMP_PROJECTS and EMP_DEPENDENTS.

Multivalued Dependencies and Fourth Normal
Form (2)
Form (2)
Definition:
Definition:
 A
A multivalued dependency
multivalued dependency (
(MVD
MVD)
) X
X —>>
—>> Y
Y specified on
specified on
relation schema
relation schema R
R, where
, where X
X and
and Y
Y are both subsets of
are both subsets of R
R,
,
specifies the following constraint on any relation state
specifies the following constraint on any relation state r
r of
of R
R: If
: If
two tuples
two tuples t
t1
1 and
and t
t2
2 exist in
exist in r
r such that
such that t
t1
1[
[X
X] =
] = t
t2
2[
[X
X], then two
], then two
tuples
tuples t
t3
3 and
and t
t4
4 should also exist in
should also exist in r
r with the following
with the following
properties, where we use
properties, where we use Z
Z to denote (
to denote (R
R 2
2 (
(X
X υ
υ Y
Y)):
)):
·
· t
t3
3[
[X
X] =
] = t
t4
4[
[X
X] =
] = t
t1
1[
[X
X] =
] = t
t2
2[
[X
X].
].
·
· t
t3
3[
[Y
Y] =
] = t
t1
1[
[Y
Y] and
] and t
t4
4[
[Y
Y] =
] = t
t2
2[
[Y
Y].
].
·
· t
t3
3[
[Z
Z] =
] = t
t2
2[
[Z
Z] and
] and t
t4
4[
[Z
Z] =
] = t
t1
1[
[Z
Z].
].
 An MVD
An MVD X
X —>>
—>> Y
Y in
in R
R is called a
is called a trivial MVD
trivial MVD if (a)
if (a) Y
Y is a
is a
subset of
subset of X
X, or (b)
, or (b) X
X υ
υ Y
Y =
= R
R.
.

Form (4)
Form (4)
Definition:
Definition:
 A relation schema
A relation schema R
R is in
is in 4NF
4NF with respect to a set of
with respect to a set of
dependencies
dependencies F
F (that includes functional dependencies
(that includes functional dependencies
and multivalued dependencies) if, for every
and multivalued dependencies) if, for every nontrivial
nontrivial
multivalued dependency
multivalued dependency X
X —>>
—>> Y
Y in
in F
F+
+
,
, X
X is a superkey
is a superkey
for R.
for R.

Form (5)
Form (5)
Decomposing a relation state of EMP that is not in 4NF. (a) EMP relation with additional tuples. (b)
Decomposing a relation state of EMP that is not in 4NF. (a) EMP relation with additional tuples. (b)
Two corresponding 4NF relations EMP_PROJECTS and EMP_DEPENDENTS.
Two corresponding 4NF relations EMP_PROJECTS and EMP_DEPENDENTS.

4. Join Dependencies and Fifth Normal Form (1)
4. Join Dependencies and Fifth Normal Form (1)
Definition:
Definition:
 A
A join dependency
join dependency (
(JD
JD), denoted by JD(
), denoted by JD(R
R1
1,
, R
R2
2, ...,
, ..., R
Rn
n),
),
specified on relation schema
specified on relation schema R
R, specifies a constraint on the
, specifies a constraint on the
states
states r
r of
of R
R. The constraint states that every legal state
. The constraint states that every legal state r
r of
of R
R
should have a non-additive join decomposition into
should have a non-additive join decomposition into R
R1
1,
, R
R2
2, ...,
, ...,
R
Rn
n; that is, for every such
; that is, for every such r
r we have
we have
* (
* (Π
ΠR1
R1(
(r
r),
), Π
Π R2
R2(
(r
r), ...,
), ..., Π
ΠRn
Rn(
(r
r)) =
)) = r
r
 A join dependency JD(
A join dependency JD(R
R1
1,
, R
R2
2, ...,
, ..., R
Rn
n), specified on relation
), specified on relation
schema
schema R
R, is a
, is a trivial JD
trivial JD if one of the relation schemas
if one of the relation schemas R
Ri
i in
in
JD(
JD(R
R1
1,
, R
R2
2, ...,
, ..., R
Rn
n) is equal to
) is equal to R
R.
.

Join Dependencies and Fifth Normal Form (2)
Join Dependencies and Fifth Normal Form (2)
Definition:
Definition:
 A relation schema
A relation schema R
R is in
is in fifth normal form
fifth normal form (
(5NF
5NF) (or
) (or
Project-Join Normal Form
Project-Join Normal Form (
(PJNF
PJNF)) with respect to a
)) with respect to a
set
set F
F of functional, multivalued, and join dependencies
of functional, multivalued, and join dependencies
if, for every nontrivial join dependency JD(
if, for every nontrivial join dependency JD(R
R1
1,
, R
R2
2, ...,
, ...,
R
Rn
n) in
) in F
F+
+
(that is, implied by
(that is, implied by F
F), every
), every R
Ri
i is a superkey
is a superkey
of
of R
R.
.

Relation SUPPLY with Join Dependency and
Relation SUPPLY with Join Dependency and
conversion to Fifth Normal Form
conversion to Fifth Normal Form
(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has the JD(R1, R2, R3). (d)
(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has the JD(R1, R2, R3). (d)
Decomposing the relation SUPPLY into the 5NF relations R1, R2, and R3.
Decomposing the relation SUPPLY into the 5NF relations R1, R2, and R3.

Steps to find Minimal Cover
•Singleton attributes in RHS
•Identify extraneous attributes and remove it
•Remove redundant dependencies
Singleton attributes in RHS
AB->CD
The above functional dependency should be
decomposed to singleton attributes in the RHS as below.
AB-> C and
AB-> D

Identify extraneous attributes and remove it
 If an attribute doesn’t give any meaning to the functional
If an attribute doesn’t give any meaning to the functional
dependency, we say it as extraneous and remove it
dependency, we say it as extraneous and remove it
Consider the functional dependencies
 A-> B
A-> B
 AB-> C
AB-> C
 D-> AC
D-> AC
 D-> E
D-> E
If the LHS has more than one attribute, check whether there exists an
extraneous( extra/unwanted) attribute if so, remove it.
LHS which have 2 attributes is AB-> C
A+
= ABC ,
B+
= B [Reflexivity]
If an attribute closure gives only its own attribute by satisfying
reflexivity, that attribute in the functional dependency is
extraneous.
B is extraneous in AB-> C implies A-> C

Finding Redundant Dependency
Finding Redundant Dependency
A-> B
A-> B
A-> C
A-> C
D-> AC
D-> AC
D-> E
D-> E
Step 1: Apply
Step 1: Apply
singleton to RHS
singleton to RHS
A-> B
A-> B
A-> C
A-> C
D-> A
D-> A
D-> C
D-> C
D-> E
D-> E
Step 2: In LHS there is no extraneous attribute
Step 2: In LHS there is no extraneous attribute
Step 3: Remove redundant dependencies
Step 3: Remove redundant dependencies
1.
1.
Remove A-> B and find the attribute closure for A
Remove A-> B and find the attribute closure for A
A+ =AC[here if we are not consider A-> B , B can’t be found in A+, so A-> B can’t be a redundant dependency.
A+ =AC[here if we are not consider A-> B , B can’t be found in A+, so A-> B can’t be a redundant dependency.
2. Remove A-> C and find the attribute closure for A
2. Remove A-> C and find the attribute closure for A
A+ =AB[here if we are not consider A-> C , C can’t be found in A+, so A-> C can’t be a redundant dependency.
A+ =AB[here if we are not consider A-> C , C can’t be found in A+, so A-> C can’t be a redundant dependency.
3. Remove D-> A and find the attribute closure for D
3. Remove D-> A and find the attribute closure for D
D+ =DCE[here if we are not consider D-> A , A can’t be found in D+, so D-> A can’t be a redundant dependency
D+ =DCE[here if we are not consider D-> A , A can’t be found in D+, so D-> A can’t be a redundant dependency
4. Remove D-> C and find the attribute closure for D
4. Remove D-> C and find the attribute closure for D
D+ =DAEC[here if we are not consider D-> C , C could be found in D+, so D-> C is the redundant dependency so
D+ =DAEC[here if we are not consider D-> C , C could be found in D+, so D-> C is the redundant dependency so
it should be removed. Then the FDs are A-> B, A-C , D->A, D->E
it should be removed. Then the FDs are A-> B, A-C , D->A, D->E
5. Remove D-> E and find the attribute closure for D
5. Remove D-> E and find the attribute closure for D
D+ =DABC [here if we are not consider D-> E , E can’t be found in D+, so D-> E can’t be a redundant
D+ =DABC [here if we are not consider D-> E , E can’t be found in D+, so D-> E can’t be a redundant
dependency
dependency

 So, Minimal cover will be after removing
So, Minimal cover will be after removing
a) Extraneous Attributes
a) Extraneous Attributes
b) Redundant Dependencies
b) Redundant Dependencies
 Minimal Functional Dependencies are
Minimal Functional Dependencies are
A-> B
A-> B
A-> C
A-> C
D-> A
D-> A
D->E
D->E

Find a Minimal Cover
Find a Minimal Cover
 R(A B C D E)
R(A B C D E)
 F ={ A->D,
F ={ A->D,
 BC-> AD,
BC-> AD,
 C->B,
C->B,
 E->A,
E->A,
 E->D}
E->D}
Steps:
Steps:
 Singleton attributes in RHS
Singleton attributes in RHS
 Identify extraneous attributes and remove it
 Remove redundant dependencies
Remove redundant dependencies

 Singleton attributes in
Singleton attributes in
RHS
RHS
 F={ A->D,
F={ A->D,
 BC->A,
BC->A,
 BC->D,
BC->D,
 C->B,
C->B,
 E->A,
E->A,
 E->D}
E->D}
R(A B C D E)
F ={ A->D,
BC-> AD,
C->B,
E->A,
E->D}

 F={ A->D,
F={ A->D,
 BC->A,
BC->A,
 BC->D,
BC->D,
 C->B,
C->B,
 E->A,
E->A,
 E->D}
E->D}
 F={ A->D,
F={ A->D,
 C->A,
C->A,
 C->D,
C->D,
 C->B,
C->B,
 E->A,
E->A,
 E->D}
E->D}

Remove redundant FDs
Remove redundant FDs
 F={ A->D,
F={ A->D,
 C->A,
C->A,
 C->D,
C->D,
 C->B,
C->B,
 E->A,
E->A,
 E->D}
E->D}
 F={ A->D,
F={ A->D,
 C->A,
C->A,
 C->B,
C->B,
 E->A,
E->A,
 }
}

Equivalence of sets of FDs
Equivalence of sets of FDs
 Two sets of FDs E and F
Two sets of FDs E and F
 F is said to cover E if every FD in E is also
F is said to cover E if every FD in E is also
in closure of F
in closure of F
 E and F are equivalent
E and F are equivalent
if E covers F and F covers E
if E covers F and F covers E
E
E+
+
= F
= F+
+

NORMALIZATION in database management systems

More Related Content

Similar to NORMALIZATION in database management systems (20)

More from SheebaS25 (10)

Recently uploaded (20)

NORMALIZATION in database management systems