Business Information Systems
Data Modeling - Normalisation
Prithwis Mukerjee, Ph.D.
Normalization
NORMALIZATION: A formal data modeling approach to examining and validating the model.
Pros:
  • Ensures that each attribute belongs to the entity to which it is assigned
  • Redundant storage of information is minimized
Cons:
  • Can adversely affect performance if rigorously implemented
  • Can adversely affect deadlines if rigorously implemented
Foundations: Revisited
Entity in the Data Model
  • The basic element about which we need to store and process data, e.g. Order, Customer, Product
  • For each entity there will be multiple instances: multiple “orders”, with the means to distinguish one specific “order” from another
  • Actions can be performed on an entity: Create / Add, Read / Display, Update / Modify, Delete
Object Oriented Terminology
  • Class: equivalent to an entity
  • Instances of an entity are instances of a class
  • Methods: actions that affect a class
  • Most methods can be mapped down to variations of C-R-U-D
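The correspondence between an entity and a class can be sketched in Python. This is a minimal illustration rather than anything taken from the slides: the `Order` attributes and the in-memory `OrderStore` are hypothetical, chosen only to show several instances distinguished by an identifier and the four C-R-U-D actions as methods.

```python
from dataclasses import dataclass


@dataclass
class Order:
    """One instance of the ORDER entity (attributes are illustrative)."""
    order_id: int          # distinguishes one order from another
    customer_name: str
    status: str = "open"


class OrderStore:
    """Hypothetical in-memory store exposing the four C-R-U-D actions."""

    def __init__(self):
        self._orders = {}  # maps order_id -> Order

    def create(self, order):
        if order.order_id in self._orders:
            raise ValueError(f"order {order.order_id} already exists")
        self._orders[order.order_id] = order

    def read(self, order_id):
        return self._orders[order_id]

    def update(self, order_id, **changes):
        order = self._orders[order_id]
        for name, value in changes.items():
            setattr(order, name, value)
        return order

    def delete(self, order_id):
        del self._orders[order_id]
```

A real system would back these methods with a database; the dictionary here only stands in for that storage.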
Normal Forms
Dr. E. F. Codd introduced ‘normal forms’ as the different states of a ‘normalized relational’ data model. Codd defined the first three; 4NF and 5NF were added later by Ronald Fagin.
  • 1NF = No repeating groups
  • 2NF = No partial key dependencies
  • 3NF = No non-key interdependencies
  • 4NF = No independent multiple relationships
  • 5NF = No semantically related multiple relationships
How to Normalize an Entity
  • Before you normalize an entity, identify its Primary Key
  • Identify and resolve violations of 1NF: make sure there are no repeating groups
  • Identify and resolve violations of 2NF: make sure that each non-key attribute depends on the entire key
  • Identify and resolve violations of 3NF: make sure that no non-key attribute depends on another non-key attribute
Order Management System
  • Orders are the lifeblood of any commercial organisation
  • Orders are received or recorded / created
  • Reports are prepared on orders in hand
  • Orders are viewed
  • Orders are modified or updated depending on the situation
  • Orders are cancelled or archived when they are no longer necessary
Identify the Primary Key
[Diagram: ORDER entity]
First Normal Form - 1NF
Ask the following for each attribute: Does this attribute occur more than once for any given instance? (NO REPEATING GROUPS)
If yes,
  • build a new entity
  • move all ‘repeating’ attributes to the new entity
  • select or formulate attribute(s) for the new entity’s PK
  • build an IDENTIFYING relationship FROM THE ORIGINAL entity TO THE NEW entity
1NF Violation & Solution
  • Violation: Repeating groups
  • Solution: Split into two entities
[Diagram: ORDER split into ORDER and ORDER ITEM, linked by a foreign key]
Cleaner 1NF Solution
[Diagram: ORDER and ORDER ITEM, linked by a foreign key]
The new entity should be named to reflect its intention and given a primary key; the inherited foreign key will be present as part of that primary key.
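The 1NF split can be sketched in Python. The original diagrams did not survive, so the attribute names below (`order_id`, `customer_name`, `product_id`, `description`, `unit_price`, `quantity`) are assumptions, as is the choice of `(order_id, product_id)` as the composite key of ORDER ITEM.

```python
def to_first_normal_form(order):
    """Split an un-normalised order into one ORDER row and its ORDER ITEM rows."""
    # Everything except the repeating group stays on the ORDER entity.
    order_row = {name: value for name, value in order.items() if name != "items"}
    # Each repeated item becomes its own row. The inherited order_id (FK)
    # together with product_id forms the assumed composite primary key.
    item_rows = [{"order_id": order["order_id"], **item} for item in order["items"]]
    return order_row, item_rows


# Illustrative data only.
unnormalised = {
    "order_id": 1001,
    "customer_name": "Acme",
    "items": [
        {"product_id": "P1", "description": "Widget", "unit_price": 2.5, "quantity": 4},
        {"product_id": "P2", "description": "Gadget", "unit_price": 10.0, "quantity": 1},
    ],
}
order_row, item_rows = to_first_normal_form(unnormalised)
```

After the split, `order_row` holds a single value per attribute and every row in `item_rows` carries the order's identifier as part of its key.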
Second Normal Form - 2NF
For ONLY those entities that have a composite key, ask the following of each non-key attribute: Is this attribute dependent on part of the primary key? (NO PARTIAL KEY DEPENDENCIES)
If yes,
  • build a new entity
  • move all the attributes having the same partial key dependency to the new entity
  • use the determinant attribute as the key or determine a better PK (move the PK attribute too)
  • build and name an identifying relationship FROM NEW entity BACK TO ORIGINAL entity
2NF Violation and Solution
  • Violation: Description and Unit Price do not depend on the full primary key
  • Solution: Split into two entities
[Diagram: ORDER ITEM split into ORDER ITEM and PRODUCT, linked by a foreign key]
Now the solution looks like ...
[Diagram: ORDER, ORDER ITEM, PRODUCT]
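A small Python sketch of the 2NF step follows. It assumes, as the slide states, that Description and Unit Price depend only on the product, and it further assumes a `product_id` attribute and a `quantity` attribute that depends on the whole key; neither name comes from the original diagrams.

```python
def to_second_normal_form(item_rows):
    """Move attributes that depend only on product_id into a PRODUCT entity."""
    products = {}
    slim_items = []
    for row in item_rows:
        # description and unit_price depend on product_id alone (a partial
        # key dependency), so they are stored once per product.
        products[row["product_id"]] = {
            "product_id": row["product_id"],
            "description": row["description"],
            "unit_price": row["unit_price"],
        }
        # quantity depends on the whole key (order_id, product_id) and stays.
        slim_items.append({
            "order_id": row["order_id"],
            "product_id": row["product_id"],
            "quantity": row["quantity"],
        })
    return slim_items, list(products.values())


# Illustrative data only: product P1 appears on two different orders.
item_rows = [
    {"order_id": 1001, "product_id": "P1", "description": "Widget", "unit_price": 2.5, "quantity": 4},
    {"order_id": 1002, "product_id": "P1", "description": "Widget", "unit_price": 2.5, "quantity": 7},
    {"order_id": 1002, "product_id": "P2", "description": "Gadget", "unit_price": 10.0, "quantity": 1},
]
order_items, products = to_second_normal_form(item_rows)
```

The description and price of P1 are now stored once in `products` instead of once per order line.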
Third Normal Form - 3NF
For each non-key attribute, ask the following: Does this attribute depend on some other non-key attribute? (NO NON-KEY INTERDEPENDENCIES)
If yes,
  • build a new entity to contain all attributes with the same non-key dependency
  • use determinant attribute(s) as the PK
  • build and name a non-identifying, non-mandatory relationship FROM NEW entity BACK TO ORIGINAL entity
3NF Violation & Solution
  • Violation: Address and Credit Limit do not depend on OrderID, but on Customer Name
  • Solution: Create a separate entity for Customer
[Diagram: ORDER split into ORDER and CUSTOMER, linked by a foreign key]
Final Solution
[Diagram: the un-normalised ORDER entity beside the normalised entities ORDER, CUSTOMER, ORDER ITEM and PRODUCT, linked by foreign keys]
Information about Customers and Products can be recorded even when there are no Orders
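One possible rendering of the four normalised entities is sketched below using Python's built-in `sqlite3` module. The column names, the surrogate keys `customer_id` and `product_id`, and the table name `orders` (used because ORDER is a reserved word in SQL) are assumptions; the slides name only the entities and a few attributes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE customer (
    customer_id  INTEGER PRIMARY KEY,
    name         TEXT NOT NULL,
    address      TEXT,
    credit_limit REAL
);
CREATE TABLE product (
    product_id  INTEGER PRIMARY KEY,
    description TEXT NOT NULL,
    unit_price  REAL NOT NULL
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    order_date  TEXT,
    -- non-identifying relationship: the FK is not part of the primary key
    customer_id INTEGER REFERENCES customer(customer_id)
);
CREATE TABLE order_item (
    -- identifying relationships: both FKs are part of the primary key
    order_id   INTEGER NOT NULL REFERENCES orders(order_id),
    product_id INTEGER NOT NULL REFERENCES product(product_id),
    quantity   INTEGER NOT NULL,
    PRIMARY KEY (order_id, product_id)
);
""")

# Customers and products can be recorded before any order exists.
conn.execute("INSERT INTO customer VALUES (1, 'Acme', '1 Main St', 5000.0)")
conn.execute("INSERT INTO product VALUES (10, 'Widget', 2.5)")
order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

With the un-normalised ORDER entity, neither of those two inserts would have been possible without inventing an order to attach them to.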
Entities can proliferate!
  • Order Management System: began with ORDER; ended with ORDER, ORDER-ITEM, PRODUCT, CUSTOMER
  • Manufacturing System: might begin with PRODUCT; end with PRODUCT, MATERIAL, MACHINE, DIMENSION ???
  • Marketing System: might begin with CUSTOMER; end with ??
BUT: Entities should be unique
  • CUSTOMER must have the same attributes whether defined in the Order Management System or the Marketing System
  • PRODUCT must have the same attributes whether defined in the Order Management System or the Manufacturing System
This is why Data Modelling is so important
Rationale for Normalisation
  • Data is easier to define
  • Data interdependencies are identified
  • Data ambiguities are resolved
  • Data model can be more flexible
  • Data model is easier to maintain
However:
  • The structure can be very complex
  • Entities and relationships proliferate
  • Performance can become an issue
Denormalisation?
One entity for the entire month?
  • Month-Attendance: Emp ID, Month, Year, Day 1, Day 2, Day 3, ..., Day 31
  • 1 record per employee per month
  • 10,000 employees, 12 months = 120,000 records
One entity for each day
  • Daily-Attendance: EmpID, Month, Year, Date, YES / NO
  • 28 to 31 records per employee every month
  • 10,000 employees, 12 months = about 3.6 million records
  • This can cause a performance problem
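The record counts on this slide can be checked with a few lines of Python. The year used below is an assumption (any non-leap year gives the same result); the slide's figure of 3.6 million corresponds to rounding every month to 30 days.

```python
import calendar

EMPLOYEES = 10_000
YEAR = 2023  # assumed non-leap year, chosen only for the illustration

# Month-Attendance: one record per employee per month.
monthly_records = EMPLOYEES * 12

# Daily-Attendance: one record per employee per calendar day.
days_in_year = sum(calendar.monthrange(YEAR, month)[1] for month in range(1, 13))
daily_records = EMPLOYEES * days_in_year

# How many times more rows the daily design holds than the monthly one.
ratio = daily_records / monthly_records
```

The daily design stores roughly thirty times as many rows, which is the trade-off the slide weighs against the fixed, 31-column layout of the monthly design.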
The Managerial Perspective
Entities
  • Have all entities in the system been identified and named correctly?
  • Have all attributes of the entities been identified and named correctly?
Normalisation
  • Are all entities in Third Normal Form? If NOT, why NOT?
  • Have we gone overboard with Normalisation? Is there a need to de-normalise?
A quick way to remember the rules of normalisation is that every attribute in the entity must depend on THE KEY, THE WHOLE KEY and NOTHING BUT THE KEY
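The "key, whole key, nothing but the key" rule can be probed against sample data with a short Python helper. This is a sketch under stated assumptions: the function name `depends_on` and the sample rows are hypothetical, and sample data can only contradict a dependency, never prove that it holds in general.

```python
def depends_on(rows, determinant, attribute):
    """Return False if the sample rows contradict determinant -> attribute."""
    seen = {}
    for row in rows:
        key = tuple(row[name] for name in determinant)
        value = row[attribute]
        # The same determinant value must always give the same attribute value.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Illustrative ORDER ITEM rows with the assumed key (order_id, product_id).
rows = [
    {"order_id": 1001, "product_id": "P1", "unit_price": 2.5, "quantity": 4},
    {"order_id": 1002, "product_id": "P1", "unit_price": 2.5, "quantity": 7},
    {"order_id": 1002, "product_id": "P2", "unit_price": 10.0, "quantity": 1},
]

price_needs_only_product = depends_on(rows, ["product_id"], "unit_price")
quantity_needs_only_product = depends_on(rows, ["product_id"], "quantity")
quantity_needs_whole_key = depends_on(rows, ["order_id", "product_id"], "quantity")
```

In this sample, `unit_price` is determined by `product_id` alone, which signals a partial key dependency, while `quantity` is determined only by the whole key.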

BIS04 Data Modelling - II


Editor's Notes

  • #3: Normalized data models are often referred to as relational models. However, star schemas and snowflake schemas may also be implemented on top of relational database management systems. Normalization is the process of removing redundancy in data by separating the data into multiple tables, thus designing for efficient and reliable single-record access. Relational database theorists have created rules by which the degree of normalization is measured. These degrees are called normal forms, with the minimum degree of normalization commonly accepted as 3rd normal form. Normalization beyond 3rd normal form is often sacrificed due to hardware limitations. A properly normalized relational data model allows the efficient use of storage space, elimination of redundant data, reduction or elimination of inconsistent data, and minimization of the data maintenance burden. However, an “over-normalized” data model may cause performance concerns: accessing the data requires large table joins, which slows response time. Normalized data models will be in 3rd Normal Form when the following are true: repeating groups of data are removed (1st normal form); redundant data is removed (2nd normal form); attributes of an entity depend upon the key, the whole key, and nothing but the key. Once the model has been normalized to at least 3rd Normal Form, the following are true: the structure is remarkably insensitive to change; structural paths for accessing information are very clear; Create, Report, Update and Delete anomalies are eliminated; performance can be an issue; the structure can be very complex.
  • #5: Normalized relational data modeling is the classic modeling technique used for organizing entities defined by unique identifiers and attributes that are wholly dependent upon those identifiers. This is the modeling technique that database administrators and modelers are most familiar with, and is most commonly associated with transaction systems development. Normalization is the process of removing redundancy in data by separating the data into multiple tables. There are well-established rules of normalization: (1) Eliminate Repeating Groups: make a separate table for each set of related attributes, and give each table a primary key (1st Normal Form). (2) Eliminate Redundant Data: if an attribute depends on only part of a composite (multi-part) key, remove it to a separate table (2nd Normal Form). (3) Eliminate Columns Not Dependent on Key: if attributes do not contribute to a description of the key, remove them to a separate table (3rd Normal Form). (4) Isolate Independent Multiple Relationships: no table may contain two or more 1:n or n:m relationships that are not directly related (4th Normal Form). (5) Isolate Semantically Related Multiple Relationships: there may be practical constraints on information that justify separating logically related many-to-many relationships (5th Normal Form). The last two rules, 4th and 5th Normal Forms, are not often attained. It is not uncommon, in fact, to denormalize from 3rd Normal Form in the physical model to address performance concerns. Consequently, the rest of this section will not cover these two forms.
  • #8: Step 1: Source material can be in many different forms. In order to begin the normalization process, this example assumes that sources were combined in an un-normalized form. All attributes of the relation must be identified, along with the key and any repeating groups. For example, in the un-normalized table above, EMPL NO is underlined to indicate that it is the key. EMPL NO, EMPL NAME, DEPT NO, DEPT NAME, EMPL SEX, COURSE NO, COURSE NAME, and ASSESSMENT are all attributes of the relation. COURSE DATA is recognized as a repeating group, and this is marked with an asterisk.
  • #9: Step 2: In order to achieve First Normal Form (1NF), all repeating groups must be removed. The repeating groups were identified in step one, and in order to remove them a new key is created. The relation now has a key made up of two attributes, also referred to as a concatenated key. For example, in the 1NF table above, the relation for the COURSE DATA repeating group now lists only the attributes of the repeating group, and EMPL NO and COURSE NO together form its concatenated key.
  • #12: Step 3: Removing partial dependencies from the relation will result in the model being in Second Normal Form (2NF). It should be noted that if a relation is in 1NF and has a single-attribute key, then it is already in 2NF. If an attribute is not fully functionally dependent upon the entire key, then this attribute must be removed, and a new relation must be created. A foreign key will indicate the relationship between the relations. For example, in the 2NF table above, EMPL NAME, DEPT NO, DEPT NAME, and EMPL SEX are dependent only upon EMPL NO, not COURSE NO. These items are separated into a single relation. COURSE NAME is dependent only upon COURSE NO, so it is separated into a relation of its own. COURSE NO and EMPL NO now become foreign keys to indicate the relationships among the relations. ASSESSMENT is the only attribute dependent upon the whole key, hence the creation of another relation.
  • #15: Step 4: Removing mutual dependencies (a non-key attribute that depends on another non-key attribute, also called a transitive dependency) from the relation will result in Third Normal Form (3NF). If an attribute of a relation is mutually dependent upon another attribute, then these attributes must be removed into another relation. A foreign key will indicate the relationship between the two relations. For example, in the 3NF table above, DEPT NAME is mutually dependent upon DEPT NO. DEPT NO will remain in the employee relation, and a new relation will be created for DEPT NO and DEPT NAME. DEPT NO in the employee relation will become the foreign key.
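The employee and course example walked through in the notes for steps 1 to 4 can be sketched as projections in Python. The attribute names follow the notes; the helper `project` and all data values are invented for the illustration, and the tables the notes refer to are not available here.

```python
def project(rows, attributes):
    """Project rows onto the given attributes, dropping duplicate tuples."""
    seen = set()
    result = []
    for row in rows:
        values = tuple(row[name] for name in attributes)
        if values not in seen:
            seen.add(values)
            result.append(dict(zip(attributes, values)))
    return result


# Flattened rows after removing the repeating group; values are made up.
rows = [
    {"empl_no": 1, "empl_name": "Asha", "empl_sex": "F", "dept_no": 10, "dept_name": "Sales",
     "course_no": "C1", "course_name": "SQL", "assessment": "A"},
    {"empl_no": 1, "empl_name": "Asha", "empl_sex": "F", "dept_no": 10, "dept_name": "Sales",
     "course_no": "C2", "course_name": "Modelling", "assessment": "B"},
    {"empl_no": 2, "empl_name": "Ravi", "empl_sex": "M", "dept_no": 10, "dept_name": "Sales",
     "course_no": "C1", "course_name": "SQL", "assessment": "B"},
]

# 2NF: separate what depends on EMPL NO alone, on COURSE NO alone, and on both.
# 3NF: DEPT NAME depends on DEPT NO, so it moves to its own relation.
employee = project(rows, ["empl_no", "empl_name", "empl_sex", "dept_no"])
department = project(rows, ["dept_no", "dept_name"])
course = project(rows, ["course_no", "course_name"])
assessment = project(rows, ["empl_no", "course_no", "assessment"])
```

Each fact is now stored once: the department name appears in a single `department` row even though three source rows repeated it.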