Data modeling
Dimensions
Structure of dimension tables I
•Must have a primary key
•Use surrogate keys: meaningless integers or identifiers
•Joins between facts and dimensions perform best on single integer fields
•Natural key versus surrogate key: the natural key is the more meaningful column
•When dimensions are static, there is a 1-1 mapping between natural key and surrogate key
•Descriptive attributes are the remaining components of the dimension table
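The 1-1 mapping between natural and surrogate keys for a static dimension can be sketched as below. This is a minimal illustration, not a prescribed implementation: the class name `SurrogateKeyGenerator`, the `customer_dim` rows and the `CUST-...` identifiers are all hypothetical.

```python
from itertools import count


class SurrogateKeyGenerator:
    """Hands out meaningless integer keys, one per distinct natural key."""

    def __init__(self, start=1):
        self._next_key = count(start)
        self._by_natural_key = {}

    def key_for(self, natural_key):
        # Reuse the existing surrogate key if this natural key was seen before.
        if natural_key not in self._by_natural_key:
            self._by_natural_key[natural_key] = next(self._next_key)
        return self._by_natural_key[natural_key]


keys = SurrogateKeyGenerator()

# Hypothetical source rows: (natural key, descriptive attribute).
source_rows = [("CUST-0042", "Asha"), ("CUST-0917", "Ben")]

customer_dim = [
    {"customer_key": keys.key_for(natural_key),  # surrogate key used in joins
     "customer_id": natural_key,                 # natural key kept as an attribute
     "name": name}
    for natural_key, name in source_rows
]
```

Fact rows would then carry only the integer `customer_key`, so joins stay on a single integer field.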
Structure of dimension tables II
•Efficient generation and maintenance of the surrogate keys is important for success
•Surrogate keys should not be smart/intelligent/meaningful because
–by definition they must be meaningless, so that they don’t have to change over time
–joins on them can perform better than joins on concatenated natural keys
–data type mismatches are avoided: no alphanumeric values should be allowed, and the data type should always be integer/number
Creating dimension tables I
•May be created internally by the ETL system, like lookup tables
•Can be extracted from data sources
•Data cleaning is important for big, complex dimensions
•Conforming consists of aligning the content from different parts of the data warehouse
•The data delivery module looks after SCDs (slowly changing dimensions)
Flat and snowflake dimensions I
•Dimensions are flat denormalized tables
•Should have small cardinality
•Limited values
•If staging data is in 3NF, these flattened 2NF dimensions are easily produced with a simple query on the 3NF source
•The only requirement is that every attribute is single valued (atomic) in the presence of the primary key
•Disadvantage of snowflake schema: many-to-1 relations are hard to look at, and schemas become complex
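As a rough sketch of the "simple query on the 3NF source", the example below flattens three normalized staging tables into one denormalized dimension using Python's built-in `sqlite3` module. The table and column names (`product`, `brand`, `category`, `product_dim`) and the sample rows are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical 3NF staging tables
    CREATE TABLE category (category_id INTEGER PRIMARY KEY, category_name TEXT);
    CREATE TABLE brand (brand_id INTEGER PRIMARY KEY, brand_name TEXT);
    CREATE TABLE product (
        product_id INTEGER PRIMARY KEY,
        product_name TEXT,
        brand_id INTEGER REFERENCES brand(brand_id),
        category_id INTEGER REFERENCES category(category_id)
    );
    INSERT INTO category VALUES (1, 'Beverages');
    INSERT INTO brand VALUES (10, 'Acme');
    INSERT INTO product VALUES (100, 'Cola 330ml', 10, 1), (101, 'Lemonade 1l', 10, 1);

    -- One flat row per product: the many-to-1 lookups are joined in.
    CREATE TABLE product_dim AS
    SELECT p.product_id AS product_natural_key,
           p.product_name,
           b.brand_name,
           c.category_name
    FROM product p
    JOIN brand b ON b.brand_id = p.brand_id
    JOIN category c ON c.category_id = p.category_id;
""")

flat_rows = conn.execute(
    "SELECT product_name, brand_name, category_name "
    "FROM product_dim ORDER BY product_natural_key"
).fetchall()
```

Every attribute in `product_dim` is single valued for a given product, which is the requirement stated above. A snowflaked design would instead keep `brand` and `category` as separate tables joined at query time.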
Flat and snowflake dimensions II
Flat and snowflake dimensions III
•If an attribute takes multiple values in the
presence of a primary key, then it cannot be
included in the dimension.
•E.g. the cash register ID of a retail store, when the grain is the
individual store (one store has several registers)
•For every new dimension record a fresh
surrogate key should be assigned
Some typical dimensions I
•Date and time dimensions:
a huge table with a primary key and all
possible fields of a date: day, month, year,
timestamp, fiscal year, quarter etc.
•Small dimensions:
mainly used for lookups
•Big dimensions:
require merging and de-duplicating
attributes for the same dimension
from different data sources
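A date dimension of the kind described above is usually generated rather than extracted. The following is a small sketch under stated assumptions: the function name `build_date_dimension` and its column names are hypothetical, the surrogate key is a plain sequence number, and the fiscal year is assumed to start in April and to be named after the calendar year in which it ends.

```python
from datetime import date, timedelta

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def build_date_dimension(start, end, fiscal_start_month=4):
    """Return one row per calendar day from start to end, inclusive."""
    rows = []
    for offset in range((end - start).days + 1):
        day = start + timedelta(days=offset)
        # Assumption: the fiscal year is labelled by the year in which it ends.
        fiscal_year = day.year + 1 if day.month >= fiscal_start_month else day.year
        rows.append({
            "date_key": offset + 1,            # meaningless sequential surrogate key
            "full_date": day.isoformat(),
            "day_of_week": DAY_NAMES[day.weekday()],
            "day": day.day,
            "month": day.month,
            "quarter": (day.month - 1) // 3 + 1,
            "year": day.year,
            "fiscal_year": fiscal_year,
        })
    return rows


date_dim = build_date_dimension(date(2024, 3, 30), date(2024, 4, 2))
```

In practice the range would span many years, and further columns (holiday flags, week numbers and so on) could be added in the same loop.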
Dimension tables structures I
•Dimensions can be modeled as flat or
snowflaked
•Flat dimensions are denormalized tables
•The dimension tables should be modeled in
such a way that they have small cardinality
•Every dimension table should have limited
values
•Populating the data from staging to data
warehouse:
–If staging data is in 3NF, the flattened 2NF
dimensions are easily produced with a simple query
Dimension tables structures II
•Snowflaking is defined as creating sub-dimensions,
or dimensions of other dimensions. This makes
the schema much cleaner
•If an attribute takes multiple values in the
presence of a primary key, then it cannot be
included in the dimension
•For every new dimension record a fresh
surrogate key should be assigned
•A good design practice: identify the correlation
Roles, sub-dimensions and empty
dimensions I
•Roles:
–Concept of using the same table attached to a fact
multiple times
– E.g. 2 roles of employee dimension - manager and
employee
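The role idea can be illustrated with a query that joins the same dimension table twice under different aliases. This is only a sketch: the `employee_dim` and `review_fact` tables and their contents are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee_dim (employee_key INTEGER PRIMARY KEY, name TEXT);

    -- Hypothetical fact table: both foreign keys point at employee_dim.
    CREATE TABLE review_fact (
        employee_key INTEGER REFERENCES employee_dim(employee_key),
        manager_key  INTEGER REFERENCES employee_dim(employee_key),
        score        INTEGER
    );

    INSERT INTO employee_dim VALUES (1, 'Priya'), (2, 'Tom');
    INSERT INTO review_fact VALUES (2, 1, 4);
""")

# The same physical table plays two roles via the aliases e and m.
rows = conn.execute("""
    SELECT e.name AS employee, m.name AS manager, f.score
    FROM review_fact f
    JOIN employee_dim e ON e.employee_key = f.employee_key
    JOIN employee_dim m ON m.employee_key = f.manager_key
""").fetchall()
```

Many warehouses expose each role as a separate view over the one physical table, so that report column names stay unambiguous.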
Roles, sub-dimensions and empty
dimensions II
•Sub dimensions:
–Sub dimensions are defined using foreign keys in
the parent dimension table
–These are called outriggers
Roles, sub-dimensions and empty
dimensions III
•Degenerate dimensions:
–Problem: when a parent-child data relationship is fitted into the
dimensional framework, the natural key of the
parent is usually left as an orphan
–Solution: the natural key of the parent is
given a special status, called an empty or degenerate
dimension. This arises in every parent-child relationship
–E.g. order number, shipment number, billing
number etc.
–They often play an integral role in the fact table’s
primary key.
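A degenerate dimension can be pictured as a key column that lives in the fact table with no dimension table behind it. The sketch below assumes a hypothetical `order_line_fact` table in which the order number is part of the primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE order_line_fact (
        order_number TEXT,      -- degenerate dimension: there is no order_dim table
        line_number  INTEGER,
        product_key  INTEGER,   -- ordinary foreign key to a product dimension
        amount       REAL,
        PRIMARY KEY (order_number, line_number)
    );
    INSERT INTO order_line_fact VALUES
        ('SO-1001', 1, 7, 20.0),
        ('SO-1001', 2, 9, 5.5),
        ('SO-1002', 1, 7, 20.0);
""")

# The order number still groups the child rows of one parent order.
totals = conn.execute("""
    SELECT order_number, SUM(amount)
    FROM order_line_fact
    GROUP BY order_number
    ORDER BY order_number
""").fetchall()
```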
Roles, sub-dimensions and empty
dimensions II
•The ETL dimensional delivery module must convert
selected fields in the input data for the
dimension to foreign key references
•Multi-valued dimensions and bridge
tables
–May be linked to a fact table via bridge tables
–Help to avoid many-to-many joins by "creating a
group entity"
–Time-varying bridge tables are seen in the case of
Type 2 SCDs
–Performance overheads for updates and queries
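One way to picture the "group entity" is a bridge table keyed by a group identifier that the fact row references. The example below is a sketch with invented names (`visit_fact`, `diagnosis_group_bridge`, `diagnosis_dim`); a patient visit carrying several diagnoses is used purely as an assumed scenario.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE diagnosis_dim (diagnosis_key INTEGER PRIMARY KEY, description TEXT);

    -- Bridge table: one row per (group, member) pair.
    CREATE TABLE diagnosis_group_bridge (
        diagnosis_group_key INTEGER,
        diagnosis_key INTEGER REFERENCES diagnosis_dim(diagnosis_key)
    );

    -- The fact row stores a single group key instead of many diagnosis keys.
    CREATE TABLE visit_fact (visit_id INTEGER, diagnosis_group_key INTEGER, charge REAL);

    INSERT INTO diagnosis_dim VALUES (1, 'Asthma'), (2, 'Fracture');
    INSERT INTO diagnosis_group_bridge VALUES (500, 1), (500, 2), (501, 2);
    INSERT INTO visit_fact VALUES (9001, 500, 120.0), (9002, 501, 80.0);
""")

rows = conn.execute("""
    SELECT f.visit_id, d.description
    FROM visit_fact f
    JOIN diagnosis_group_bridge b ON b.diagnosis_group_key = f.diagnosis_group_key
    JOIN diagnosis_dim d ON d.diagnosis_key = b.diagnosis_key
    ORDER BY f.visit_id, d.description
""").fetchall()
```

Summing `charge` through the bridge would count visit 9001 twice, once per diagnosis. This is one source of the query overhead noted above, and a weighting column on the bridge is a common way to deal with it.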
Roles, sub-dimensions and empty
dimensions III
Roles, sub-dimensions and empty
dimensions IV
•Ragged hierarchies
–Arise due to the hierarchies seen in the
organization
–May be related to people, roles, products, billing
information etc.
–Predominant characteristic: the parent
of at least one member of the dimension is not in the
level immediately above that member
•Can be implemented as a recursive pointer
(a field pointing to another row within the same
dimension table) or as a hierarchy bridge table
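The recursive-pointer option can be sketched with a small in-memory dimension in which each row stores the key of its parent. The organization, the `level` column and both helper functions below are hypothetical and only meant to show the shape of the data.

```python
# Each row points at its parent row in the same table (recursive pointer).
org_dim = {
    1: {"name": "CEO", "level": 1, "parent_key": None},
    2: {"name": "VP Sales", "level": 2, "parent_key": 1},
    3: {"name": "Regional Manager", "level": 3, "parent_key": 2},
    4: {"name": "Account Rep A", "level": 4, "parent_key": 3},
    # Ragged member: a level-4 rep reporting straight to a level-2 VP.
    5: {"name": "Account Rep B", "level": 4, "parent_key": 2},
}


def path_to_root(dim, key):
    """Follow parent pointers from a member up to the top of the hierarchy."""
    path, seen = [], set()
    while key is not None:
        if key in seen:
            raise ValueError(f"cycle detected at key {key}")
        seen.add(key)
        path.append(dim[key]["name"])
        key = dim[key]["parent_key"]
    return path


def ragged_members(dim):
    """Keys of members whose parent is not in the level immediately above."""
    return [
        key for key, row in dim.items()
        if row["parent_key"] is not None
        and dim[row["parent_key"]]["level"] != row["level"] - 1
    ]
```

A hierarchy bridge table would instead store one row per ancestor-descendant pair, trading storage for queries that need no recursion.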
Data modeling dimensions for data warehousing
SCDs - recap I
•type 0
–Passive
–Data never changes
•type 1
–Overwrite data
–Using update or insert functionality
–May cause performance problems
–May need rollback support
• type 2
–Also called partitioning history
–Every time a change happens, a new surrogate
primary key is assigned, and all fact table rows from then
onwards use this new key as their foreign key
–A cube-form representation with time as one of
the dimensions
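The Type 2 behaviour can be sketched as follows. The `valid_from`, `valid_to` and `is_current` columns are a common convention assumed here for illustration; they are not named in the slides, and `apply_type2` is a hypothetical helper.

```python
from datetime import date

customer_dim = [
    {"customer_key": 1, "customer_id": "CUST-0042", "city": "Pune",
     "valid_from": date(2020, 1, 1), "valid_to": None, "is_current": True},
]


def apply_type2(dim, natural_key, changes, change_date):
    """Expire the current row and append a new row with a fresh surrogate key."""
    current = next(
        row for row in dim
        if row["customer_id"] == natural_key and row["is_current"]
    )
    current["valid_to"] = change_date
    current["is_current"] = False

    new_row = {
        **current,
        **changes,
        "customer_key": max(row["customer_key"] for row in dim) + 1,
        "valid_from": change_date,
        "valid_to": None,
        "is_current": True,
    }
    dim.append(new_row)
    return new_row["customer_key"]


new_key = apply_type2(customer_dim, "CUST-0042", {"city": "Mumbai"}, date(2024, 6, 1))
```

Facts loaded after the change would carry `new_key`, while older facts keep pointing at key 1 and therefore at the old city.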
SCDs - recap III
•type 3
–Called alternative realities
–The old value of the attribute remains valid as a second
choice
–Adds a new column to record the change
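For Type 3, a rough sketch is a row that keeps the prior value in an extra column next to the current one. The `previous_` column prefix and the `apply_type3` helper are assumptions made for this example.

```python
customer_dim = {
    1: {"customer_id": "CUST-0042", "city": "Pune", "previous_city": None},
}


def apply_type3(dim, surrogate_key, attribute, new_value):
    """Move the current value into the 'previous_' column, then overwrite it."""
    row = dim[surrogate_key]
    row[f"previous_{attribute}"] = row[attribute]
    row[attribute] = new_value


apply_type3(customer_dim, 1, "city", "Mumbai")
```

Only one prior value survives with a single extra column. A second change would overwrite `previous_city`.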
SCDs - recap IV
SCDs - recap V
•Hybrid slowly changing dimension
Handling late arriving data
•Required fixes
–insert a new record capturing the change
–scan the dimension for records after the date of the
change and destructively overwrite the changed
attribute
–update the fact tables that reference the modified
dimension
•In real time systems, the dimension data usually
arrives after the fact data.
•In such cases it is important to point the fact
table records to a special placeholder and then
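The placeholder idea for facts that arrive before their dimension data can be sketched as below. The slide does not say what happens after the placeholder is assigned, so the later overwrite step here is an assumption, as are all names (`dimension_key_for`, `load_fact`, `load_dimension_row`).

```python
customer_dim = {}        # surrogate key -> dimension row
key_by_natural_id = {}   # natural key -> surrogate key
sales_fact = []


def dimension_key_for(customer_id):
    """Return the surrogate key, creating a placeholder row if none exists yet."""
    if customer_id not in key_by_natural_id:
        new_key = len(customer_dim) + 1
        customer_dim[new_key] = {
            "customer_id": customer_id,
            "name": "Unknown",        # placeholder attribute value
            "is_placeholder": True,
        }
        key_by_natural_id[customer_id] = new_key
    return key_by_natural_id[customer_id]


def load_fact(customer_id, amount):
    sales_fact.append(
        {"customer_key": dimension_key_for(customer_id), "amount": amount}
    )


def load_dimension_row(customer_id, name):
    # Assumed follow-up step: fill in the placeholder once the real data arrives.
    key = dimension_key_for(customer_id)
    customer_dim[key].update(name=name, is_placeholder=False)


load_fact("CUST-0042", 99.0)               # the fact arrives first
placeholder_name = customer_dim[1]["name"]
load_dimension_row("CUST-0042", "Asha")    # the dimension data arrives later
```

Because the fact row already holds the final surrogate key, it does not need to be touched when the real attributes arrive.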
Process of loading dimensions I
•Some dimensions are created internally by the ETL system,
without extracting from an external source
•These are typically operational codes translated into words,
with no external sources
•The remaining ones are extracted from outside
sources. They need special processing as below:
–Data cleaning: identifying and correcting or
removing inaccurate, incorrect or incomplete data
–Data conforming: aligning the content of some
fields in the dimension with similar fields
Process of loading dimensions II
•All dimensions are flattened out (denormalized)
•Create a snowflake dimension if it is not
possible to logically flatten out the dimension
•Identify the fields in every dimension table as
primary/surrogate key (joins with the foreign
key in the fact table), natural key (descriptor of
the data) or descriptive attributes (textual details)
