Slowly Changing Dimensions
SCD overview
In the real world, the business does not remain static over time;
products change size and weight, customers relocate, stores change
layouts, and sales staff are assigned to different locations. Most
systems of record keep only the current values for business subjects
(e.g., the current customer address). An operational data store keeps
only a short history of changes to indicate that changes have occurred
and to support business processes handling the immediate changes.
In a data warehouse or data mart, we need to know the history of
values to match the history of facts with the correct dimensional
descriptions at the time the facts happened.
For example, we need to associate a sales fact with the description of the associated
customer during the time period of the sales fact, which may not be
the description of that customer today. Of course, business subjects
change slowly compared with most transactional data (e.g., inventory
level).
Three ways to handle SCD attributes
1. Overwrite the current value with the new value. (Type 1)
2. For each dimension attribute that changes, create a current value
field and as many old value fields as we wish (i.e., a multivalued
attribute with a fixed number of occurrences for a limited historical
view). (Type 3)
3. Create a new dimension table row (with a new surrogate key) each
time the dimension object changes; this new row contains all the
dimension characteristics at the time of the change; the new
surrogate key is the original surrogate key plus the start date for the
period when these dimension values are in effect. (Type 2)
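The three approaches above can be compared side by side. The following is a minimal sketch, not a definitive implementation: it uses Python's built-in sqlite3 module, and the table names (CustomerType1, CustomerType3, CustomerType2), the City attribute, the sample dates, and the '9999-12-31' open-ended end date are all assumptions made for illustration. The Type 2 table is keyed by the original key plus the start date, as described in item 3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Type 1: one row per customer; a change overwrites the old value.
    CREATE TABLE CustomerType1 (CustomerKey INTEGER PRIMARY KEY, City TEXT);
    -- Type 3: current value plus a fixed number of old-value columns (one here).
    CREATE TABLE CustomerType3 (CustomerKey INTEGER PRIMARY KEY,
                                CurrentCity TEXT, PreviousCity TEXT);
    -- Type 2: one row per version, keyed by original key plus start date.
    CREATE TABLE CustomerType2 (CustomerKey INTEGER, StartDate TEXT, EndDate TEXT,
                                City TEXT, PRIMARY KEY (CustomerKey, StartDate));
    INSERT INTO CustomerType1 VALUES (1, 'Austin');
    INSERT INTO CustomerType3 VALUES (1, 'Austin', NULL);
    INSERT INTO CustomerType2 VALUES (1, '2020-01-01', '9999-12-31', 'Austin');
""")

OPEN_END = "9999-12-31"  # assumed sentinel meaning "still in effect"


def change_city(conn, customer_key, new_city, change_date):
    """Apply the same city change under each of the three SCD strategies."""
    # Type 1: overwrite; the old city is lost.
    conn.execute("UPDATE CustomerType1 SET City = ? WHERE CustomerKey = ?",
                 (new_city, customer_key))
    # Type 3: move the current value into the old-value column and overwrite it.
    conn.execute("UPDATE CustomerType3 SET PreviousCity = CurrentCity, "
                 "CurrentCity = ? WHERE CustomerKey = ?",
                 (new_city, customer_key))
    # Type 2: end the open row on the day before the change, then add a new row.
    conn.execute("UPDATE CustomerType2 SET EndDate = date(?, '-1 day') "
                 "WHERE CustomerKey = ? AND EndDate = ?",
                 (change_date, customer_key, OPEN_END))
    conn.execute("INSERT INTO CustomerType2 VALUES (?, ?, ?, ?)",
                 (customer_key, change_date, OPEN_END, new_city))


change_city(conn, 1, "Denver", "2023-06-01")
type1_rows = conn.execute("SELECT * FROM CustomerType1").fetchall()
type3_rows = conn.execute("SELECT * FROM CustomerType3").fetchall()
type2_rows = conn.execute(
    "SELECT * FROM CustomerType2 ORDER BY StartDate").fetchall()
print(type1_rows, type3_rows, type2_rows, sep="\n")
```

After the change, only the Type 2 table can still say which city applied on any past date; Type 3 remembers exactly one prior value, and Type 1 remembers none.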
Example of Type 2 SCD Customer dimension table
Finding the dimension row for a fact row is a little more complex; the
SQL WHERE clause would include the following:
WHERE Fact.CustomerKey = Customer.CustomerKey
  AND Fact.DateKey BETWEEN Customer.StartDate AND Customer.EndDate
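To see the date-range condition at work, here is a small runnable sketch using Python's sqlite3 module. The Fact and Customer table names and the CustomerKey, DateKey, StartDate, and EndDate columns come from the clause above; the City and SalesAmount columns, the sample rows, and the use of ISO date strings as DateKey values are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (CustomerKey INTEGER, StartDate TEXT, EndDate TEXT,
                           City TEXT, PRIMARY KEY (CustomerKey, StartDate));
    CREATE TABLE Fact (CustomerKey INTEGER, DateKey TEXT, SalesAmount REAL);
    -- Two versions of customer 1 with non-overlapping date ranges.
    INSERT INTO Customer VALUES
        (1, '2020-01-01', '2023-05-31', 'Austin'),
        (1, '2023-06-01', '9999-12-31', 'Denver');
    -- One sale before the move and one after it.
    INSERT INTO Fact VALUES
        (1, '2022-03-15', 100.0),
        (1, '2024-02-10', 250.0);
""")

# Each fact row matches exactly one customer version: the one whose
# StartDate..EndDate range contains the fact's date.
rows = conn.execute("""
    SELECT Fact.DateKey, Fact.SalesAmount, Customer.City
    FROM Fact, Customer
    WHERE Fact.CustomerKey = Customer.CustomerKey
      AND Fact.DateKey BETWEEN Customer.StartDate AND Customer.EndDate
    ORDER BY Fact.DateKey
""").fetchall()

for row in rows:
    print(row)
```

The ranges must not overlap, because BETWEEN is inclusive at both ends; a fact dated on a shared boundary day would otherwise match two dimension rows and be double counted.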
Note:
Type 2 SCD can produce an excessive number of dimension table rows
when dimension objects change frequently or when dimension rows
are large (“monster dimensions”). Also, if only a small portion of the
dimension row has changing values, every new row repeats the unchanged
values, which creates excessive redundant data.
One solution is dimension segmentation.
Dimension Segmentation
A dimension is segmented into two or more dimension tables; one segment
holds the nearly constant or very slowly changing attributes, and the other
segments hold clusters of attributes that change more rapidly and, for
attributes in the same cluster, often change at the same time. These
more rapidly changing attributes are often called “hot” attributes by
data warehouse designers.
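A rough sketch of the idea, using Python's sqlite3 module, follows. Everything named here is hypothetical: the CustomerStable and CustomerHot tables, their columns, and the sample values are assumptions, and keying the hot segment by customer key plus start date is only one possible design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Stable segment: nearly constant attributes, one row per customer.
    CREATE TABLE CustomerStable (CustomerKey INTEGER PRIMARY KEY,
                                 Name TEXT, BirthDate TEXT, FirstOrderDate TEXT);
    -- Hot segment: a small cluster of attributes that change together,
    -- versioned in Type 2 style.
    CREATE TABLE CustomerHot (CustomerKey INTEGER, StartDate TEXT, EndDate TEXT,
                              CreditRating TEXT, IncomeBand TEXT,
                              PRIMARY KEY (CustomerKey, StartDate));
    INSERT INTO CustomerStable VALUES (1, 'Ann Lee', '1970-04-02', '2015-09-30');
    INSERT INTO CustomerHot VALUES
        (1, '2020-01-01', '2021-12-31', 'B', '40-60K'),
        (1, '2022-01-01', '2023-05-31', 'A', '60-80K'),
        (1, '2023-06-01', '9999-12-31', 'A', '80-100K');
""")


def customer_as_of(conn, customer_key, as_of_date):
    """Rejoin both segments for the values in effect on as_of_date."""
    return conn.execute("""
        SELECT s.Name, h.CreditRating, h.IncomeBand
        FROM CustomerStable AS s
        JOIN CustomerHot AS h ON h.CustomerKey = s.CustomerKey
        WHERE s.CustomerKey = ? AND ? BETWEEN h.StartDate AND h.EndDate
    """, (customer_key, as_of_date)).fetchone()


stable_count = conn.execute("SELECT COUNT(*) FROM CustomerStable").fetchone()[0]
hot_count = conn.execute("SELECT COUNT(*) FROM CustomerHot").fetchone()[0]
print(stable_count, hot_count, customer_as_of(conn, 1, "2022-07-04"))
```

Three changes to the hot attributes add three narrow rows, while the stable attributes are stored once instead of being copied into every version.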
Example of Dimension Segmentation
Determining Dimensions and Facts
1. What were the dollar sales of health and beauty products in North
America to customers over the age of 50 in each of the past three
years?
2. What is the name of the salesperson who had the highest dollar
sales of each product in the first quarter of this year?
3. How many European customer complaints did we receive on pet
food products during the past year? How has the number changed from
month to month this year?
4. What are the names of the stores that had the highest average monthly
quantity sales of casual clothing during the summer?
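Each question names a fact to be measured and the qualifiers (dimension attributes) that filter or group it. As a hedged illustration, question 1 could be phrased against a hypothetical star schema as below: dollar sales is the fact, and product category, customer region, customer age, and year are the qualifiers. The table and column names (SalesFact, DateDim, ProductDim, CustomerDim and their columns) and the sample rows are assumptions, "North America" is assumed to describe the customer, and the restriction to the past three years is omitted to keep the sketch short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DateDim (DateKey INTEGER PRIMARY KEY, Year INTEGER);
    CREATE TABLE ProductDim (ProductKey INTEGER PRIMARY KEY, Category TEXT);
    CREATE TABLE CustomerDim (CustomerKey INTEGER PRIMARY KEY,
                              Age INTEGER, Region TEXT);
    CREATE TABLE SalesFact (DateKey INTEGER, ProductKey INTEGER,
                            CustomerKey INTEGER, DollarSales REAL);
    INSERT INTO DateDim VALUES (1, 2022), (2, 2023);
    INSERT INTO ProductDim VALUES (10, 'Health and Beauty'), (11, 'Pet Food');
    INSERT INTO CustomerDim VALUES
        (100, 62, 'North America'),
        (101, 45, 'North America'),
        (102, 70, 'Europe');
    INSERT INTO SalesFact VALUES
        (1, 10, 100, 50.0),   -- qualifies, 2022
        (1, 10, 101, 30.0),   -- customer is not over 50
        (2, 10, 100, 20.0),   -- qualifies, 2023
        (2, 11, 100, 99.0),   -- wrong product category
        (2, 10, 102, 40.0),   -- wrong region
        (2, 10, 100, 5.0);    -- qualifies, 2023
""")

# Fact: SUM(DollarSales). Qualifiers: category, region, age, year.
sales_by_year = conn.execute("""
    SELECT d.Year, SUM(f.DollarSales)
    FROM SalesFact AS f
    JOIN DateDim AS d ON d.DateKey = f.DateKey
    JOIN ProductDim AS p ON p.ProductKey = f.ProductKey
    JOIN CustomerDim AS c ON c.CustomerKey = f.CustomerKey
    WHERE p.Category = 'Health and Beauty'
      AND c.Region = 'North America'
      AND c.Age > 50
    GROUP BY d.Year
    ORDER BY d.Year
""").fetchall()
print(sales_by_year)
```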
Fact-qualifier matrix for sales and customer
service tracking
Star schema for sales and customer service
tracking
Ten Essential Rules of Dimensional Modeling
1. Use atomic facts: Eventually, users want detailed data, even if their initial
requests are for summarized facts.
2. Create single-process fact tables: Each fact table should address the important
measurements for one business process, such as taking a customer order or placing
a material purchase order.
3. Include a date dimension for every fact table: A fact should be described by the
characteristics of the associated day (or finer) date/time to which that fact is
related.
4. Enforce consistent grain: Each measurement in a fact table must be atomic for
the same combination of keys (the same grain).
5. Disallow null keys in fact tables: Facts apply to the combination of key values,
and helper tables may be needed to represent some M:N relationships.
6. Honor hierarchies: Understand the hierarchies of dimensions and carefully choose to
snowflake the hierarchy or denormalize into one dimension.
7. Decode dimension tables: Store descriptions of surrogate keys and codes used in fact
tables in associated dimension tables, which can then be used to report labels and query
filters.
8. Use surrogate keys: All dimension table rows should be identified by a surrogate key,
with descriptive columns showing the associated production and source system keys.
9. Conform dimensions: Conformed dimensions should be used across multiple fact tables.
10. Balance requirements with actual data: Unfortunately, source data may not precisely
support all business requirements, so you must balance what is technically possible with
what users want and need.
Role of Metadata
The metadata associated with data marts are often referred to as a
“data catalog,” “data directory,” or some similar term. Metadata serve
as a kind of “yellow pages” directory to the data in the data marts. The
metadata should allow users to easily answer questions such as the
following:
1. What subjects are described in the data mart? (Typical subjects are customers,
patients, students, products, courses, and so on.)
2. What dimensions and facts are included in the data mart? What is the grain of the
fact table?
3. How are the data in the data mart derived from the enterprise data warehouse
data? What rules are used in the derivation?
4. How are the data in the enterprise data warehouse derived from operational data?
What rules are used in this derivation?
5. What reports and predefined queries are available to view the data?
6. What drill-down and other data analysis techniques are available?
7. Who is responsible for the quality of data in the data marts, and to whom are requests
for changes made?
SQL OLAP Querying
Question:
Which customer has bought the most of each product we sell? Show
the product ID and description, customer ID and name, and the total
quantity sold of that product to that customer; show the results in
sequence by product ID.
Query:
SELECT P1.ProductId, ProductDescription, C1.CustomerId, CustomerName,
       SUM(OL1.OrderedQuantity) AS TotOrdered
FROM Customer_T AS C1, Product_T AS P1, OrderLine_T AS OL1, Order_T AS O1
WHERE C1.CustomerId = O1.CustomerId
  AND O1.OrderId = OL1.OrderId
  AND OL1.ProductId = P1.ProductId
GROUP BY P1.ProductId, ProductDescription, C1.CustomerId, CustomerName
HAVING SUM(OL1.OrderedQuantity) >= ALL
       (SELECT SUM(OL2.OrderedQuantity)
        FROM OrderLine_T AS OL2, Order_T AS O2
        WHERE OL2.ProductId = P1.ProductId
          AND OL2.OrderId = O2.OrderId
          AND O2.CustomerId <> C1.CustomerId
        GROUP BY O2.CustomerId)
ORDER BY P1.ProductId;
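For readers who want to try the question on a small data set, the sketch below runs an equivalent query through Python's sqlite3 module. It is not the query above: SQLite does not support the `>= ALL` quantified comparison, so this version swaps in a common table expression that totals quantity per product and customer and then keeps the rows equal to each product's maximum. The table and column names are taken from the query above; the Totals and T2 names and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer_T (CustomerId INTEGER PRIMARY KEY, CustomerName TEXT);
    CREATE TABLE Product_T (ProductId INTEGER PRIMARY KEY, ProductDescription TEXT);
    CREATE TABLE Order_T (OrderId INTEGER PRIMARY KEY, CustomerId INTEGER);
    CREATE TABLE OrderLine_T (OrderId INTEGER, ProductId INTEGER,
                              OrderedQuantity INTEGER);
    INSERT INTO Customer_T VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO Product_T VALUES (1, 'Desk'), (2, 'Chair');
    INSERT INTO Order_T VALUES (1001, 1), (1002, 2), (1003, 1);
    INSERT INTO OrderLine_T VALUES
        (1001, 1, 5), (1002, 1, 3), (1003, 1, 1),   -- Desk: Ann 6, Bob 3
        (1001, 2, 2), (1002, 2, 4);                 -- Chair: Ann 2, Bob 4
""")

top_buyers = conn.execute("""
    WITH Totals AS (
        -- Total quantity of each product bought by each customer.
        SELECT OL.ProductId, O.CustomerId,
               SUM(OL.OrderedQuantity) AS TotOrdered
        FROM OrderLine_T AS OL
        JOIN Order_T AS O ON O.OrderId = OL.OrderId
        GROUP BY OL.ProductId, O.CustomerId
    )
    SELECT P.ProductId, P.ProductDescription,
           C.CustomerId, C.CustomerName, T.TotOrdered
    FROM Totals AS T
    JOIN Product_T AS P ON P.ProductId = T.ProductId
    JOIN Customer_T AS C ON C.CustomerId = T.CustomerId
    -- Keep only the customer(s) whose total equals the product's maximum.
    WHERE T.TotOrdered = (SELECT MAX(T2.TotOrdered)
                          FROM Totals AS T2
                          WHERE T2.ProductId = T.ProductId)
    ORDER BY P.ProductId
""").fetchall()

for row in top_buyers:
    print(row)
```

If two customers tie for the highest total on a product, both rows are returned, which matches the behavior of the `>=` comparison in the original query.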


(Lecture 4)Slowly Changing Dimensions.pdf
