Data Warehouse
Agenda
 Data warehouse architecture and building blocks
 ER modeling review
 Need for dimensional modeling
 Dimensional modeling and its internals
 Comparison of ER with dimensional modeling
Components
 Major components
 Source data component
 Data staging component
 Data storage component
 Information delivery component
 Metadata component
 Management and control component
1. Source Data Components
 Source data can be grouped into four categories
 Production data
 Comes from the operational systems of the enterprise
 Only selected segments are taken from it
 Narrow scope, e.g. order details
 Internal data
 Private spreadsheets, documents, customer profiles, etc.
 E.g. customer profiles for a specific offering
 Special strategies are needed to bring it into the DW (e.g. text documents)
 Archived data
 Old data is archived
 The DW holds snapshots of historical data
 External data
 Executives depend on external sources
 E.g. competitors' market data; a car rental company needs data on new car
manufacturing
 Conversions into the DW's format must be defined
Architecture of DW
2. Data Staging Components
 After data is extracted, it has to be prepared
 Data extracted from the sources needs to be changed, converted, and made
ready in a suitable format
 Three major functions make the data ready
 Extract
 Transform
 Load
 The staging area provides a place and a set of functions to
 Clean
 Change
 Combine
 Convert
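The extract, transform, and load functions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the source rows, the country lookup, and every name (RAW_ORDERS, RAW_COUNTRIES, OrderRow) are hypothetical and do not come from the slides, and the "warehouse" is just an in-memory list.

```python
from dataclasses import dataclass

# Hypothetical raw rows as they might arrive from two source systems.
RAW_ORDERS = [
    {"order_id": "1001", "amount": " 25.50 ", "country": "us"},
    {"order_id": "1002", "amount": "n/a", "country": "DE"},
    {"order_id": "1003", "amount": "7", "country": "de"},
]
RAW_COUNTRIES = {"US": "United States", "DE": "Germany"}


@dataclass(frozen=True)
class OrderRow:
    order_id: int
    amount: float
    country_name: str


def extract():
    """Extract: copy the source rows (a stand-in for reading files or tables)."""
    return [dict(row) for row in RAW_ORDERS]


def transform(rows):
    """Clean, change, combine, and convert the extracted rows."""
    ready = []
    for row in rows:
        amount_text = row["amount"].strip()        # clean: trim whitespace
        try:
            amount = float(amount_text)            # convert: text to number
        except ValueError:
            continue                               # clean: drop unparseable amounts
        code = row["country"].upper()              # change: one coding scheme
        name = RAW_COUNTRIES.get(code, "Unknown")  # combine: look up second source
        ready.append(OrderRow(int(row["order_id"]), amount, name))
    return ready


def load(rows, warehouse):
    """Load: append the prepared rows to the target store (a list here)."""
    warehouse.extend(rows)
    return len(rows)


warehouse = []
loaded = load(transform(extract()), warehouse)
```

Keeping the three functions separate mirrors the staging area's role: rows reach the warehouse only after they have passed through the cleaning and conversion step.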
3. Data Storage Components
 Separate repository
 Data structured for efficient processing
 Redundancy is increased
 Updated at specific intervals
 Read-only for users
4. Information Delivery Component
 Authentication issues
 Active monitoring services
 Performance: the DBA notes which aggregates are selected and changes storage
accordingly
 User performance
 Aggregate awareness
 E.g. data mining, OLAP, etc.
DW Design
Background (ER Modeling)
 In ER modeling, entities are collected from the environment
 Each entity acts as a table
 Reasons for its success
 The model is normalized after ER design, which removes redundancy (and so
avoids update/delete anomalies)
 But the number of tables increases
 Useful for fast access to small amounts of data
ER Drawbacks for DW / Need for Dimensional Modeling
 ER models are hard to remember because of the large number of tables
 Queries spanning multiple tables are complex (table joins)
 Conventional RDBMSs are optimized for a small number of tables, whereas a
large number of tables might be required in a DW
 ER models ideally hold no calculated attributes
 The DW does not need to update data as an OLTP system does, so there is no
need for normalization
 OLAP is not the only purpose of a DW; we need a model that also facilitates
data integration, data mining, and historically consolidated data
 An efficient indexing scheme is needed to avoid scanning all the data
 De-normalization (in the DW)
 Add primary key
 Direct relationships
 Re-introduce redundancy
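As a rough illustration of re-introducing redundancy, the sketch below flattens three normalized tables into one wide row per product. The tables, their contents, and the function name denormalize are hypothetical; they are assumptions made for this example and are not taken from the slides.

```python
# Hypothetical normalized tables: each brand and category is stored once.
categories = {1: {"category": "Beverages"}}
brands = {10: {"brand": "Acme", "category_id": 1}}
products = {
    100: {"sku": "A-1", "description": "Cola 1L", "brand_id": 10},
    101: {"sku": "A-2", "description": "Cola 2L", "brand_id": 10},
}


def denormalize(products, brands, categories):
    """Flatten product -> brand -> category into one wide row per product."""
    wide_rows = []
    for product_id, product in sorted(products.items()):
        brand = brands[product["brand_id"]]
        category = categories[brand["category_id"]]
        wide_rows.append({
            "product_key": product_id,          # primary key of the wide table
            "sku": product["sku"],
            "description": product["description"],
            "brand": brand["brand"],            # repeated for every product
            "category": category["category"],   # repeated for every product
        })
    return wide_rows


product_dim = denormalize(products, brands, categories)
```

The brand and category values now repeat on every row, which is the redundancy the normalized design removed. In exchange, a query on the wide table needs no joins to reach them.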
Dimensional Modeling
 Dimensional modeling focuses on subject orientation and the critical factors
of the business
 Critical factors are stored in facts
 Redundancy is not a problem; it is accepted to achieve efficiency
 A logical design technique for high performance
 It is the modeling technique used for DW storage
Dimensional Modeling (cont.)
 Two important concepts
Fact
 Numeric measurements that represent a business activity or event
 Are pre-computed and redundant
 Example: profit, quantity sold
Dimension
 Qualifying characteristics that give a perspective on a fact
 Example: date (date, month, quarter, year)
Dimensional Modeling (cont.)
 Facts are stored in a fact table
 Dimensions are represented by dimension tables
 Dimensions are the perspectives from which facts can be judged
 Each fact table is surrounded by dimension tables
 The layout looks like a star, hence the name star schema
Example
TIME: time_key (PK), SQL_date, day_of_week, month
STORE: store_key (PK), store_ID, store_name, address, district, floor_type
CLERK: clerk_key (PK), clerk_id
PRODUCT: product_key (PK), SKU, description, brand, category
CUSTOMER: customer_key (PK), customer_name, purchase_profile, credit_profile, address
PROMOTION: promotion_key (PK), promotion_nam
FACT: time_key (FK), store_key (FK), clerk_key (FK), product_key (FK), customer_key (FK), promotion_key (FK), dollars_sold, units_sold
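A cut-down version of this star schema can be tried out with Python's built-in sqlite3 module. The sketch keeps only two of the dimensions; the table names (time_dim, product_dim, sales_fact) and all sample rows are assumptions made for illustration, while the column names follow the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, sql_date TEXT, month TEXT);
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, sku TEXT, category TEXT);
CREATE TABLE sales_fact (
    time_key INTEGER REFERENCES time_dim(time_key),
    product_key INTEGER REFERENCES product_dim(product_key),
    dollars_sold REAL,
    units_sold INTEGER
);
""")

# Hypothetical sample data.
conn.executemany("INSERT INTO time_dim VALUES (?, ?, ?)",
                 [(1, "2024-01-05", "January"), (2, "2024-02-03", "February")])
conn.executemany("INSERT INTO product_dim VALUES (?, ?, ?)",
                 [(1, "A-1", "Beverages"), (2, "B-1", "Snacks")])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)",
                 [(1, 1, 10.0, 2), (1, 1, 5.0, 1), (1, 2, 8.0, 4), (2, 1, 20.0, 4)])

# One join per dimension, then group by the dimension attributes of interest.
rows = conn.execute("""
    SELECT t.month, p.category, SUM(f.dollars_sold), SUM(f.units_sold)
    FROM sales_fact AS f
    JOIN time_dim AS t ON t.time_key = f.time_key
    JOIN product_dim AS p ON p.product_key = f.product_key
    GROUP BY t.month, p.category
    ORDER BY t.month, p.category
""").fetchall()
```

Every dimension sits exactly one join away from the fact table, so the query shape stays the same whichever dimension attributes are used for grouping.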
Inside Dimensional Modeling
 Inside a dimension table
 Key attribute of the dimension table, for identification
 Large number of columns (wide table)
 Non-calculated, textual attributes
 Attributes are not directly related to one another
 Un-normalized in a star schema
 Drill-down and roll-up are two ways of exploiting dimensions
 Can have multiple hierarchies
 Relatively small number of records
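Drilling down and rolling up amount to grouping the same facts at a finer or coarser level of a dimension hierarchy. The sketch below assumes a hypothetical year, quarter, month hierarchy and made-up sales rows; none of the names come from the slides.

```python
from collections import defaultdict

# Hypothetical sales rows already joined to the date dimension.
SALES = [
    {"year": 2024, "quarter": "Q1", "month": "Jan", "units_sold": 5},
    {"year": 2024, "quarter": "Q1", "month": "Feb", "units_sold": 3},
    {"year": 2024, "quarter": "Q2", "month": "Apr", "units_sold": 4},
]

HIERARCHY = ["year", "quarter", "month"]  # ordered from coarse to fine


def aggregate(rows, depth):
    """Sum units_sold grouped by the first `depth` levels of the hierarchy."""
    levels = HIERARCHY[:depth]
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[level] for level in levels)] += row["units_sold"]
    return dict(totals)


by_year = aggregate(SALES, 1)     # rolled up to the coarsest level
by_quarter = aggregate(SALES, 2)  # drilled down one level
by_month = aggregate(SALES, 3)    # drilled down to the finest level
```

A dimension with multiple hierarchies would simply have more than one such ordered list of attributes.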
Inside Dimensional Modeling
 Inside a fact table
 Has two types of attributes
 Key attributes, for connections to the dimensions
 Facts
 Concatenated key
 Grain (level of detail) of the data is identified
 Large number of records
 Limited number of attributes
 Sparse data set
 Degenerate dimensions (e.g. order number, used to compute average products
per order)
 Fact-less fact table
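Two of the ideas above, the degenerate dimension and the fact-less fact table, are easier to see with data. The following sketch uses hypothetical rows: an order-line-grain sales fact whose order_number has no dimension table of its own, and an attendance table that holds only keys.

```python
from collections import Counter

# Hypothetical fact rows at order-line grain: one row per product per order.
# order_number has no dimension table of its own: a degenerate dimension.
SALES_FACT = [
    {"order_number": "SO-1", "product_key": 1, "units_sold": 2},
    {"order_number": "SO-1", "product_key": 2, "units_sold": 1},
    {"order_number": "SO-1", "product_key": 3, "units_sold": 5},
    {"order_number": "SO-2", "product_key": 1, "units_sold": 1},
]

# Grouping by the degenerate dimension gives the number of products per order.
lines_per_order = Counter(row["order_number"] for row in SALES_FACT)
avg_products_per_order = sum(lines_per_order.values()) / len(lines_per_order)

# Hypothetical fact-less fact table: only keys, no numeric measures.
# Each row records that an event happened (a student attended a class).
ATTENDANCE_FACT = [
    {"date_key": 1, "student_key": 7, "class_key": 3},
    {"date_key": 1, "student_key": 8, "class_key": 3},
    {"date_key": 2, "student_key": 7, "class_key": 3},
]
attendance_by_day = Counter(row["date_key"] for row in ATTENDANCE_FACT)
```

In the fact-less table the only thing to measure is the count of rows, which is why it needs no fact columns.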
Star Schema Keys
 Primary keys
 Identifying attribute in a dimension table
 Relationship attributes combine to form the primary key
 Surrogate keys
 Replacement for the primary key
 System generated
 Foreign keys
 Collection of the primary keys of the dimension tables
 Primary key of the fact table
 System generated, or
 The collection of the dimension tables' primary keys
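One simple way to picture system-generated surrogate keys is a counter that hands out the next integer for each natural key it has not seen before. The class SurrogateKeyGenerator, the store and SKU codes, and the sample rows below are hypothetical and serve only as a sketch of the idea.

```python
import itertools


class SurrogateKeyGenerator:
    """Map natural (source-system) keys to system-generated integer keys."""

    def __init__(self):
        self._next = itertools.count(1)
        self._by_natural_key = {}

    def key_for(self, natural_key):
        # Reuse the surrogate if this natural key was seen before.
        if natural_key not in self._by_natural_key:
            self._by_natural_key[natural_key] = next(self._next)
        return self._by_natural_key[natural_key]


store_keys = SurrogateKeyGenerator()
product_keys = SurrogateKeyGenerator()

# Hypothetical source rows identified by natural keys (store ID, SKU).
source_sales = [("ST-NY", "SKU-9", 3), ("ST-LA", "SKU-9", 1), ("ST-NY", "SKU-4", 2)]

fact_rows = [
    {"store_key": store_keys.key_for(store_id),
     "product_key": product_keys.key_for(sku),
     "units_sold": units}
    for store_id, sku, units in source_sales
]

# The fact table's concatenated primary key is the tuple of its foreign keys.
fact_primary_keys = [(row["store_key"], row["product_key"]) for row in fact_rows]
```

Because the surrogate is independent of the source system's identifier, the dimension row keeps its key even if the natural key is later reformatted at the source.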
Advantage of Star Schema
 Easy for users to understand
 Optimized for navigation (fewer joins, so faster)
 Most suitable for query processing
Karen Corral, et al. (2006) The impact of alternative
diagrams on the accuracy of recall: A comparison of
star-schema diagrams and entity-relationship diagrams,
Decision Support Systems, 42(1), 450-468.
DATA WAREHOUSES AND DATA MARTS
 Bill Inmon stated, “The single most important issue facing the IT manager
this year is whether to build the data warehouse first or the data mart first.”
 This statement is true even today. Let us examine this statement and take a
stand.
 Before deciding to build a data warehouse for your organization, you need to
ask the following basic and fundamental questions and address the relevant
issues:
 Top-down or bottom-up approach?
 Enterprise-wide or departmental?
 Which first—data warehouse or data mart?
 Build pilot or go with a full-fledged implementation?
 Dependent or independent data marts?
Data Granularity
Top-Down Versus Bottom-Up Approach
A Practical Approach
 In order to formulate an approach for your organization, you need to examine
what exactly your organization wants. Is your organization looking for
long-term results or fast data marts for only a few subjects for now? Does
your organization want quick, proof-of-concept, throw-away implementations?
Or do you want to look into some other practical approach?
 Although the top-down and bottom-up approaches each have their own
advantages and drawbacks, a compromise approach accommodating both views
appears to be practical.
 The chief proponent of this practical approach is Ralph Kimball, an eminent
author and data warehouse expert. The steps in this practical approach are as
follows:
1. Plan and define requirements at the overall corporate level
2. Create a surrounding architecture for a complete warehouse
3. Conform and standardize the data content
4. Implement the data warehouse as a series of supermarts, one at a time
METADATA IN THE DATA WAREHOUSE
Types of Metadata
 Metadata in a data warehouse fall into three major categories:
 Operational Metadata
 Extraction and Transformation Metadata
 End-User Metadata
Operational Metadata
 As you know, data for the data warehouse comes from several operational
systems of the enterprise. These source systems contain different data structures.
 The data elements selected for the data warehouse have various field lengths and
data types.
 In selecting data from the source systems for the data warehouse, you split
records, combine parts of records from different source files, and deal with
multiple coding schemes and field lengths.
 When you deliver information to the end-users, you must be able to tie that back
to the original source data sets.
 Operational metadata contain all of this information about the operational data
sources.
Extraction and Transformation Metadata
 Extraction and transformation metadata contain data about the extraction of data
from the source systems, namely, the extraction frequencies, extraction methods,
and business rules for the data extraction.
 Also, this category of metadata contains information about all the data
transformations that take place in the data staging area.
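As a rough sketch, one extraction and transformation metadata record could be held in a small structure like the one below. The field names follow the items listed above (frequency, method, business rules, transformations); the class name ExtractionMetadata and all sample values are hypothetical.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class ExtractionMetadata:
    """Metadata about how one source is extracted and transformed."""
    source_system: str
    extraction_frequency: str
    extraction_method: str
    business_rules: list = field(default_factory=list)
    transformations: list = field(default_factory=list)


# Hypothetical entry for an order-entry source system.
orders_meta = ExtractionMetadata(
    source_system="order_entry",
    extraction_frequency="daily",
    extraction_method="incremental",
    business_rules=["skip cancelled orders"],
    transformations=["trim whitespace", "convert amount to decimal"],
)

# A plain dictionary form that could be stored in a metadata repository.
record = asdict(orders_meta)
```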
End-User Metadata
 The end-user metadata is the navigational map of the data warehouse.
 It enables the end-users to find information from the data warehouse.
 The end-user metadata allows the end-users to use their own business
terminologies.
Significance
 Why is metadata especially important in a data warehouse?
 First, it acts as the glue that connects all parts of the data
warehouse.
 Next, it provides information about the contents and structures to
the developers.
 Finally, it opens the door to the end-users and makes the contents
recognizable in their own terms.
Exercise
 A data warehouse is subject-oriented. What would be the major critical
business subjects for the following companies?
 An international manufacturing company
 A local community bank
 A domestic hotel chain
 You are the data analyst on the project team building a data warehouse for
an insurance company. List the possible data sources from which you will
bring the data into your data warehouse. State your assumptions.
 For an airline company, identify three operational applications that would
feed into the data warehouse. What would be the data load and refresh
cycles?
 Prepare a table showing all the potential users and information delivery
methods for a data warehouse supporting a large national grocery chain.
Copyright © 2014 Pearson Education, Inc. Publishing as Prentice Hall
