SlideShare a Scribd company logo
10/6/2011LearnDataVault.com1
Data Vault Modeling MethodologyA Primer…© Dan Linstedt 2009-2012All Rights Reservedhttp://LearnDataVault.com
A bit about me…3Author, Inventor, Speaker – and part time photographer…25+ years in the IT industryWorked in DoD, US Gov’t, Fortune 50, and so on…Find out more about the Data Vault:http://www.youtube.com/LearnDataVaulthttp://LearnDataVault.comFull profile on http://www.LinkedIn.com/dlinstedtLearnDataVault.com
What IS a Data Vault? (Business Definition)Data Vault ModelDetail orientedHistorical traceabilityUniquely linked set of normalized tablesSupports one or more functional areas of business10/6/2011LearnDataVault.com4Data Vault Methodology
CMMI Level 5 Project Plan
Risk, Governance, Versioning
Peer Reviews, Release Cycles
Repeatable, Consistent, Optimized
Complete with Best Practices for BI/DWBusiness KeysSpan  / CrossLines of BusinessSalesContractsPlanningDeliveryFinanceOperationsProcurementFunctional Area
What Does One Look Like?10/6/2011LearnDataVault.com5Records a history of the interactionCustomerProductSatSatSatSatSatLinkCustomerProductF(x)F(x)F(x)SatSatSatSatOrderF(x)SatOrderElements:Hub
Link
SatelliteHub = List of Unique Business KeysLink = List of Relationships, AssociationsSatellites = Descriptive Data
Who’s Using It?10/6/2011LearnDataVault.com6
The PAIN!!Issues in Current EDW Projects10/6/2011LearnDataVault.com7
EDW Architecture: Generation 110/6/2011LearnDataVault.com8Enterprise BI Solution(batch)SalesStaging(EDW)StarSchemasComplex Business RulesFinanceConformed DimensionsJunk TablesHelper TablesFactless FactsStaging + HistoryContractsComplex Business Rules+Dependencies
Kick-Starting Data WarehousingHR Asks IT to build the FIRST Data Warehouse / Prototype10/6/2011LearnDataVault.com91.2.IT Says…  OK:  $125k and 90 days…3.HR Says:Great!  Get Started
Everyone’s Happy!IT Delivers. On-Time & In Budget!10/6/2011LearnDataVault.com104.5.HR Says:Thank-you!  We’re Happy!First Star!Customer_IDCustomer_NameCustomer_AddrCustomer_Addr1Customer_CityCustomer_StateCustomer_ZipCustomer_PhoneCustomer_TagCustomer_ScoreCustomer_RegionCustomer_StatsCustomer_PhoneCustomer_TypeCustomer_IDCustomer_NameCustomer_AddrCustomer_Addr1Customer_CityCustomer_StateCustomer_ZipCustomer_PhoneCustomer_TagCustomer_ScoreCustomer_RegionCustomer_StatsCustomer_PhoneCustomer_TypeCustomer_IDCustomer_NameCustomer_AddrCustomer_Addr1Customer_CityCustomer_StateCustomer_ZipCustomer_PhoneFact_ABCFact_DEFFact_PDQFact_MYFACTCustomer_IDCustomer_NameCustomer_AddrCustomer_Addr1Customer_CityCustomer_StateCustomer_ZipCustomer_PhoneCustomer_TagCustomer_ScoreCustomer_RegionCustomer_StatsCustomer_PhoneCustomer_TypeCustomer_IDCustomer_NameCustomer_AddrCustomer_Addr1Customer_CityCustomer_StateCustomer_ZipCustomer_PhoneCustomer_TagCustomer_ScoreCustomer_RegionCustomer_StatsCustomer_PhoneCustomer_Type
So Where’s the PAIN?10/6/2011LearnDataVault.com11
The PAIN is RIGHT HERE!!Contracts Sees Success, wants the same for their systems.10/6/2011LearnDataVault.com121.2.IT Says…  Ok, but… It won’t be  $125k and 90 days…Because we have to “merge it” with HR” it will be $250 and 180 days.3.Contracts Says:Ouch!  That’s not reasonable, but we need it, so go ahead…
And HERE….10/6/2011LearnDataVault.com13Finance, Sales, and Marketing want in….IT Says…  Ok, but… It won’t be  $250k and 90 days…  Because we have to “merge it” with HR and Contracts it will be $350k and 250 days.And this continues….Business Says...“Can’t you just make-a-copy of the Star Schema, and give me my own for cheaper & less time?
Silo Building / IT Non-Agility10/6/2011LearnDataVault.com14First StarSALESWe built our own because IT costs too muchFINANCEWe built our own because IT took too longMARKETINGWe built our own because we need customized dimension dataWhy is this happening?  What’s Causing this Problem?
Root Cause of Pain: Re-Engineering!10/6/2011LearnDataVault.com15IT is forced to Re-EngineerETL loading code + SQL BI Queries WHENEVER:WHENEVER table structures change
New systems are introduced1. Adding fields to DimensionsBusiness Rules Change
(causing ETL Loading to change, and forcing Engineers to RELOAD existing data)Customer_IDCustomer_NameCustomer_AddrCustomer_Addr1Customer_CityCustomer_StateCustomer_ZipCustomer_PhoneCustomer_TagCustomer_ScoreCustomer_RegionCustomer_StatsCustomer_PhoneCustomer_TypeCustomer_IDCustomer_NameCustomer_AddrCustomer_Addr1Customer_CityCustomer_StateCustomer_ZipCustomer_PhoneCustomer_TagCustomer_ScoreCustomer_RegionCustomer_StatsCustomer_PhoneCustomer_TypeCustomer_IDCustomer_NameCustomer_AddrCustomer_Addr1Customer_CityCustomer_StateCustomer_ZipCustomer_PhoneFact_ABCFact_DEFFact_PDQFact_MYFACTCustomer_IDCustomer_NameCustomer_AddrCustomer_Addr1Customer_CityCustomer_StateCustomer_ZipCustomer_PhoneCustomer_TagCustomer_ScoreCustomer_RegionCustomer_StatsCustomer_PhoneCustomer_TypeCustomer_IDCustomer_NameCustomer_AddrCustomer_Addr1Customer_CityCustomer_StateCustomer_ZipCustomer_PhoneCustomer_TagCustomer_ScoreCustomer_RegionCustomer_StatsCustomer_PhoneCustomer_TypeCustomer_IDCustomer_NameCustomer_AddrCustomer_Addr1Customer_CityCustomer_StateCustomer_ZipCustomer_PhoneCustomer_TagCustomer_ScoreCustomer_RegionCustomer_StatsCustomer_PhoneCustomer_Type3. Adding Dimensions to Facts2. Adding fields to Facts
Why Re-Engineering?10/6/2011LearnDataVault.com16Adding fields to a conformed dimension….Adding fields to a shared fact….Changing code to match new business rules…Require adding/changingFields in target tables!Require Re-Engineering!
Other Pains?10/6/2011LearnDataVault.com17Dimension-Itis?IT – Non-Agility?Deformed Dimensions?What about the “data” you don’t see?What about the “BAD” data left in the source systems?
The SolutionGo the Data Vault Route!10/6/2011LearnDataVault.com18
EDW Architecture: Generation 210/6/2011LearnDataVault.com19SOAEnterprise BI SolutionStarSchemas(real-time)Sales(batch)DVEDW(batch)StagingErrorMartsFinanceContractsReportCollectionsBusiness Rules Downstream!(the Lens Filter)
Unstructured Data And Data Vault10/6/2011LearnDataVault.com20Unstructured Data SetsOntologies/TaxonomiesUnstructured Processing EngineEmail
Docs
Images
Movies
SoundOn-DemandCubesJoins through LINK StructuresData Vault EDW
IT Agility10/6/2011LearnDataVault.com21RAW“what-is”StarSchemasComplexBusiness RulesETL-TData Vault(EDW)SourceStagingBusinessDrivenStarSchemas2. Business Gap AnalysisUnknown Time…
Business Requirements
Start new phase1. Fast Load & Fast Integration3. IT Implementation of Business Rules
What are the Facts Jack?10/6/2011LearnDataVault.com22Generation 1 EDW’s tried to provide“One version of the truth”Generation 2 (Data Vaults) provide…“One version of the facts, for each point in time.”
Business Gap Analysis10/6/2011LearnDataVault.com23The Way Business Perceives it’s business to be runningGapAnalysisOperationalReportsGapAnalysisDynamicCubes(Data Marts)The way the source systems see the business running.
Secured/Protected Information Systems10/6/2011LearnDataVault.com24Non-Classified DVClassified Data VaultHubSatHubData CopyLinkLinkSatSatSatModel CopySatHubHubLinkHubSatSatSatSatSatSatSatSatYellow = New TablesModel changes are absorbed seamlessly into the classified system
Classified world can add all their own structures while maintaining congruence with standard unclassified Data VaultExtensibility Factor10/6/2011LearnDataVault.com25New AdditionsNew CodeBilledAmountsProduct ShippedDatesProductQuantitiesExisting EDWNo Impact!ProductSupplierLinkSuppliersProductsDescriptionsDescriptionsAddressAvailability DatesStock QuantitiesStock QuantitiesDefect ReasonsRating Score
Where’s the Solution?10/6/2011LearnDataVault.com26Re-EngineeringHandle Changes Wherever… Whenever…  with EASE!
The Three vehicles…Pros and Cons of the Modeling Methodologies10/6/2011LearnDataVault.com27
3rd Normal Form Pros/Cons as an EDWPROS (as 3NF)Many to many linkagesHandle lots of informationTightly integrated informationHighly structuredConducive to near-real time loadsRelatively easy to extend10/6/2011LearnDataVault.com28CONS (as EDW)Time driven PK issuesParent-child complexitiesCascading change impactsDifficult to loadNot conducive to BI toolsNot conducive to drill-downDifficult to architect for an enterpriseNot conducive to spiral/scope controlled implementationPhysical design usually doesn’t follow business processes
Star Schema Pros/Cons as an EDWPROS (as Data Mart)Good for multi-dimensional analysisSubject oriented answersExcellent for aggregation pointsRapid development / deploymentGreat for some historical storage10/6/2011LearnDataVault.com29CONS (as EDW)Not cross-business functionalUse of junk / helper tablesTrouble with VLDWUnable to provide integrated enterprise informationCan’t handle ODS or exploration warehouse requirementsTrouble with data explosion in near-real-time environmentsTrouble with updates to type 2 dimension primary keysTrouble with late arriving data in dimensions to support real-time arriving transactionsNot granular enough information to support real-time data integration
Data Vault Pros/Cons as an EDWPROS (as EDW)Supports near-real time and batch feedsSupports functional business linkingExtensible / flexibleProvides rapid build / delivery of star schema’sSupports VLDB / VLDWDesigned for EDWSupports data mining and AIProvides granular detailIncrementally built10/6/2011LearnDataVault.com30CONS (as EDW)Not conducive to OLAP processingRequires business analysis to be firmIntroduces many join operations
The Three Vehicles…Which would you use to win a race?Which would you use to move a house?Would you adapt the truck and enter a race with Porches and expect to win?10/6/2011LearnDataVault.com31
#1 complaint about DV architectureSo you want to deal with Joins do you?10/6/2011LearnDataVault.com32
Joins, Everywhere!10/6/2011LearnDataVault.com33Yes, the DV is full of joins but…These are highly normalized tables (thin & Narrow), reducing I/O’s to read large numbers of rows, at high speed, in parallel.  Joins occur in RAM instead of on disk.  The Optimizer is given a chance to “drop tables” from the join that aren’t necessary.When Parallelism is too much…Not enough CPU or RAM to handle the extra work-load
Not enough rows being queried, (the overhead of starting the threads takes longer than an original scan.End Result?  The DV Scales to the Petabyte Levels when necessary…
Mathematics Behind the Data Vault Model*** The Data Vault is BACKED by Mathematical Principles***Parallel versus sequential execution modelsSet LogicI/O Bandwidth & ThroughputCompression (for query performance gains)Process Repeatability (tuning & predictability measurements)RAM versus electromagnetic disk (Solid-State Drives are not measured)http://osl.cs.uiuc.edu/docs/IPDPS-TR04/TCA_TR04.pdf10/6/2011LearnDataVault.com34
Know when to hold ‘em, know when to fold ‘emWhen to use DV, and when not…10/6/2011LearnDataVault.com35
The Challenger….10/6/2011LearnDataVault.com36The challenger says:My system works fine, why should I use the Data Vault?
I don’t have volume problems…
I don’t have compliance/auditability problems…
I don’t have real-time problems…
My system produces matching results across lines of business…
I’ve never had to “re-state” the data in the warehouse…
I can still build new marts, and conform dimensions in 30 days or less…
My business doesn’t acquire new systems often (if ever)

More Related Content

PPTX
Data vault: What's Next
Empowered Holdings, LLC
 
PPTX
Data Vault and DW2.0
Empowered Holdings, LLC
 
PPTX
Introduction To Data Vault - DAMA Oregon 2012
Empowered Holdings, LLC
 
PPTX
Operational Data Vault
Empowered Holdings, LLC
 
PPTX
Data Vault Overview
Empowered Holdings, LLC
 
PDF
Data vault modeling et retour d'expĂŠrience
Swiss Data Forum Swiss Data Forum
 
PDF
Why Data Vault?
Kent Graziano
 
DOCX
Data Vault: Data Warehouse Design Goes Agile
Daniel Upton
 
Data vault: What's Next
Empowered Holdings, LLC
 
Data Vault and DW2.0
Empowered Holdings, LLC
 
Introduction To Data Vault - DAMA Oregon 2012
Empowered Holdings, LLC
 
Operational Data Vault
Empowered Holdings, LLC
 
Data Vault Overview
Empowered Holdings, LLC
 
Data vault modeling et retour d'expĂŠrience
Swiss Data Forum Swiss Data Forum
 
Why Data Vault?
Kent Graziano
 
Data Vault: Data Warehouse Design Goes Agile
Daniel Upton
 

What's hot (20)

PDF
Data Vault Introduction
Patrick Van Renterghem
 
PPTX
Data vault what's Next: Part 2
Empowered Holdings, LLC
 
PPTX
Agile Data Warehouse Modeling: Introduction to Data Vault Data Modeling
Kent Graziano
 
PPTX
Agile Data Mining with Data Vault 2.0 (english)
Michael Olschimke
 
PDF
Lean Data Warehouse via Data Vault
Daniel Upton
 
PPTX
Agile Data Engineering - Intro to Data Vault Modeling (2016)
Kent Graziano
 
PDF
Shorter time to insight more adaptable less costly bi with end to end modelst...
Daniel Upton
 
PDF
Agile BI via Data Vault and Modelstorming
Daniel Upton
 
PPT
Best Practices: Data Admin & Data Management
Empowered Holdings, LLC
 
PPTX
CWIN 17 / sessions data vault modeling - f2-f - nishat gupta
Capgemini
 
PDF
Guru4Pro Data Vault Best Practices
CGI
 
PPTX
Original: Lean Data Model Storming for the Agile Enterprise
Daniel Upton
 
PDF
Data Warehouse Design and Best Practices
Ivo Andreev
 
PDF
Data Warehouse Agility Array Conference2011
Hans Hultgren
 
PPTX
Visual Data Vault
Michael Olschimke
 
PPTX
Conceptional Data Vault
Torsten Glunde
 
PDF
(OTW13) Agile Data Warehousing: Introduction to Data Vault Modeling
Kent Graziano
 
PDF
Roland bouman modern_data_warehouse_architectures_data_vault_and_anchor_model...
Roland Bouman
 
PDF
Data Warehouse Tutorial For Beginners | Data Warehouse Concepts | Data Wareho...
Edureka!
 
PPT
Data warehouse 101-fundamentals-
AshishGuleria
 
Data Vault Introduction
Patrick Van Renterghem
 
Data vault what's Next: Part 2
Empowered Holdings, LLC
 
Agile Data Warehouse Modeling: Introduction to Data Vault Data Modeling
Kent Graziano
 
Agile Data Mining with Data Vault 2.0 (english)
Michael Olschimke
 
Lean Data Warehouse via Data Vault
Daniel Upton
 
Agile Data Engineering - Intro to Data Vault Modeling (2016)
Kent Graziano
 
Shorter time to insight more adaptable less costly bi with end to end modelst...
Daniel Upton
 
Agile BI via Data Vault and Modelstorming
Daniel Upton
 
Best Practices: Data Admin & Data Management
Empowered Holdings, LLC
 
CWIN 17 / sessions data vault modeling - f2-f - nishat gupta
Capgemini
 
Guru4Pro Data Vault Best Practices
CGI
 
Original: Lean Data Model Storming for the Agile Enterprise
Daniel Upton
 
Data Warehouse Design and Best Practices
Ivo Andreev
 
Data Warehouse Agility Array Conference2011
Hans Hultgren
 
Visual Data Vault
Michael Olschimke
 
Conceptional Data Vault
Torsten Glunde
 
(OTW13) Agile Data Warehousing: Introduction to Data Vault Modeling
Kent Graziano
 
Roland bouman modern_data_warehouse_architectures_data_vault_and_anchor_model...
Roland Bouman
 
Data Warehouse Tutorial For Beginners | Data Warehouse Concepts | Data Wareho...
Edureka!
 
Data warehouse 101-fundamentals-
AshishGuleria
 
Ad

Similar to IRM UK - 2009: DV Modeling And Methodology (20)

PDF
Data Vault 2.0 Demystified: East Coast Tour
WhereScape
 
DOCX
Data Vault: What is it? Where does it fit? SQL Saturday #249
Daniel Upton
 
PDF
Introduction to data vault ilja dmitrijev
Ilja Dmitrijevs
 
PDF
Evaluation of Data Auditability, Traceability and Agility leveraging Data Vau...
IRJET Journal
 
PDF
Data Warehousing 2016
Kent Graziano
 
PDF
Is it sensible to use Data Vault at all? Conclusions from a project.
Capgemini
 
PPT
Using OBIEE and Data Vault to Virtualize Your BI Environment: An Agile Approach
Kent Graziano
 
PDF
Big Data or Data Warehousing? How to Leverage Both in the Enterprise
Dean Hallman
 
PDF
Why Data Vault?
TESCHGlobal
 
PDF
BI Architecture in support of data quality
Tom Breur
 
PDF
BI Chapter 03.pdf business business business business business business
JawaherAlbaddawi
 
PDF
Meetup 25/04/19: Big Data
Digipolis Antwerpen
 
PDF
Presentation by Erik van der Hoeven (Wisdom as a Service) at the Data Vault M...
Patrick Van Renterghem
 
PPT
Data Warehouses: A Whistle-Stop Tour
Cade Roux
 
PPTX
How to become a certified data vault data modeler #CDVDM
Erik Fransen
 
PDF
Data Architecture A Primer for the Data Scientist A Primer for the Data Scien...
grendaoltian
 
PPTX
Data Vault 2.0: Big Data Meets Data Warehousing
All Things Open
 
PPTX
DataWarehouse Architecture,daat mining,data mart,etl process.pptx
ArunPatrick2
 
PPTX
Data Vault 2.0 DeMystified with Dan Linstedt and WhereScape
WhereScape
 
PPTX
Data vault seminar May 5-6 Dommel - The factory and the workshop
johannesvdb
 
Data Vault 2.0 Demystified: East Coast Tour
WhereScape
 
Data Vault: What is it? Where does it fit? SQL Saturday #249
Daniel Upton
 
Introduction to data vault ilja dmitrijev
Ilja Dmitrijevs
 
Evaluation of Data Auditability, Traceability and Agility leveraging Data Vau...
IRJET Journal
 
Data Warehousing 2016
Kent Graziano
 
Is it sensible to use Data Vault at all? Conclusions from a project.
Capgemini
 
Using OBIEE and Data Vault to Virtualize Your BI Environment: An Agile Approach
Kent Graziano
 
Big Data or Data Warehousing? How to Leverage Both in the Enterprise
Dean Hallman
 
Why Data Vault?
TESCHGlobal
 
BI Architecture in support of data quality
Tom Breur
 
BI Chapter 03.pdf business business business business business business
JawaherAlbaddawi
 
Meetup 25/04/19: Big Data
Digipolis Antwerpen
 
Presentation by Erik van der Hoeven (Wisdom as a Service) at the Data Vault M...
Patrick Van Renterghem
 
Data Warehouses: A Whistle-Stop Tour
Cade Roux
 
How to become a certified data vault data modeler #CDVDM
Erik Fransen
 
Data Architecture A Primer for the Data Scientist A Primer for the Data Scien...
grendaoltian
 
Data Vault 2.0: Big Data Meets Data Warehousing
All Things Open
 
DataWarehouse Architecture,daat mining,data mart,etl process.pptx
ArunPatrick2
 
Data Vault 2.0 DeMystified with Dan Linstedt and WhereScape
WhereScape
 
Data vault seminar May 5-6 Dommel - The factory and the workshop
johannesvdb
 
Ad

Recently uploaded (20)

PDF
Data_Analytics_vs_Data_Science_vs_BI_by_CA_Suvidha_Chaplot.pdf
CA Suvidha Chaplot
 
PPTX
OA presentation.pptx OA presentation.pptx
pateldhruv002338
 
PDF
Structs to JSON: How Go Powers REST APIs
Emily Achieng
 
PDF
CIFDAQ's Market Wrap : Bears Back in Control?
CIFDAQ
 
PDF
Economic Impact of Data Centres to the Malaysian Economy
flintglobalapac
 
PDF
AI Unleashed - Shaping the Future -Starting Today - AIOUG Yatra 2025 - For Co...
Sandesh Rao
 
PDF
Orbitly Pitch Deck|A Mission-Driven Platform for Side Project Collaboration (...
zz41354899
 
PPTX
Agile Chennai 18-19 July 2025 Ideathon | AI Powered Microfinance Literacy Gui...
AgileNetwork
 
PPTX
The-Ethical-Hackers-Imperative-Safeguarding-the-Digital-Frontier.pptx
sujalchauhan1305
 
PPTX
Dev Dives: Automate, test, and deploy in one place—with Unified Developer Exp...
AndreeaTom
 
PDF
Brief History of Internet - Early Days of Internet
sutharharshit158
 
PDF
How Open Source Changed My Career by abdelrahman ismail
a0m0rajab1
 
PDF
Unlocking the Future- AI Agents Meet Oracle Database 23ai - AIOUG Yatra 2025.pdf
Sandesh Rao
 
PDF
Get More from Fiori Automation - What’s New, What Works, and What’s Next.pdf
Precisely
 
PDF
Security features in Dell, HP, and Lenovo PC systems: A research-based compar...
Principled Technologies
 
PDF
A Strategic Analysis of the MVNO Wave in Emerging Markets.pdf
IPLOOK Networks
 
PDF
Make GenAI investments go further with the Dell AI Factory
Principled Technologies
 
PDF
Oracle AI Vector Search- Getting Started and what's new in 2025- AIOUG Yatra ...
Sandesh Rao
 
PDF
GDG Cloud Munich - Intro - Luiz Carneiro - #BuildWithAI - July - Abdel.pdf
Luiz Carneiro
 
PPTX
What-is-the-World-Wide-Web -- Introduction
tonifi9488
 
Data_Analytics_vs_Data_Science_vs_BI_by_CA_Suvidha_Chaplot.pdf
CA Suvidha Chaplot
 
OA presentation.pptx OA presentation.pptx
pateldhruv002338
 
Structs to JSON: How Go Powers REST APIs
Emily Achieng
 
CIFDAQ's Market Wrap : Bears Back in Control?
CIFDAQ
 
Economic Impact of Data Centres to the Malaysian Economy
flintglobalapac
 
AI Unleashed - Shaping the Future -Starting Today - AIOUG Yatra 2025 - For Co...
Sandesh Rao
 
Orbitly Pitch Deck|A Mission-Driven Platform for Side Project Collaboration (...
zz41354899
 
Agile Chennai 18-19 July 2025 Ideathon | AI Powered Microfinance Literacy Gui...
AgileNetwork
 
The-Ethical-Hackers-Imperative-Safeguarding-the-Digital-Frontier.pptx
sujalchauhan1305
 
Dev Dives: Automate, test, and deploy in one place—with Unified Developer Exp...
AndreeaTom
 
Brief History of Internet - Early Days of Internet
sutharharshit158
 
How Open Source Changed My Career by abdelrahman ismail
a0m0rajab1
 
Unlocking the Future- AI Agents Meet Oracle Database 23ai - AIOUG Yatra 2025.pdf
Sandesh Rao
 
Get More from Fiori Automation - What’s New, What Works, and What’s Next.pdf
Precisely
 
Security features in Dell, HP, and Lenovo PC systems: A research-based compar...
Principled Technologies
 
A Strategic Analysis of the MVNO Wave in Emerging Markets.pdf
IPLOOK Networks
 
Make GenAI investments go further with the Dell AI Factory
Principled Technologies
 
Oracle AI Vector Search- Getting Started and what's new in 2025- AIOUG Yatra ...
Sandesh Rao
 
GDG Cloud Munich - Intro - Luiz Carneiro - #BuildWithAI - July - Abdel.pdf
Luiz Carneiro
 
What-is-the-World-Wide-Web -- Introduction
tonifi9488
 

IRM UK - 2009: DV Modeling And Methodology

  • 2. Data Vault Modeling MethodologyA Primer…© Dan Linstedt 2009-2012All Rights Reservedhttp://LearnDataVault.com
  • 3. A bit about me…3Author, Inventor, Speaker – and part time photographer…25+ years in the IT industryWorked in DoD, US Gov’t, Fortune 50, and so on…Find out more about the Data Vault:http://www.youtube.com/LearnDataVaulthttp://LearnDataVault.comFull profile on http://www.LinkedIn.com/dlinstedtLearnDataVault.com
  • 4. What IS a Data Vault? (Business Definition)Data Vault ModelDetail orientedHistorical traceabilityUniquely linked set of normalized tablesSupports one or more functional areas of business10/6/2011LearnDataVault.com4Data Vault Methodology
  • 5. CMMI Level 5 Project Plan
  • 9. Complete with Best Practices for BI/DWBusiness KeysSpan / CrossLines of BusinessSalesContractsPlanningDeliveryFinanceOperationsProcurementFunctional Area
  • 10. What Does One Look Like?10/6/2011LearnDataVault.com5Records a history of the interactionCustomerProductSatSatSatSatSatLinkCustomerProductF(x)F(x)F(x)SatSatSatSatOrderF(x)SatOrderElements:Hub
  • 11. Link
  • 12. SatelliteHub = List of Unique Business KeysLink = List of Relationships, AssociationsSatellites = Descriptive Data
  • 14. The PAIN!!Issues in Current EDW Projects10/6/2011LearnDataVault.com7
  • 15. EDW Architecture: Generation 110/6/2011LearnDataVault.com8Enterprise BI Solution(batch)SalesStaging(EDW)StarSchemasComplex Business RulesFinanceConformed DimensionsJunk TablesHelper TablesFactless FactsStaging + HistoryContractsComplex Business Rules+Dependencies
  • 16. Kick-Starting Data WarehousingHR Asks IT to build the FIRST Data Warehouse / Prototype10/6/2011LearnDataVault.com91.2.IT Says… OK: $125k and 90 days…3.HR Says:Great! Get Started
  • 17. Everyone’s Happy!IT Delivers. On-Time & In Budget!10/6/2011LearnDataVault.com104.5.HR Says:Thank-you! We’re Happy!First Star!Customer_IDCustomer_NameCustomer_AddrCustomer_Addr1Customer_CityCustomer_StateCustomer_ZipCustomer_PhoneCustomer_TagCustomer_ScoreCustomer_RegionCustomer_StatsCustomer_PhoneCustomer_TypeCustomer_IDCustomer_NameCustomer_AddrCustomer_Addr1Customer_CityCustomer_StateCustomer_ZipCustomer_PhoneCustomer_TagCustomer_ScoreCustomer_RegionCustomer_StatsCustomer_PhoneCustomer_TypeCustomer_IDCustomer_NameCustomer_AddrCustomer_Addr1Customer_CityCustomer_StateCustomer_ZipCustomer_PhoneFact_ABCFact_DEFFact_PDQFact_MYFACTCustomer_IDCustomer_NameCustomer_AddrCustomer_Addr1Customer_CityCustomer_StateCustomer_ZipCustomer_PhoneCustomer_TagCustomer_ScoreCustomer_RegionCustomer_StatsCustomer_PhoneCustomer_TypeCustomer_IDCustomer_NameCustomer_AddrCustomer_Addr1Customer_CityCustomer_StateCustomer_ZipCustomer_PhoneCustomer_TagCustomer_ScoreCustomer_RegionCustomer_StatsCustomer_PhoneCustomer_Type
  • 18. So Where’s the PAIN?10/6/2011LearnDataVault.com11
  • 19. The PAIN is RIGHT HERE!!Contracts Sees Success, wants the same for their systems.10/6/2011LearnDataVault.com121.2.IT Says… Ok, but… It won’t be $125k and 90 days…Because we have to “merge it” with HR” it will be $250 and 180 days.3.Contracts Says:Ouch! That’s not reasonable, but we need it, so go ahead…
  • 20. And HERE….10/6/2011LearnDataVault.com13Finance, Sales, and Marketing want in….IT Says… Ok, but… It won’t be $250k and 90 days… Because we have to “merge it” with HR and Contracts it will be $350k and 250 days.And this continues….Business Says...“Can’t you just make-a-copy of the Star Schema, and give me my own for cheaper & less time?
  • 21. Silo Building / IT Non-Agility10/6/2011LearnDataVault.com14First StarSALESWe built our own because IT costs too muchFINANCEWe built our own because IT took too longMARKETINGWe built our own because we need customized dimension dataWhy is this happening? What’s Causing this Problem?
  • 22. Root Cause of Pain: Re-Engineering!10/6/2011LearnDataVault.com15IT is forced to Re-EngineerETL loading code + SQL BI Queries WHENEVER:WHENEVER table structures change
  • 23. New systems are introduced1. Adding fields to DimensionsBusiness Rules Change
  • 24. (causing ETL Loading to change, and forcing Engineers to RELOAD existing data)Customer_IDCustomer_NameCustomer_AddrCustomer_Addr1Customer_CityCustomer_StateCustomer_ZipCustomer_PhoneCustomer_TagCustomer_ScoreCustomer_RegionCustomer_StatsCustomer_PhoneCustomer_TypeCustomer_IDCustomer_NameCustomer_AddrCustomer_Addr1Customer_CityCustomer_StateCustomer_ZipCustomer_PhoneCustomer_TagCustomer_ScoreCustomer_RegionCustomer_StatsCustomer_PhoneCustomer_TypeCustomer_IDCustomer_NameCustomer_AddrCustomer_Addr1Customer_CityCustomer_StateCustomer_ZipCustomer_PhoneFact_ABCFact_DEFFact_PDQFact_MYFACTCustomer_IDCustomer_NameCustomer_AddrCustomer_Addr1Customer_CityCustomer_StateCustomer_ZipCustomer_PhoneCustomer_TagCustomer_ScoreCustomer_RegionCustomer_StatsCustomer_PhoneCustomer_TypeCustomer_IDCustomer_NameCustomer_AddrCustomer_Addr1Customer_CityCustomer_StateCustomer_ZipCustomer_PhoneCustomer_TagCustomer_ScoreCustomer_RegionCustomer_StatsCustomer_PhoneCustomer_TypeCustomer_IDCustomer_NameCustomer_AddrCustomer_Addr1Customer_CityCustomer_StateCustomer_ZipCustomer_PhoneCustomer_TagCustomer_ScoreCustomer_RegionCustomer_StatsCustomer_PhoneCustomer_Type3. Adding Dimensions to Facts2. Adding fields to Facts
  • 25. Why Re-Engineering?10/6/2011LearnDataVault.com16Adding fields to a conformed dimension….Adding fields to a shared fact….Changing code to match new business rules…Require adding/changingFields in target tables!Require Re-Engineering!
  • 26. Other Pains?10/6/2011LearnDataVault.com17Dimension-Itis?IT – Non-Agility?Deformed Dimensions?What about the “data” you don’t see?What about the “BAD” data left in the source systems?
  • 27. The SolutionGo the Data Vault Route!10/6/2011LearnDataVault.com18
  • 28. EDW Architecture: Generation 210/6/2011LearnDataVault.com19SOAEnterprise BI SolutionStarSchemas(real-time)Sales(batch)DVEDW(batch)StagingErrorMartsFinanceContractsReportCollectionsBusiness Rules Downstream!(the Lens Filter)
  • 29. Unstructured Data And Data Vault10/6/2011LearnDataVault.com20Unstructured Data SetsOntologies/TaxonomiesUnstructured Processing EngineEmail
  • 30. Docs
  • 33. SoundOn-DemandCubesJoins through LINK StructuresData Vault EDW
  • 34. IT Agility10/6/2011LearnDataVault.com21RAW“what-is”StarSchemasComplexBusiness RulesETL-TData Vault(EDW)SourceStagingBusinessDrivenStarSchemas2. Business Gap AnalysisUnknown Time…
  • 36. Start new phase1. Fast Load & Fast Integration3. IT Implementation of Business Rules
  • 37. What are the Facts Jack?10/6/2011LearnDataVault.com22Generation 1 EDW’s tried to provide“One version of the truth”Generation 2 (Data Vaults) provide…“One version of the facts, for each point in time.”
  • 38. Business Gap Analysis10/6/2011LearnDataVault.com23The Way Business Perceives it’s business to be runningGapAnalysisOperationalReportsGapAnalysisDynamicCubes(Data Marts)The way the source systems see the business running.
  • 39. Secured/Protected Information Systems10/6/2011LearnDataVault.com24Non-Classified DVClassified Data VaultHubSatHubData CopyLinkLinkSatSatSatModel CopySatHubHubLinkHubSatSatSatSatSatSatSatSatYellow = New TablesModel changes are absorbed seamlessly into the classified system
  • 40. Classified world can add all their own structures while maintaining congruence with standard unclassified Data VaultExtensibility Factor10/6/2011LearnDataVault.com25New AdditionsNew CodeBilledAmountsProduct ShippedDatesProductQuantitiesExisting EDWNo Impact!ProductSupplierLinkSuppliersProductsDescriptionsDescriptionsAddressAvailability DatesStock QuantitiesStock QuantitiesDefect ReasonsRating Score
  • 41. Where’s the Solution?10/6/2011LearnDataVault.com26Re-EngineeringHandle Changes Wherever… Whenever… with EASE!
  • 42. The Three vehicles…Pros and Cons of the Modeling Methodologies10/6/2011LearnDataVault.com27
  • 43. 3rd Normal Form Pros/Cons as an EDWPROS (as 3NF)Many to many linkagesHandle lots of informationTightly integrated informationHighly structuredConducive to near-real time loadsRelatively easy to extend10/6/2011LearnDataVault.com28CONS (as EDW)Time driven PK issuesParent-child complexitiesCascading change impactsDifficult to loadNot conducive to BI toolsNot conducive to drill-downDifficult to architect for an enterpriseNot conducive to spiral/scope controlled implementationPhysical design usually doesn’t follow business processes
  • 44. Star Schema Pros/Cons as an EDWPROS (as Data Mart)Good for multi-dimensional analysisSubject oriented answersExcellent for aggregation pointsRapid development / deploymentGreat for some historical storage10/6/2011LearnDataVault.com29CONS (as EDW)Not cross-business functionalUse of junk / helper tablesTrouble with VLDWUnable to provide integrated enterprise informationCan’t handle ODS or exploration warehouse requirementsTrouble with data explosion in near-real-time environmentsTrouble with updates to type 2 dimension primary keysTrouble with late arriving data in dimensions to support real-time arriving transactionsNot granular enough information to support real-time data integration
  • 45. Data Vault Pros/Cons as an EDWPROS (as EDW)Supports near-real time and batch feedsSupports functional business linkingExtensible / flexibleProvides rapid build / delivery of star schema’sSupports VLDB / VLDWDesigned for EDWSupports data mining and AIProvides granular detailIncrementally built10/6/2011LearnDataVault.com30CONS (as EDW)Not conducive to OLAP processingRequires business analysis to be firmIntroduces many join operations
  • 46. The Three Vehicles…Which would you use to win a race?Which would you use to move a house?Would you adapt the truck and enter a race with Porches and expect to win?10/6/2011LearnDataVault.com31
  • 47. #1 complaint about DV architectureSo you want to deal with Joins do you?10/6/2011LearnDataVault.com32
  • 48. Joins, Everywhere!10/6/2011LearnDataVault.com33Yes, the DV is full of joins but…These are highly normalized tables (thin & Narrow), reducing I/O’s to read large numbers of rows, at high speed, in parallel. Joins occur in RAM instead of on disk. The Optimizer is given a chance to “drop tables” from the join that aren’t necessary.When Parallelism is too much…Not enough CPU or RAM to handle the extra work-load
  • 49. Not enough rows being queried, (the overhead of starting the threads takes longer than an original scan.End Result? The DV Scales to the Petabyte Levels when necessary…
  • 50. Mathematics Behind the Data Vault Model*** The Data Vault is BACKED by Mathematical Principles***Parallel versus sequential execution modelsSet LogicI/O Bandwidth & ThroughputCompression (for query performance gains)Process Repeatability (tuning & predictability measurements)RAM versus electromagnetic disk (Solid-State Drives are not measured)http://osl.cs.uiuc.edu/docs/IPDPS-TR04/TCA_TR04.pdf10/6/2011LearnDataVault.com34
  • 51. Know when to hold ‘em, know when to fold ‘emWhen to use DV, and when not…10/6/2011LearnDataVault.com35
  • 52. The Challenger….10/6/2011LearnDataVault.com36The challenger says:My system works fine, why should I use the Data Vault?
  • 53. I don’t have volume problems…
  • 54. I don’t have compliance/auditability problems…
  • 55. I don’t have real-time problems…
  • 56. My system produces matching results across lines of business…
  • 57. I’ve never had to “re-state” the data in the warehouse…
  • 58. I can still build new marts, and conform dimensions in 30 days or less…
  • 59. My business doesn’t acquire new systems often (if ever)
  • 60. My incoming data sets don’t changeI Say…That’s wonderful, don’t fix what’s broken. Have a nice day, oh- but call me when or if you ever run into these problems…
  • 61. When to Apply the Data Vault10/6/2011LearnDataVault.com37Benefits:Scalability
  • 65. IT and Business Accountability
  • 70. Successful EDW ImplementationsHow to build a data vaultIn 10 easy steps…10/6/2011LearnDataVault.com38
  • 71. Step 110/6/2011LearnDataVault.com39Identify your business processes, followed by your business keys (that are used to identify the data that flows through the business processes)** NOTE: Along the way, document your assumptions, document your reasons for choosing keys, and modeling designs, develop a list of questions to be answered by business users…
  • 72. Step 210/6/2011LearnDataVault.com40Identify the issues/problems that might be carried with the identified business keys, annotate the risks, and mitigate each one.
  • 73. Step 310/6/2011LearnDataVault.com41Identify the units of work, the associations – LINK tables, where keys combine to form a notion, a concept, and a relationship.
  • 74. Step 410/6/2011LearnDataVault.com42Identify the descriptive data that belongs to SINGLE Hub Keys, ensure that the data doesn’t represent or rely on a relationship.
  • 75. Step 510/6/2011LearnDataVault.com43Identify the Satellite data that depends on relationships – move it to the appropriate LINK table.HINT: If you “want” to put a Foreign Key in a Satellite, you have a clear sign that the Satellite is in the WRONG place, and needs to be assigned to a LINK table rather than a HUB.
  • 76. Step 610/6/2011LearnDataVault.com44Scope the Model Down to a managable chunk. Implement the first two Hubs, Hub Satellites, and first Link. BUILD IN INCREMENTS!
  • 77. Step 710/6/2011LearnDataVault.com45Setup the key generation load routines, setup the staging area, and begin loading data.
  • 78. Step 810/6/2011LearnDataVault.com46Review any “truncation” errors, or any data-type conversion problems, fix the staging area, and remove duplicates.
  • 79. Step 910/6/2011LearnDataVault.com47Begin Loading the Data Vault. Load all Hubs, then all Hub Satellites, Then all Links, and finish with All Link Satellites.
  • 80. Step 1010/6/2011LearnDataVault.com48Reconcile the Data Vault to the source system, then build a first data mart from the results. Bring business value FAST!
  • 82. 10 minutes to find the Hubs….10/6/2011LearnDataVault.com50
  • 83. Possible Hubs From Northwind10/6/2011LearnDataVault.com51
  • 84. 10 Minutes to find the Links…10/6/2011LearnDataVault.com52
  • 85. Possible Links From Northwind10/6/2011LearnDataVault.com53
  • 86. 10 minutes to find the Satellites…10/6/2011LearnDataVault.com54
  • 87. Possible Satellites From Northwind10/6/2011LearnDataVault.com55
  • 88. What did we learn?We often deal with more than 1 system at a time… this was a lab with only one model.We didn’t have any business requirements that we might need to answer questions, but doesn’t that reflect real-life?The data set is extremely dirty (you never have that in your systems right?)Time Zone based data can be a problemLack of metadata causes integration issues and modeling decisions10/6/2011LearnDataVault.com56
  • 89. The Experts Say…“The Data Vault is the optimal choice for modeling the EDW in the DW 2.0 framework.” Bill Inmon“The Data Vault is foundationally strong and exceptionally scalable architecture.”Stephen Brobst“The Data Vault is a technique which some industry experts have predicted may spark a revolution as the next big thing in data modeling for enterprise warehousing....” Doug Laney57
  • 90. More Notables…“This enables organizations to take control of their data warehousing destiny, supporting better and more relevant data warehouses in less time than before.” Howard Dresner“[The Data Vault] captures a practical body of knowledge for data warehouse development which both agile and traditional practitioners will benefit from..”Scott Ambler58
  • 91. Where To Learn MoreThe Technical Modeling Book: http://LearnDataVault.comThe Discussion Forums: & eventshttp://LinkedIn.com – Data Vault DiscussionsContact me:http://DanLinstedt.com - web [email protected] - emailWorld wide User Group (Free)http://dvusergroup.com59