Data Vault and DW2.0

  • 1. The Application of Data Vault to DW2.0. © Dan Linstedt, 2011-2012, all rights reserved.
  • 2. A bit about me
      • Author, inventor, speaker, and part-time photographer
      • 25+ years in the IT industry
      • Worked in DoD, US Gov’t, Fortune 50, and so on
      • Find out more about the Data Vault: http://www.youtube.com/LearnDataVault and http://LearnDataVault.com
      • Full profile on http://www.LinkedIn.com/dlinstedt
  • 3. Agenda
      • Defining the Needs for the Data Vault: DW2.0 Architecture; DW2.0 Drivers for Data Modeling; Divergence of Data Models over Time
      • Data Vault in DW2.0: Defining the Data Vault; What does one look like?; Modeling in DW2.0; Applying Data Vault to Global DW2.0; Applying Data Vault to Time-Value DW2.0; Compliance in DW2.0; Applying Data Vault to System of Record
      • The Paradox of DW2.0: Volume, Latency, Complexity, Normalization and Transformation ability
      • Footer: 10/5/2011. Do Not Duplicate Without Written Permission.
  • 4. DW2.0 Architecture [diagram; labels only]
      • Enterprise Service Bus. ESB Connectivity: EAI, EII, ETL / ELT, Web Services
      • Cube Processing, Temporal Indexing, Semantic Management, Active Data Mining, Transformation, Active Cleansing
      • Unstructured Data: Email, Plain Text, Word Docs, Images
      • ESB Management: Text, Email, Spread Sheets, Transaction
      • Sector labels: Interactive, Tactical; Integrated, Strategic; Near-Line, Extended; Archival, Historical
      • Structured Information; Enterprise Data Warehouse; METADATA
      • Data models must be consistently applied throughout all layers.
  • 15. DW2.0 Drivers for Data Modeling
      • [Diagram labels: Technical Drivers, Business Drivers, Data Model; Flexibility, Compliance, Volume, Frequency, Understandability, Granularity]
      • Data models are one of the main integration points between technical and business drivers.
      • Business keys drive understandability and granularity.
      • Normalization drives flexibility and frequency of load.
      • Raw data sets in the EDW/ADW drive compliance and volume.
  • 16. Divergence of Data Models over Time
      • Data models (both logical and physical) have diverged from business drivers and direction over time.
      • The data models have driven towards physical improvements instead of towards business improvements.
      • The Data Vault architecture drives data modeling back to the business side of the house.
  • 17. Agenda (repeated from item 3). Image is from What The Bleep Do We Know?
  • 18. Defining the Data Vault
      • “The Data Vault is a detail oriented, historical tracking and uniquely linked set of normalized tables that support one or more functional areas of business. It is a hybrid approach encompassing the best of breed between 3rd normal form (3NF) and star schema. The design is flexible, scalable, consistent and adaptable to the needs of the enterprise. It is a data model that is architected specifically to meet the needs of today’s enterprise data warehouses.”
      • Source: Defining the Data Vault, TDAN.com article
  • 19. What Does One Look Like?
      • [Diagram labels: Account Information, Invoice / Billing Information, Customer Information; Account, Invoice ID, Customer; Link; several Sat (satellite) boxes; F(x) markers]
      • Annotation: Records a history of the interaction.
      • Elements: Hub, Link, Satellite
      • The impact of linking disparate systems together is inside the shaded area.
  • 22. Modeling in DW2.0
      • Bill says:
          • DW2.0 must be brought down to a very finite level of detail.
          • The starting point for DW2.0 is the modeling process.
          • The data model applies to the integrated sector, the near-line sector, and the archival sector.
          • Data warehouses are built incrementally.
      • The Data Vault specializes in:
          • Providing finite grain at the lowest level possible
          • Mapping business process models to data models
          • Existing in all sectors simultaneously without changes
          • Flexibility, and managing change so that impacts are not a mile wide and 10 miles deep
  • 23. Elements in a Data Vault
      • Hub: unique list of business keys, tracked by the first time the warehouse saw them appear.
      • Link: relationships between business keys, also representing a grain shift or a hierarchical roll-up.
      • Satellite: data over time, granular, and descriptive about the business key. Also set up according to type of information and rate of change.
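The three elements can be sketched as tables. This is a minimal illustration using Python's built-in sqlite3 module, not a schema taken from the slides: every table and column name below (hub_customer, customer_key, first_seen, and so on) is an assumption chosen for the example.

```python
import sqlite3

# Hypothetical names throughout; the deck does not prescribe a physical schema.
SCHEMA = """
CREATE TABLE hub_customer (
    customer_id   INTEGER PRIMARY KEY,   -- surrogate key
    customer_key  TEXT NOT NULL UNIQUE,  -- business key, listed once
    first_seen    TEXT NOT NULL          -- first time the warehouse saw the key
);
CREATE TABLE hub_account (
    account_id    INTEGER PRIMARY KEY,
    account_key   TEXT NOT NULL UNIQUE,
    first_seen    TEXT NOT NULL
);
CREATE TABLE link_customer_account (
    link_id       INTEGER PRIMARY KEY,
    customer_id   INTEGER NOT NULL REFERENCES hub_customer(customer_id),
    account_id    INTEGER NOT NULL REFERENCES hub_account(account_id),
    first_seen    TEXT NOT NULL,
    UNIQUE (customer_id, account_id)     -- one row per relationship
);
CREATE TABLE sat_customer_name (
    customer_id   INTEGER NOT NULL REFERENCES hub_customer(customer_id),
    load_date     TEXT NOT NULL,         -- one row per change over time
    name          TEXT,
    PRIMARY KEY (customer_id, load_date)
);
"""


def build_vault():
    """Create the example hub, link, and satellite tables in memory."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn
```

The UNIQUE constraint on the business key is what makes the hub a unique list; descriptive attributes live only in the satellite, keyed by load date.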
  • 24. Applying the Data Vault to Global DW2.0
      • [Diagram: Hub, Link, and Sat boxes grouped into four areas: Base EDW Created in Corporate; Manufacturing EDW in China; Planning in Brazil; Financials in USA]
  • 25. Applying the Data Vault to Time-Value DW2.0
      • [Table: Satellite Data Over Time, Row 1 through Row 4]
      • Satellite entities in the Data Vault house data over time. They are split by type of information and rate of change. This is an example set of data for a customer name satellite.
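To make "data over time" concrete, here is a small sketch of a customer name satellite and a point-in-time lookup. The rows are invented for illustration (the slide's own table values are not in the transcript), and the names SatCustomerName and name_as_of are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class SatCustomerName:
    customer_key: str   # business key of the parent hub
    load_date: date     # when the warehouse recorded this version
    name: str


# Made-up example rows: each change adds a row, nothing is overwritten.
ROWS = [
    SatCustomerName("CUST-001", date(2011, 1, 10), "Jane Smith"),
    SatCustomerName("CUST-001", date(2011, 4, 2), "Jane Smith-Jones"),
    SatCustomerName("CUST-001", date(2011, 9, 15), "Jane Jones"),
]


def name_as_of(rows: List[SatCustomerName], customer_key: str,
               as_of: date) -> Optional[str]:
    """Return the name that was current on as_of, or None if none existed yet."""
    candidates = [r for r in rows
                  if r.customer_key == customer_key and r.load_date <= as_of]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r.load_date).name
```

Because history is kept as rows, the same satellite answers both "what is the name now" and "what was the name on a given date".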
  • 26. Batch and Real-Time Data Arrival
      • All inserts, all the time.
      • [Diagram: a real-time transaction (Transaction ID, Date Stamp, Customer, Account #, Amount, Type) feeds Hub Customer, Hub Acct, Link Transaction, and Sat Transaction; a batch load (3, 6 or 12 Hr Load Window) of Customer Info and Acct Data feeds Sat Customer and Sat Acct]
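An insert-only load can be sketched as follows. This is an illustration under stated assumptions, not the deck's loader: the class InsertOnlyVault is hypothetical, records are assumed to arrive in time order, and the rule "add a satellite row only when attributes differ from the latest row" is an assumption added for the example.

```python
class InsertOnlyVault:
    """Toy in-memory vault: rows are only ever appended, never updated."""

    def __init__(self):
        self.hub = {}        # business key -> timestamp first seen
        self.satellite = []  # (business key, load timestamp, attributes)

    def _latest(self, business_key):
        # Scan backwards for the most recently loaded attributes of this key.
        for key, _, attrs in reversed(self.satellite):
            if key == business_key:
                return attrs
        return None

    def load(self, business_key, attributes, loaded_at):
        """Load one record from batch or real-time; return True if a row was added."""
        if business_key not in self.hub:
            self.hub[business_key] = loaded_at  # insert the key once
        if self._latest(business_key) == attributes:
            return False                        # nothing changed, nothing written
        self.satellite.append((business_key, loaded_at, dict(attributes)))
        return True
```

The same load method serves a batch window and a real-time feed, since neither path needs to update an existing row.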
  • 27. Star Schema Real-Time Data Issues
      • Updates are REQUIRED!
      • [Diagram: a real-time transaction (Transaction ID, Date Stamp, Customer, Account #, Amount, Type) feeds Fact Transaction, Dimension Customer, and Dimension Account; a batch load (3, 6 or 12 Hr Load Window) supplies Customer Info and Acct Data]
      • Cleansing and quality must occur before the data can reach the target tables; cleansing and quality introduce unwanted latency!
  • 28. Compliance in DW2.0
      • [Diagram labels: Source Systems; EDW / ADW; Data Vault; Data Marts; Data Delivery; Changes to Source Information; True Marts; Raw Integration; Business Rules; Error Mart; Quality; Master Data (Operational); User or Auditor; Continuous Data Improvement; Direction of Information Flow]
      • Raw detail = auditable
      • Loads in real-time or in batch
      • Integrated by business key
      • Flexible, allows business changes (with little to no impact)
      • No delay in loading data
      • Data type conformity
      • Semantic integration
  • 29. Applying the Data Vault to System Of Record
      • [Diagram labels: Master Data or Conformed Dimensions; Normalized EDW; Source Systems; SOR Definition 1; SOR Definition 2; SOR Definition 3]
      • SOR 1: Data capture, data produced by system algorithms.
      • SOR 2: Raw detailed integrated data over time, integrated by horizontal (functional) business key. Auditable.
      • SOR 3: Current view of the business, merged, quality cleansed, single copy, single source, feeds operational systems.
  • 30. DW2.0 Paradoxes
      • DW2.0 incorporates unstructured, semi-structured, real-time, and batch data, plus global views, all of which drive volumes of data.
      • Volume causes latency in transformation.
      • Volume is directly proportional to transformation complexity.
      • Real-time data arrival is inversely proportional to complexity and volume.
      • Time for “quality, cleansing, and transformation” on the way in to the EDW diminishes as near-real-time is approached, or as massive volumes of batch data meet a shrinking batch window.
      • Transformation can destroy the auditability and compliance of the EDW / ADW.
  • 31. DW2.0 Paradoxes - Imagery
      • [Diagram. Node labels: DW2.0; Real-Time Transactions; Unstructured Data; Low-Level Grain; Low Latency; Volume; Merging, Quality, Cleansing; Data Model Denormalization; Data Model Normalization & Raw Details; Auditability & Compliance. Edge labels: Drives, Pushes, Increases, Fights, Requires, Inhibits, Provides]
  • 32. DW2.0 Paradox Hypothesis
      • As we approach near-real-time, the ability to transform data and “wait” for parent dependencies decreases and data decay rates increase, which can cause data death if the data is not processed in time.
      • Normalization of the data model increases flexibility and scalability.
      • The closer we get to near-real-time, the more normalized the data model in the EDW/ADW must become.
      • In order to process high volumes of batch data extremely fast, the “business transformations” must be removed from the load stream of the EDW.
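One way to picture removing business transformations from the load stream is to split the work into two functions: a load step that stores records untouched, and a later step that applies rules on the way out. This is only a sketch; load_raw, build_mart, and the name-cleaning rule are hypothetical and not taken from the deck.

```python
def load_raw(vault_rows, record, loaded_at):
    """Load stream: append the source record unchanged, with no cleansing."""
    vault_rows.append({"loaded_at": loaded_at, **record})


def build_mart(vault_rows):
    """Downstream step: apply an example business rule outside the load stream.

    Rows that fail the rule are returned separately instead of being dropped,
    and the raw rows themselves are never modified.
    """
    mart, errors = [], []
    for row in vault_rows:
        name = row.get("name")
        if isinstance(name, str) and name.strip():
            mart.append({**row, "name": name.strip().title()})
        else:
            errors.append(row)
    return mart, errors
```

Because load_raw does no waiting or validation, its cost per record stays constant as arrival rates rise, and the raw rows remain available for audit after the rule has run.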
  • 33. Data Vault Volumetrics
      • Volumetrics (10% null data)
      • Upon initial investigation, the 12-month growth rate for new customers is 197.4 MB per year. Now let’s factor in the deltas.
  • 34. Data Vault Growth
      • Volumetrics (10% null data), delta growth only
      • Original dimension: 497.16 MB per year
      • New Data Vault: 317.03 MB per year
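The two delta growth figures above can be compared directly. The short calculation below uses only the numbers stated on the slide; the function name growth_comparison is made up for the example.

```python
ORIGINAL_DIMENSION_MB_PER_YEAR = 497.16  # from the slide
DATA_VAULT_MB_PER_YEAR = 317.03          # from the slide


def growth_comparison(dimension_mb, vault_mb):
    """Return (absolute difference in MB per year, difference as a fraction)."""
    saved = dimension_mb - vault_mb
    return saved, saved / dimension_mb


saved_mb, saved_fraction = growth_comparison(
    ORIGINAL_DIMENSION_MB_PER_YEAR, DATA_VAULT_MB_PER_YEAR
)
print(f"{saved_mb:.2f} MB per year less, {saved_fraction:.1%} lower")
# prints "180.13 MB per year less, 36.2% lower"
```

On these figures the Data Vault's delta growth is roughly a third smaller than the original dimension's.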
  • 35. Data Vault vs. Dimension Growth
      • How does the extensive growth rate affect queries?
  • 36. Summarization
      • Business: lack of a single view of a customer, product, service, etc.; lack of visibility into ALL information across the enterprise; competition does it better, faster, cheaper; unable to identify and forecast business trends and their impacts. WHERE’S THE KNOWLEDGE? OR IS IT JUST ALL DATA?
      • Technical: near-real-time (active); huge data volumes; massive data dis-integration; spread-marts; convergence of operational and strategic questions; duplication of data in the ODS, warehouse, and data marts; dimension-itis; ODS ulcer; fact table granularity; JUNK tables and helper tables.
  • 37. Where To Learn More
      • The technical modeling book: http://LearnDataVault.com
      • Discussion forums and events: http://LinkedIn.com, Data Vault Discussions
      • Contact me: http://DanLinstedt.com (web), [email protected] (email)
      • Worldwide user group (free): http://dvusergroup.com