DATA VIRTUALIZATION PACKED LUNCH WEBINAR SERIES
Sessions Covering Key Data Integration Challenges Solved with Data Virtualization
Data Virtualization: An Introduction
Michael Dickson
Sales Engineer, Denodo
Paul Moxon
VP Data Architectures & Chief Evangelist, Denodo
Agenda
1. Data Virtualization: An Introduction
2. Data Virtualization Platforms – Key Capabilities
3. Product Demo
4. Key Takeaways
5. Q&A
6. Next Steps
Data Virtualization: An Introduction
Data Integration – “The Way We Were…”
Operational Data Stores → ETL → Staging Area → ETL → Data Warehouse → ETL → Data Marts → Analytics and Reporting
Data Integration – A Modern Data Ecosystem
The Data Integration Challenge
Manually access different systems
IT responds with point-to-point data integration
Takes too long to get answers to business users
[Diagram: business users (Marketing, Sales, Executive, Support) connected point-to-point to data sources (Database, Apps, Warehouse, Cloud, Big Data, Documents, NoSQL)]
“Data bottlenecks create business bottlenecks.”
– Create a Road Map For A Real-time, Agile, Self-Service Data
Platform, Forrester Research, Dec 16, 2015
The Data Integration Challenge
It is difficult to integrate numerous on-premises and cloud data sources.
Traditional tools cannot integrate streaming data and data-at-rest in real time.
It is difficult to maintain consistent data access and governance policies across data silos.
Traditional data integration is extremely resource intensive.
The Solution – A Data Abstraction Layer
Abstracts access to disparate data sources
Acts as a single virtual repository
Makes data available in real time to consumers
[Diagram: DATA ABSTRACTION LAYER]
“Enterprise architects must revise their data architecture to meet the demand for fast data.”
– Create a Road Map For A Real-time, Agile, Self-Service Data Platform, Forrester Research, Dec 16, 2015
Data Virtualization
“Data virtualization integrates disparate data sources in real time or near-real time to meet demands for analytics and transactional data.”
– Create a Road Map For A Real-time, Agile, Self-Service Data Platform, Forrester Research, Dec 16, 2015
1. Connects to disparate data sources
2. Combines related data into views
3. Publishes the data to applications
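The three steps above (connect, combine, publish) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how the Denodo Platform is implemented: an in-memory SQLite database stands in for a warehouse, a list of dictionaries stands in for a cloud application's API response, and the names `customer_source`, `campaign_source`, `customer_campaign_view` and `publish` are hypothetical. Nothing is copied into a new store; the view is computed each time it is requested.

```python
import json
import sqlite3

# Hypothetical source 1: a relational "warehouse" (in-memory SQLite stand-in).
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE customer (id INTEGER, name TEXT, country TEXT)")
warehouse.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, "Ana", "ES"), (2, "Ben", "UK"), (3, "Chloe", "FR")],
)

# Hypothetical source 2: records as a cloud application's API might return them.
CAMPAIGNS = [
    {"customer_id": 1, "campaign": "spring"},
    {"customer_id": 3, "campaign": "summer"},
]

# 1. Connect: one adapter per source, each returning rows as plain dicts.
def customer_source():
    cur = warehouse.execute("SELECT id, name, country FROM customer ORDER BY id")
    return [dict(zip(("id", "name", "country"), row)) for row in cur]

def campaign_source():
    return list(CAMPAIGNS)

# 2. Combine: the view is a function evaluated on demand (a simple hash join),
#    so no combined copy of the data is ever stored.
#    Assumes at most one campaign per customer.
def customer_campaign_view():
    campaign_by_customer = {c["customer_id"]: c["campaign"] for c in campaign_source()}
    return [
        {**cust, "campaign": campaign_by_customer[cust["id"]]}
        for cust in customer_source()
        if cust["id"] in campaign_by_customer
    ]

# 3. Publish: serialize the view for consuming applications, here as JSON text.
def publish(view):
    return json.dumps(view(), sort_keys=True)

print(publish(customer_campaign_view))
```

Because the view is re-evaluated on every call, a new campaign record in the source shows up in the next published result without any reload step.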
Data Virtualization Reference Architecture
Source: “Gartner Market Guide for Data Virtualization – 2016”
“Data virtualization technology can be used to create virtualized and integrated views of data in memory (rather than executing data movement and physically storing integrated views in a target data structure), and provides a layer of abstraction above the physical implementation of data.”
What Data Virtualization is Not!
• It is not ETL
  • If you want to replicate data from ‘A’ to ‘B’, use an ETL tool; that is what they are designed for
• It is not data visualization (note the ‘s’)
  • It complements visualization and reporting tools (e.g. Tableau)
• It is not a database
  • Data virtualization platforms don’t store the data; it is retrieved from the data sources on demand
• It has many capabilities, such as governance, metadata management and security
  • But it works alongside specialized tools in these areas
• It is great for service-based architectures
  • But be wary of event-driven architectures; use an ESB (or similar) for these
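The difference between replicating data (ETL) and retrieving it on demand can be shown with a toy example. In this sketch, an illustration under simplifying assumptions rather than a model of any product, the hypothetical `etl_copy` function takes a one-off snapshot of a source table into a separate target, while the hypothetical `virtual_view` function queries the source each time it is called. After the source changes, only the virtual view reflects the update.

```python
import sqlite3

# Hypothetical operational source.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
source.executemany("INSERT INTO orders VALUES (?, ?)", [(1, "open"), (2, "open")])

def etl_copy():
    """ETL-style: extract once and physically store the rows in a separate target."""
    target = sqlite3.connect(":memory:")
    target.execute("CREATE TABLE orders_copy (id INTEGER PRIMARY KEY, status TEXT)")
    rows = source.execute("SELECT id, status FROM orders").fetchall()
    target.executemany("INSERT INTO orders_copy VALUES (?, ?)", rows)
    return target

def virtual_view():
    """Virtualization-style: nothing is stored; the source is queried on demand."""
    return source.execute("SELECT id, status FROM orders ORDER BY id").fetchall()

warehouse = etl_copy()  # snapshot taken now
source.execute("UPDATE orders SET status = 'shipped' WHERE id = 1")

copied = warehouse.execute("SELECT id, status FROM orders_copy ORDER BY id").fetchall()
print(copied)          # [(1, 'open'), (2, 'open')]: stale until the next ETL run
print(virtual_view())  # [(1, 'shipped'), (2, 'open')]
```

The trade-off also runs the other way: the copy keeps answering when the source is slow or offline, which is one reason to use an ETL tool when replication is the actual goal.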
“By 2018, organizations with data virtualization capabilities will spend 40% less on building and managing data integration processes for connecting distributed data assets.”
– Gartner, Predicts 2017: Data Distribution and Complexity Drive Information Infrastructure Modernization, Ted Friedman et al.
Data Virtualization Platforms – Key Capabilities
Five Essential Capabilities of Data Virtualization
1. Data abstraction
2. Zero replication, zero relocation
3. Real-time information
4. Self-service data services
5. Centralized metadata, security & governance
1. Data abstraction
Abstracts access to disparate data sources.
Acts as a single virtual repository.
Abstracts data complexities such as location, format and protocols.
…hides data complexity so the business can access data easily
“Enterprise architects must revise their data architecture to meet the demand for fast data.”
– Create a Road Map For A Real-time, Agile, Self-Service Data Platform, Forrester Research, Dec 16, 2015
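One way to picture abstraction over location and format: each source gets an adapter that hides how its data is stored and returns rows in a shared schema, so the consumer queries a single interface. In this sketch the CSV text, the JSON payload, the field names and the `unified_customers` function are all hypothetical; a real platform would also abstract protocols (for example JDBC, REST or files) and delegate work to the sources.

```python
import csv
import io
import json

# Hypothetical source A: a CSV export with its own column names.
CSV_SOURCE = "cust_id,full_name,ctry\n1,Ana Lopez,ES\n2,Ben Hart,UK\n"

# Hypothetical source B: a JSON document from a REST API with a nested layout.
JSON_SOURCE = (
    '{"customers": [{"id": 3, "name": {"first": "Chloe", "last": "Roy"}, '
    '"country": "FR"}]}'
)

def read_csv_source(text):
    """Adapter: map CSV columns onto the common schema (id, name, country)."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {"id": int(row["cust_id"]), "name": row["full_name"], "country": row["ctry"]}

def read_json_source(text):
    """Adapter: flatten the nested JSON structure onto the same schema."""
    for c in json.loads(text)["customers"]:
        yield {
            "id": c["id"],
            "name": f'{c["name"]["first"]} {c["name"]["last"]}',
            "country": c["country"],
        }

def unified_customers():
    """The consumer sees one logical 'customers' view, whatever the source format."""
    yield from read_csv_source(CSV_SOURCE)
    yield from read_json_source(JSON_SOURCE)

for customer in unified_customers():
    print(customer)
```

If a source later moves to a different format, only its adapter changes; consumers of `unified_customers` are unaffected.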
2. Zero replication, zero relocation
Leaves the data at its source; extracts only what is needed, on demand.
Reduces the need for effort-intensive ETL processes.
Eliminates unnecessary data redundancy.
…reduces development time and overall TCO
“The Denodo Platform enables us to build and deliver data services, to our internal and external consumers, within a day instead of the 1–2 weeks it would take with ETL.”
– Manager, DrillingInfo
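“Extracts only what is needed, on demand” typically means pushing column selection and filters down to the source, so only the requested columns of the matching rows ever leave it. A minimal sketch, assuming a SQL source (SQLite here) and a hypothetical `fetch` helper that builds a parameterized query; a real engine would derive the pushed-down query from the consumer's request rather than from explicit arguments.

```python
import sqlite3

source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE sales (id INTEGER, region TEXT, amount REAL, notes TEXT)")
source.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [(i, "EU" if i % 4 == 0 else "US", float(i), "x" * 100) for i in range(1000)],
)

ALLOWED_COLUMNS = {"id", "region", "amount", "notes"}

def fetch(columns, region):
    """Push projection (columns) and selection (region filter) down to the source."""
    if not set(columns) <= ALLOWED_COLUMNS:  # never interpolate unchecked identifiers
        raise ValueError("unknown column")
    sql = f"SELECT {', '.join(columns)} FROM sales WHERE region = ?"
    return source.execute(sql, (region,)).fetchall()

# Copy-everything approach: every row and every column leaves the source.
full_copy = source.execute("SELECT * FROM sales").fetchall()

# On-demand approach: only two columns of the EU rows are transferred.
needed = fetch(["id", "amount"], "EU")

print(len(full_copy), len(needed))  # 1000 250
```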
3. Real-time information
Provisions data in real time to consumers.
Creates real-time logical views of data across many data sources.
Supports transformations and quality functions without the latency, redundancy, and rigidity of legacy approaches.
…enables timely decision-making
“Data virtualization integrates disparate data sources in real time or near-real time to meet demands for analytics and transactional data.”
– Create a Road Map For A Real-time, Agile, Self-Service Data Platform, Forrester Research, Dec 16, 2015
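Transformations and data quality rules can be applied as part of a view definition, so every query sees cleansed data without a separate cleansing job or a second copy. In this hypothetical sketch, `clean_view` trims and capitalizes names, normalizes country codes and flags e-mail addresses that fail a simple validity rule; the rules and alias table are illustrative assumptions, not a prescribed set.

```python
import re

# Hypothetical raw records as they sit in a source system.
RAW = [
    {"id": 1, "name": "  ana lopez ", "country": "es", "email": "ana@example.com"},
    {"id": 2, "name": "Ben HART", "country": "United Kingdom", "email": "ben@"},
]

COUNTRY_ALIASES = {"united kingdom": "UK", "uk": "UK", "es": "ES", "spain": "ES"}
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def clean_view(source):
    """Apply transformations and quality checks row by row, at query time."""
    for row in source:
        country = row["country"].strip().lower()
        yield {
            "id": row["id"],
            "name": " ".join(row["name"].split()).title(),
            "country": COUNTRY_ALIASES.get(country, country.upper()),
            "email_valid": bool(EMAIL_RE.match(row["email"])),
        }

for r in clean_view(RAW):
    print(r)
```

Since the rules run on every read, a record added to the source is cleansed the next time the view is queried.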
4. Self-service data services
Facilitates access to all data, both internal and external.
Enables creation of universal semantic models reflecting the business taxonomy.
Connects data silos to provide the best available information to drive business decisions.
…enables information discovery and self-service
“Impressively quick turnaround time to ‘unlock’ data from additional siloes and from legacy systems. Few vendors (if any) can compete with Denodo's support of the RESTful/OData standard, both to provide data (northbound) and to access data from the sources (southbound).”
– Business Analyst, Swiss Re
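The quote mentions publishing data northbound through RESTful/OData interfaces. The sketch below handles only two OData system query options, `$select` and `$top`, over an in-memory view, and wraps the rows in a `"value"` array as OData JSON collection responses do. It is a hypothetical illustration, far from a compliant OData service (no `$filter`, no metadata document, no entity model); the `handle_request` function and the `/customers` path are assumptions.

```python
import json
from urllib.parse import parse_qs, urlsplit

# Hypothetical virtual view exposed to consumers.
CUSTOMERS = [
    {"id": 1, "name": "Ana", "country": "ES"},
    {"id": 2, "name": "Ben", "country": "UK"},
    {"id": 3, "name": "Chloe", "country": "FR"},
]

def handle_request(url):
    """Answer a GET such as /customers?$select=id,name&$top=2 with a JSON body."""
    params = parse_qs(urlsplit(url).query)
    rows = CUSTOMERS
    if "$top" in params:
        rows = rows[: int(params["$top"][0])]
    if "$select" in params:
        fields = params["$select"][0].split(",")
        rows = [{f: r[f] for f in fields} for r in rows]
    # Collection results are wrapped in a "value" array.
    return json.dumps({"value": rows})

print(handle_request("/customers?$select=id,name&$top=2"))
```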
5. Centralized metadata, security & governance
Abstracts data source security models and enables single-point security and governance.
Extends single-point control across cloud and on-premises architectures.
Provides multiple forms of metadata (technical, business, operational) to facilitate understanding of data.
…simplifies data security, privacy and audit
“Our Denodo rollout was one of the easiest and most successful rollouts of critical enterprise software I have seen. It was successful in handling our initial security use case immediately, and has since shown a strong ability to cover additional use cases, in particular acting as a Data Abstraction Layer via its web service functionality.”
– Enterprise Architect, Asurion
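Single-point security means access rules are defined once in the virtual layer and enforced for every consumer, whatever the underlying source. A minimal sketch, assuming two hypothetical roles and invented policy rules (analysts see only rows from their own region, with e-mail addresses masked; admins see everything; any other role sees nothing). The policy format is made up for illustration and is not a Denodo configuration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class User:
    name: str
    role: str    # hypothetical roles: "admin" or "analyst"
    region: str

ROWS = [
    {"id": 1, "region": "EU", "email": "ana@example.com", "amount": 120.0},
    {"id": 2, "region": "US", "email": "ben@example.com", "amount": 80.0},
]

def mask_email(email):
    """Keep the first character and the domain, hide the rest of the local part."""
    local, _, domain = email.partition("@")
    return f"{local[:1]}***@{domain}"

def secure_view(user, rows):
    """Apply row-level filtering and column masking in one central place."""
    for row in rows:
        if user.role == "admin":
            yield dict(row)
        elif user.role == "analyst" and row["region"] == user.region:
            yield {**row, "email": mask_email(row["email"])}
        # Other roles, and rows outside the analyst's region, are dropped.

analyst = User("dana", "analyst", "EU")
print(list(secure_view(analyst, ROWS)))
```

The source rows are never modified; masking is applied to the copy handed to each consumer.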
Denodo ‘Solution’ Categories
Customer Centricity / MDM
✓ Complete View of Customer
Data Services
✓ Data as a Service
✓ Data Marketplace
✓ Data Services
✓ Application and Data Migration
Cloud Solutions
✓ Cloud Modernization
✓ Cloud Analytics
✓ Hybrid Data Fabric
Data Governance
✓ GRC
✓ GDPR
✓ Data Privacy / Masking
BI and Analytics
✓ Self-Service Analytics
✓ Logical Data Warehouse
✓ Enterprise Data Fabric
Big Data
✓ Logical Data Lake
✓ Data Warehouse Offloading
✓ IoT Analytics
Product Demonstration
Data Virtualization – An Introduction
Michael Dickson
Sales Engineer, Denodo
Demo Architecture
What’s the impact of a new marketing campaign for each country?
▪ Historical sales data offloaded to a Hadoop cluster for cheaper storage
▪ Marketing campaigns managed in an external cloud app
▪ Country is part of the customer details table, stored in the DW
[Diagram: three layers (Sources; Combine, Transform & Integrate; Consume). The Sales, Campaign and Customer sources are exposed as base views (source abstraction), joined, and grouped by state.]
Demo
What is the optimizer doing?
SELECT c.state, AVG(s.amount)
FROM customer c JOIN sales s
ON c.id = s.customer_id
GROUP BY c.state
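Three candidate execution strategies:
• Naïve strategy: transfer Sales (300 M rows) and Customer (2 M rows), then join and group by in the virtual layer.
• Partial aggregation pushdown: group Sales by customer ID at the source (2 M rows), transfer the result with Customer (2 M rows), then join and group by state.
• Temporary data movement: copy Customer (2 M rows) into a temporary table (Temp_Customer) in the Sales source, run the join and group by there, and return only the final result (50 rows).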
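Partial aggregation pushdown rewritten as SQL (the inner aggregation runs in the sales source; SUM and COUNT are both needed so that the average can be recombined by state):
SELECT c.state,
       SUM(s_agg.total_amount) / SUM(s_agg.num_sales) AS avg_amount
FROM
  (SELECT s.customer_id,
          SUM(s.amount) AS total_amount,
          COUNT(s.amount) AS num_sales
   FROM sales s
   GROUP BY s.customer_id) s_agg
JOIN customer c
  ON (c.id = s_agg.customer_id)
GROUP BY c.state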
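The partial aggregation rewrite can be checked on toy data. This sketch assumes two separate in-memory SQLite databases standing in for the sales source and the customer source (a handful of rows instead of 300 M and 2 M; all names are hypothetical) and computes AVG(amount) by state two ways. The naïve strategy pulls every sales row into the virtual layer; the pushdown strategy asks the sales source for one (SUM, COUNT) pair per customer. Customer rows are fetched once and shared, so only the sales-side transfer is counted. Carrying COUNT alongside SUM is what makes splitting an average across two steps correct.

```python
import sqlite3
from collections import defaultdict

sales_db = sqlite3.connect(":memory:")     # stands in for the sales source
customer_db = sqlite3.connect(":memory:")  # stands in for the customer source

customer_db.execute("CREATE TABLE customer (id INTEGER, state TEXT)")
customer_db.executemany("INSERT INTO customer VALUES (?, ?)",
                        [(1, "CA"), (2, "CA"), (3, "NY")])
sales_db.execute("CREATE TABLE sales (customer_id INTEGER, amount REAL)")
sales_db.executemany("INSERT INTO sales VALUES (?, ?)",
                     [(1, 10.0), (1, 30.0), (2, 20.0), (3, 5.0), (3, 15.0), (3, 25.0)])

# Customer rows are fetched once and used by both strategies.
state_of = dict(customer_db.execute("SELECT id, state FROM customer"))

def combine_by_state(partials):
    """Join (customer_id, sum, count) partials to customers and finish AVG by state."""
    totals = defaultdict(lambda: [0.0, 0])
    for customer_id, total, count in partials:
        acc = totals[state_of[customer_id]]
        acc[0] += total
        acc[1] += count
    return {state: t / n for state, (t, n) in totals.items()}

def naive():
    """Transfer every sales row; all aggregation happens in the virtual layer."""
    rows = sales_db.execute(
        "SELECT customer_id, amount FROM sales ORDER BY customer_id").fetchall()
    return combine_by_state((cid, amount, 1) for cid, amount in rows), len(rows)

def partial_pushdown():
    """The source computes SUM and COUNT per customer; only those rows are transferred."""
    rows = sales_db.execute(
        "SELECT customer_id, SUM(amount), COUNT(amount) FROM sales "
        "GROUP BY customer_id ORDER BY customer_id").fetchall()
    return combine_by_state(rows), len(rows)

print(naive())             # ({'CA': 20.0, 'NY': 15.0}, 6)
print(partial_pushdown())  # ({'CA': 20.0, 'NY': 15.0}, 3)
```

Both strategies return the same averages; the pushdown version transfers one row per customer instead of one row per sale, which is where the 300 M to 2 M reduction on the slide comes from.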
Why is this so important?
SELECT c.state, AVG(s.amount)
FROM customer c JOIN sales s
ON c.id = s.customer_id
GROUP BY c.state
How Denodo works compared with other federation engines:
System | Execution time | Data transferred | Optimization technique
Denodo | 9 sec | 4 M rows | Aggregation push-down
Others | 125 sec | 302 M rows | None: full scan
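[Diagram: naïve strategy (Sales 300 M rows and Customer 2 M rows transferred, then join and group by) compared with partial aggregation pushdown (Sales grouped by ID at the source to 2 M rows, joined with Customer 2 M rows, then grouped by state)]
To maximize pushdown to the EDW, the aggregation is split into two steps:
• first by customer ID
• then by state
This significantly reduces network traffic and processing in Denodo.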
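A cost-based optimizer chooses among strategies like the three above by estimating their cost. The toy model below is a hypothetical simplification: it scores each strategy only by rows moved over the network plus an assumed penalty per row written into a temporary table, using the row counts from the slides (300 M sales rows, 2 M customers, 2 M partial aggregates, 50 result rows). Real optimizers also weigh source processing power, join cost, statistics accuracy and whether temporary tables can be created at all; none of that is modeled here, and the `write_penalty` values are arbitrary.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Strategy:
    name: str
    rows_transferred: int  # rows moved over the network between engines
    rows_written: int      # rows written into temporary tables at a source

M = 1_000_000
# Row counts taken from the slides; the cost model itself is a hypothetical simplification.
STRATEGIES = [
    Strategy("naive", rows_transferred=300 * M + 2 * M, rows_written=0),
    Strategy("partial aggregation pushdown", rows_transferred=2 * M + 2 * M, rows_written=0),
    Strategy("temporary data movement", rows_transferred=2 * M + 50, rows_written=2 * M),
]

def cost(s, write_penalty):
    """Toy cost: each transferred row costs 1; each temp-table row write adds a penalty."""
    return s.rows_transferred + write_penalty * s.rows_written

def choose(strategies, write_penalty):
    """Pick the strategy with the lowest estimated cost."""
    return min(strategies, key=lambda s: cost(s, write_penalty))

print(choose(STRATEGIES, write_penalty=0).name)  # temporary data movement
print(choose(STRATEGIES, write_penalty=2).name)  # partial aggregation pushdown
```

Which strategy wins depends entirely on the assumed penalty, which is the point: the decision is driven by cost estimates, so accurate statistics about the sources matter as much as the rewrite rules themselves.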
Massively Parallel Processing: Example (Before MPP Integration)
[Diagram: Sales (300 million rows) grouped by customer ID at the source (2 M rows, sales by customer), then joined with Customer (2 M rows) and grouped by name]
SELECT c.name, AVG(s.amount)
FROM customer c JOIN sales s
ON c.id = s.customer_id
GROUP BY c.name
This query is similar to the previous one, but now aggregates by customer name. What changes?
• Partial aggregation pushdown: maximizes source processing and reduces network traffic.
• Swapping to disk: the aggregation by customer name produces a larger result set (2 M rows) that exceeds the memory quota, so Denodo swaps to disk to perform the intermediate calculation.
• Serial calculation: Denodo calculates the aggregation serially, one row after another. With the larger volume, this becomes the execution bottleneck.
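When an aggregation has more groups than fit in its memory quota, a common general-purpose technique is to spill: partition the input by a hash of the grouping key into temporary files, then aggregate one partition at a time so that only a fraction of the groups is in memory at once. The sketch below shows that technique; it is not a description of Denodo's internal algorithm, and `spill_aggregate`, the partition count and the CSV spill format are assumptions.

```python
import csv
import os
import tempfile
import zlib
from collections import defaultdict

def spill_aggregate(rows, num_partitions=4):
    """SUM and COUNT per key, holding only one partition's groups in memory at a time.

    rows: iterable of (key: str, amount: float)
    """
    with tempfile.TemporaryDirectory() as tmp:
        paths = [os.path.join(tmp, f"part{i}.csv") for i in range(num_partitions)]
        files = [open(p, "w", newline="") for p in paths]
        try:
            writers = [csv.writer(f) for f in files]
            # Pass 1: route every row to a partition file by a stable hash of its key.
            for key, amount in rows:
                writers[zlib.crc32(key.encode()) % num_partitions].writerow([key, amount])
        finally:
            for f in files:
                f.close()

        # Pass 2: each key lives in exactly one partition, so partitions aggregate
        # independently. Results are collected in a dict here for simplicity; an
        # engine would stream each partition's groups onward instead.
        result = {}
        for path in paths:
            totals = defaultdict(lambda: [0.0, 0])
            with open(path, newline="") as f:
                for key, amount in csv.reader(f):
                    totals[key][0] += float(amount)
                    totals[key][1] += 1
            result.update({k: (t, n) for k, (t, n) in totals.items()})
        return result

rows = [(f"customer-{i % 1000}", 1.5) for i in range(10_000)]
agg = spill_aggregate(rows)
print(len(agg), agg["customer-7"])  # 1000 (15.0, 10)
```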
Massively Parallel Processing: Example (With MPP Integration)
[Diagram: Sales (300 million rows) grouped by customer ID at the source (2 M rows, sales by customer), then joined with Customer (2 M rows) and grouped by ZIP]
System | Execution time | Optimization techniques
Others | ~10 min | Basic
No MPP | 43 sec | Aggregation push-down
With MPP | 11 sec | Aggregation push-down + MPP integration (Impala, 8 nodes)
1. Partial aggregation pushdown: maximizes source processing and reduces network traffic.
2. Integration with the cost-based optimizer (CBO): based on data volume estimates and the cost of the operations involved, the CBO can decide to move all or part of the execution tree to the MPP engine.
3. On-demand data transfer: for SQL-on-Hadoop systems, Denodo automatically generates and uploads Parquet files.
4. Integration with local and pre-cached data: the engine detects when data is cached or is a native table in the MPP engine.
5. Fast parallel execution: support for Spark, Presto and Impala for fast analytical processing on inexpensive Hadoop-based solutions.
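The core idea behind handing an aggregation to an MPP engine is to split the work into partitions that are aggregated in parallel and then merged. A minimal single-machine sketch, assuming hash partitioning on the grouping key and a thread pool standing in for cluster nodes. CPython threads give no real CPU parallelism for this pure-Python loop, so treat it as an illustration of the data flow rather than of Spark, Presto or Impala performance. All names and the ZIP-code sample data are hypothetical.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def partition(rows, n):
    """Hash-partition (key, amount) rows so each key lands on exactly one 'node'."""
    parts = [[] for _ in range(n)]
    for key, amount in rows:
        parts[zlib.crc32(key.encode()) % n].append((key, amount))
    return parts

def aggregate(part):
    """Per-node work: SUM and COUNT per key within one partition."""
    totals = defaultdict(lambda: [0.0, 0])
    for key, amount in part:
        totals[key][0] += amount
        totals[key][1] += 1
    return {k: (t, c) for k, (t, c) in totals.items()}

def parallel_avg(rows, nodes=8):
    """Aggregate partitions concurrently, then merge; keys never overlap across partitions."""
    with ThreadPoolExecutor(max_workers=nodes) as pool:
        merged = {}
        for partial in pool.map(aggregate, partition(rows, nodes)):
            merged.update(partial)
    return {k: t / c for k, (t, c) in merged.items()}

rows = [("94103", 10.0), ("10001", 4.0), ("94103", 20.0), ("60601", 7.0)]
print(sorted(parallel_avg(rows, nodes=3).items()))
# [('10001', 4.0), ('60601', 7.0), ('94103', 15.0)]
```

Because partitioning is by key, the merge is a plain union; partitioning by anything else would require a second combine step like the SUM/COUNT recombination used for partial aggregation pushdown.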
Key Takeaways
1. Data virtualization is a key technology when building a modern data architecture.
2. It provides flexibility and agility, and reduces the time to deliver data to the business by up to 10X.
3. Data virtualization hides the complexity of a constantly changing data infrastructure from users.
4. In doing so, it allows you to introduce new technologies, formats, protocols, etc. without disrupting users.
5. Beware! Not all data virtualization platforms are equal; compare them against the five essential capabilities.
Q&A
Next steps
Download Denodo Express: www.denodoexpress.com
Access Denodo Platform in the Cloud! 30-day free trial available!
Denodo for Azure: www.denodo.com/TrialAzure/PackedLunch
Denodo for AWS: www.denodo.com/TrialAWS/PackedLunch
Next session
From Single Purpose to Multi Purpose
Data Lakes - Broadening End Users
Thursday, August 16, 2018
Paul Moxon
VP Data Architectures & Chief Evangelist, Denodo
Thank you!
© Copyright Denodo Technologies. All rights reserved
Unless otherwise specified, no part of this PDF file may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm, without the prior written authorization from Denodo Technologies.
