Connecting Silos in
Real-time With Data
Virtualization
Becky Smith, Sr. Product Marketing Manager
November, 2018
1
Data Integration – “The Way We Were…”
Operational
Data Stores
Staging
Area
Data
Warehouse
Data
Marts
Analytics and
Reporting
ETL ETL ETL
2
Data Exploration
Data Integration – A Modern Data Ecosystem
Governance

Platforms
Security, Compliance & Business Continuity
Information
Access
Actionable
Insight
Business
Outcomes
Data Integration
Streaming Computing
Operational and Analytical Repositories
Shared Reference Information
Data Sources and
Data Acquisition
Data Repositories
Sandboxes
New Insight
In-Memory
DB/Grid
“Fit for Purpose” Data Marts
EDW
Event Detection and Action
CRM
Marketing Automation
HR
Finance
ERP
Logistics
…………
Data reservoir
& Refinery
Discover data
Parse & Refine
Transform & Cleanse
ODS
Reports
Dashboards
Discovery
Visualization
Advanced
Analytics
3
The Data Integration Challenge
Manually access different systems
IT responds with point-to-point
data integration
Takes too long to get answers to
business users
Marketing Sales Executive Support
Database
Apps
Warehouse Cloud
Big Data
Documents Apps NoSQL
“Data bottlenecks create business
bottlenecks.”
– Create a Road Map For A Real-time, Agile, Self-Service Data
Platform, Forrester Research, Dec 16, 2015
4
The Solution – A Data Abstraction Layer
Abstracts access to disparate data
sources
Acts as a single repository (virtual)
Makes data available in real-time
to consumers
DATA ABSTRACTION LAYER
“Enterprise architects must revise their
data architecture to meet the demand
for fast data.”
– Create a Road Map For A Real-time, Agile, Self-Service Data
Platform, Forrester Research, Dec 16, 2015
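A minimal sketch of the idea on this slide, assuming two hypothetical sources (an in-memory SQLite database and a CSV export). The view function joins them at query time and stores nothing, so changes in a source show up on the next query. All table, column and function names are illustrative and are not part of any product API.

```python
import csv
import io
import sqlite3

# Source 1 (hypothetical): an operational database holding customers.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (id INTEGER, name TEXT, country TEXT)")
db.executemany("INSERT INTO customers VALUES (?, ?, ?)",
               [(1, "Acme", "UK"), (2, "Globex", "US")])

# Source 2 (hypothetical): a CSV export holding orders.
ORDERS_CSV = "customer_id,amount\n1,100\n1,50\n2,70\n"


def fetch_customers():
    """Read customers from the database at query time."""
    rows = db.execute("SELECT id, name, country FROM customers").fetchall()
    return [{"id": r[0], "name": r[1], "country": r[2]} for r in rows]


def fetch_orders():
    """Read orders from the CSV source at query time."""
    reader = csv.DictReader(io.StringIO(ORDERS_CSV))
    return [{"customer_id": int(r["customer_id"]), "amount": float(r["amount"])}
            for r in reader]


def customer_orders_view():
    """Virtual view: joins both sources on demand; nothing is stored."""
    customers = {c["id"]: c for c in fetch_customers()}
    return [
        {"name": customers[o["customer_id"]]["name"],
         "country": customers[o["customer_id"]]["country"],
         "amount": o["amount"]}
        for o in fetch_orders()
    ]


rows = customer_orders_view()
```

Consumers call only `customer_orders_view()`; they never need to know that one half of the data lives in a database and the other in a file.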
5
Consume
in business
applications
Combine
Right information
at right time
2
3 DATA CONSUMERS
Enterprise Applications, Reporting, BI, Portals, ESB, Mobile, Web, Users, IoT/Streaming Data
Connect
Any source,
any format
1 DISPARATE DATA SOURCES
Databases & Warehouses, Cloud/SaaS Applications, Big Data, NoSQL, Web, XML, Excel, PDF, Word...
Less Structured More Structured
Multiple protocols,
formats
Linked data services
query, search, browse
Request/Reply,
event driven
Secure
delivery
Library of
wrappers
Web
automation
Any data
or content
Read
& Write
DATA VIRTUALIZATION
DATA CONSUMERS Analytical Operational
CONNECT COMBINE CONSUME
Share, Deliver,
Publish, Govern,
Collaborate
Discover,
Transform,
Prepare,
Improve Quality,
Integrate
Normalized
views of
disparate data
Agile Development
Performance
Resource Management
Lifecycle Management Data Services
Data Catalog
Governance & Metadata
Security & Data Privacy
Denodo Data Virtualization Architecture
6
Modern Data Architectures are much more complex than the architectures
of just 10 years ago
Replicating (copying) data into a central repository doesn’t work at the scale or
complexity needed today
Data Virtualization can provide access to all of your data in real time and support
self-service with a common data model (in the context of the business users)
Let’s find out how…
Logical Data Warehouse
“The Logical Data Warehouse (LDW) is a new data management
architecture for analytics combining the strengths of traditional
repository warehouses with alternative data management and
access strategy.”
7
- Gartner Hype Cycle for Enterprise Information Management, 2012
8
Data Warehouse + Cloud Dimensional Data
Time
Dimension
Fact table
(sales) Product
Dimension
Customer
Dimension
CRM
SFDC
Customer
EDW
9
Multiple Data Warehouse Integration
Time
Dimension Sales fact Product
Dimension
Region
Finance EDW
City
Marketing EDW
Customer
Fidelity facts
Product
Dimension
*Real Examples: Nationwide POC, IBM tests
Store
10
Horizontal Partitioning
Data Warehouse Historical Offloading
Time
Dimension
Fact table
(sales)
Product
Dimension
Retailer
Dimension
Current Sales Historical Sales
EDW
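A small sketch of how a virtual view over a horizontally partitioned sales table might route a query, assuming a hypothetical cutoff date that separates current sales (kept in the EDW) from offloaded historical sales. The cutoff, the data and the function name are illustrative assumptions.

```python
from datetime import date

# Hypothetical cutoff: rows before this date were offloaded from the EDW.
CUTOFF = date(2018, 1, 1)

current_sales = [  # stays in the EDW
    {"day": date(2018, 3, 1), "amount": 120.0},
    {"day": date(2018, 6, 9), "amount": 80.0},
]
historical_sales = [  # offloaded to cheaper storage
    {"day": date(2016, 5, 2), "amount": 40.0},
    {"day": date(2017, 11, 30), "amount": 60.0},
]


def query_sales(start, end):
    """Return (rows, partitions_read) for start <= day < end."""
    partitions = []
    if start < CUTOFF:          # range overlaps the historical partition
        partitions.append(("historical", historical_sales))
    if end > CUTOFF:            # range overlaps the current partition
        partitions.append(("current", current_sales))
    rows = [r for _, part in partitions for r in part if start <= r["day"] < end]
    return rows, [name for name, _ in partitions]
```

A query limited to 2018 touches only the current partition, while a multi-year query reads both and returns their union; the consumer sees a single sales table either way.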
11
Providing access to integrated data in real time
Big Data Analytics Framework
Benefits
§ Enhanced insight across the
business without physically moving
data
§ Simplified data consumption with a
single endpoint for all data access
§ Faster integration of new data
sources
§ Smarter decision making via
additional information-enrichment
capabilities
§ Increased speed and agility of both
business and IT, significantly
increasing customer satisfaction
12
Logical Data Warehouse at Autodesk
Benefits
§ For the first time, Autodesk can do
single-point security enforcement
and have a uniform data
environment for access.
§ Reduced replication of data with
less use of ETL processes
§ Single point of enforcement for
security
§ Uniform environment for data
access in place
§ Development flexibility to
understand what is needed to
build before actually building
13
Summary
The Logical Data Warehouse (LDW) is an evolution and augmentation of DW
practices, not a replacement
A repository-only style data warehouse contains a single ontology/ taxonomy,
whereas in the LDW a semantic layer can contain many combinations of use cases,
many business definitions of the same information
The LDW permits an IT organization to make a large number of datasets available for
analysis via query tools and applications
Query Optimization in the Logical
Data Warehouse
14
15
“The Denodo Platform ... incorporates dynamic query optimization as a key value
point. This capability includes support for cost-based optimization specifically for
high data volume and complexity; ... it has also added an in-memory data grid
with Massively Parallel Processing (MPP) architecture to its platform.”
– Gartner, Magic Quadrant for Data Integration, 2017
16
Obtain Total Sales By Customer Country in the Last Two Years
Query Optimization: Example (1)
Naive Strategy (BI Tools, BDI Tools, Simple federation engines):
join
union
group by
Customers (3M)
Sales previous years (3B)
Sales this year (290M)
290M rows 300M rows (sales
previous year)
3M rows
593M rows through the network
System | Execution Time | Optimization Technique
No Rewriting | 20 min | None
17
Obtain Total Sales By Customer Country in the Last Two Years
Query Optimization: Example (2)
Denodo Strategy – Aggregation push-down
join
union
group by
Customers (3M)
Sales previous years (3B)
Sales this year (290M)
3M rows (sales by
customer this year)
3M rows (sales by
customer previous
year)
3M rows
9M rows through the network
group by
customer
group by
customer
System | Execution Time | Optimization Technique
No Rewriting | 20 min | None
Denodo 6 | 51 sec | Aggregation push-down
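The rewrite on this slide can be sketched with toy data. The example below is an illustration, not Denodo's optimizer: it assumes the aggregate is a SUM, which can safely be computed per customer at each source and then summed again per country after the join. All data and function names are hypothetical.

```python
from collections import defaultdict

# Tiny stand-ins for the three sources on the slide (hypothetical data).
customers = [(1, "UK"), (2, "US"), (3, "UK")]                 # (customer_id, country)
sales_this_year = [(1, 10.0), (1, 5.0), (2, 7.0), (3, 1.0)]   # (customer_id, amount)
sales_prev_years = [(1, 2.0), (2, 3.0), (2, 4.0), (3, 6.0), (3, 1.0)]


def total_by_country(sales_rows):
    """Join sales rows to customers and sum amounts per country."""
    country_of = dict(customers)
    totals = defaultdict(float)
    for customer_id, amount in sales_rows:
        totals[country_of[customer_id]] += amount
    return dict(totals)


def group_by_customer(rows):
    """The partial aggregation each sales source would run locally."""
    totals = defaultdict(float)
    for customer_id, amount in rows:
        totals[customer_id] += amount
    return sorted(totals.items())


# Naive plan: ship every sales row, then union, join and group centrally.
naive_rows = sales_this_year + sales_prev_years
naive_result = total_by_country(naive_rows)
naive_transferred = len(naive_rows) + len(customers)

# Push-down plan: each sales source groups by customer before shipping.
pushed_rows = group_by_customer(sales_this_year) + group_by_customer(sales_prev_years)
pushed_result = total_by_country(pushed_rows)
pushed_transferred = len(pushed_rows) + len(customers)
```

Both plans return the same totals, but the push-down plan ships at most one row per customer from each sales source. With the slide's figures, that is the difference between 290M + 300M + 3M = 593M rows and 3M + 3M + 3M = 9M rows crossing the network.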
18
Obtain Total Sales By Customer Country in the Last Two Years
Query Optimization: Example (3)
union
group by
3M rows
(sales by customer
this year)
3M rows
(sales by customer
previous year)
3M rows
(customers)
Aggregation push-down
group by
customer
group by
customer
join
Integrated
MPP processing
System | Execution Time | Optimization Technique
No Rewriting | 20 min | None
Denodo 6 | 51 sec | Aggregation push-down
Denodo 7 | 13 sec | Aggregation push-down + MPP integration
Customers (3M)
19
You can achieve excellent performance in
Logical Analytic Architectures
Key
techniques
needed:
Advanced Dynamic Optimization to minimize network
traffic and leverage the power of data sources
In-memory MPP processing to speed operations at the
Data Virtualization layer
Advanced incremental caching for reusing commonly
used data and complex calculations
Security and Governance with a
Data Abstraction Layer
21
Different Data Sources – Different Security Models
Databases/EDW – Mature RBAC model
Hadoop – Kerberos
§ Cloudera – Apache Sentry and Knox
§ Hortonworks – Apache Ranger and Atlas
Cloud – OAuth 2.0 (?)
Files – Binary – Read access or none
Web Services – Multiple models
In many cases, the consumer has to deal with these
different security models and technologies
22
Abstracting Data Source Security
Provide single data model to consumers
§ Role-based access to data on an as-needed basis
§ Removes data silo security
Hide complexity of different security models and
maturities
Integrate with existing authentication system
(e.g. LDAP/AD)
Single point for monitoring/auditing
§ Who, what, when, how, …
Ensure compliance with corporate policies
Data access and privacy rules enforced ‘on the fly’
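The points above can be illustrated with a minimal sketch, assuming a hypothetical policy table that maps each role to the columns it may see and the columns that must be masked. Rows from any source pass through one function that filters, masks and audits. The role names, column names and masking rule are assumptions made for illustration.

```python
from datetime import datetime, timezone

# Hypothetical policy: which columns each role may see, and which are masked.
POLICIES = {
    "analyst": {"visible": {"customer_id", "country", "email"}, "masked": {"email"}},
    "support": {"visible": {"customer_id", "country", "email"}, "masked": set()},
}

AUDIT_LOG = []  # single place recording who, what, when


def mask(value):
    """Hide all but the first character of a sensitive value."""
    text = str(value)
    return text[:1] + "*" * (len(text) - 1)


def secure_query(user, role, view_name, rows):
    """Apply the role's column rules to rows from any source and audit the access."""
    policy = POLICIES.get(role)
    AUDIT_LOG.append({"who": user, "what": view_name,
                      "when": datetime.now(timezone.utc).isoformat(),
                      "allowed": policy is not None})
    if policy is None:
        raise PermissionError(f"role {role!r} has no access to {view_name}")
    return [
        {col: (mask(val) if col in policy["masked"] else val)
         for col, val in row.items() if col in policy["visible"]}
        for row in rows
    ]


rows = [{"customer_id": 1, "country": "UK", "email": "a@x.io", "ssn": "123"}]
analyst_view = secure_query("alice", "analyst", "customers", rows)
support_view = secure_query("bob", "support", "customers", rows)
```

Because the rules are applied as rows pass through the layer, the same policy covers every source behind it, and the audit log captures denied attempts as well as successful ones.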
23
Security in a Hybrid Environment
Moving data to Cloud can exacerbate security and privacy problems
SaaS and Cloud data sources often have different security models
Not integrated with corporate authentication mechanisms
Potential for recreating authentication model in Cloud
A Data Virtualization abstraction layer means Cloud sources can use the same security
mechanisms and access controls as on-premises sources
24
Customer Use Case - Asurion
International Expansion - moving into
different privacy and data protection
jurisdictions
New products – need for different data
types and sources
§ Mixing structured, multi-structured,
streaming, text, video, voice, geo-
location, etc.
Moving to Cloud for increased speed and
agility
§ Easier to spin up new virtual servers for
new data sets
Competing pressures for securing data
and providing access to data sets
Security Constraints
Geographical
Constraints
Contractual
Client
Obligations
PII Protection
Departmental
Restrictions
Fast Changing Hadoop & Cloud
Technologies
Hive, Spark,
Redshift
Maintaining
different code
base
Discover, Correlate,
Enable Predictive
Analytics
Text, CSV, Voice,
JSON,
Streaming, 3rd
Party Data
60TB+ structured,
200TB+
telemetry &
unstructured
data
25
Asurion – Hybrid Architecture
After implementing hybrid Data
Virtualization layer, Asurion was able to:
§ Control security across entire
infrastructure from a single point
§ Easily meet regional security and
privacy requirements
§ Keep client data separate as
contractually required – but allow
analytics over all (anonymized) data
§ Perform complete audits of data
access, as needed
§ Quickly add new, compliant data
sources to system
26
Governance…
Governance features are pervasive in Denodo Platform:
§ Users can inspect catalog of virtualization objects through catalog search to find data
combinations for reuse
§ Data lineage helps users to understand where data has come from and how it has changed from
the source
§ Impact analysis helps architects understand the consequences of changes in the data source
schemas
§ Propagate changes selectively with a single click.
27
Data Lineage
Graphical view for showing data lineage for any field in any virtual view.
Trace source of any field:
• Includes any functions applied
to field contents.
Trace source of calculated fields:
• View calculations used to
create new fields.
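Field-level lineage of this kind can be sketched as a walk over derivation metadata. The example below assumes a hypothetical dictionary that records, for each virtual-view field, its input fields and the expression applied. It illustrates the concept and does not represent how the Denodo Platform stores lineage.

```python
# Hypothetical metadata: each virtual-view field lists the fields it is derived
# from and the function or calculation applied to them.
LINEAGE = {
    "sales_report.revenue_eur": {"from": ["sales_clean.amount", "fx.rate"],
                                 "expr": "amount * rate"},
    "sales_clean.amount": {"from": ["edw.sales.amt"], "expr": "ROUND(amt, 2)"},
    "fx.rate": {"from": ["crm.fx_rates.rate"], "expr": "rate"},
}


def trace(field, depth=0):
    """Return indented lines describing how `field` is derived, down to the sources."""
    entry = LINEAGE.get(field)
    if entry is None:                      # not derived: a physical source column
        return ["  " * depth + f"{field} (source)"]
    lines = ["  " * depth + f"{field} = {entry['expr']}"]
    for parent in entry["from"]:
        lines.extend(trace(parent, depth + 1))
    return lines


def source_fields(field):
    """Return the set of physical source columns that feed `field`."""
    entry = LINEAGE.get(field)
    if entry is None:
        return {field}
    result = set()
    for parent in entry["from"]:
        result |= source_fields(parent)
    return result
```

Calling `print("\n".join(trace("sales_report.revenue_eur")))` lists each calculation on the way down to the source columns, which is the textual equivalent of the graphical view described on the slide.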
28
“Used by”
“Big picture” view of usage
Useful for seeing impact of
changes on whole system
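The "used by" view is the reverse of lineage: start at a source or view and find everything downstream. A minimal sketch, assuming a hypothetical map from each view to its inputs (all view and table names are invented for illustration):

```python
from collections import deque

# Hypothetical dependency metadata: view -> the views or source tables it reads.
DEPENDS_ON = {
    "sales_clean": ["edw.sales"],
    "fx": ["crm.fx_rates"],
    "sales_report": ["sales_clean", "fx"],
    "exec_dashboard": ["sales_report"],
}


def used_by(target):
    """Return every view that directly or indirectly depends on `target`."""
    # Invert the graph so we can walk from a source towards its consumers.
    consumers = {}
    for view, inputs in DEPENDS_ON.items():
        for inp in inputs:
            consumers.setdefault(inp, []).append(view)
    affected, queue = [], deque([target])
    seen = {target}
    while queue:
        for view in consumers.get(queue.popleft(), []):
            if view not in seen:
                seen.add(view)
                affected.append(view)
                queue.append(view)
    return affected
```

Before changing the schema of `edw.sales`, an architect could call `used_by("edw.sales")` to list every view that would need to be checked.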
29
Single point for security and governance
Extends single point of control across Cloud and on-premises architectures
Catalog search helps users find data combinations for reuse
Data lineage helps users to understand where data has come from and how it has
changed from the source
Impact analysis helps architects understand the consequences of change
Delivering Self-Service
30
31
Self-Service Challenges…
Tools are designed for data analysts (or power users)
§ Users who are happy finding, wrangling, cleansing data
§ Creating calculations, aggregations within the data
What about the other business users?
§ People who don’t want to spend hours fighting the spreadsheet…
Will they use common definitions for key business entities and metrics?
§ Or will they pick and choose their own?
Ultimately, can you trust the numbers?
§ Where did the data come from?
§ How has it been manipulated?
32
Self-Service with Guardrails
Don’t build just for the ‘data cowboys’
Create a common and consistent semantic layer
§ Everyone is using the same definitions and metrics
Create pre-integrated, pre-calculated data services
§ Save the user having to do this themselves
§ Ensures consistency of calculations, etc.
But allow the cowboys to ‘roam and wrangle’
§ Even the cowboys can only access ‘approved’ data
sources
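One way to picture these guardrails is a shared table of metric definitions plus an allow-list of sources for free-form exploration. The sketch below is an assumption-laden illustration: the metric names, the formulas and the `sales_view` source are hypothetical.

```python
# Hypothetical shared metric definitions, maintained once in the semantic layer.
METRICS = {
    "net_revenue": lambda row: row["gross"] - row["refunds"],
    "margin_pct": lambda row: 100.0 * (row["gross"] - row["cost"]) / row["gross"],
}

APPROVED_SOURCES = {"sales_view"}  # the only data the 'cowboys' may query


def metric(name, rows):
    """Evaluate a shared metric so every consumer gets the same calculation."""
    return [METRICS[name](row) for row in rows]


def explore(source, rows, predicate):
    """Free-form filtering, limited to approved sources."""
    if source not in APPROVED_SOURCES:
        raise PermissionError(f"{source!r} is not an approved data source")
    return [row for row in rows if predicate(row)]


sales = [{"gross": 200.0, "refunds": 20.0, "cost": 150.0}]
net = metric("net_revenue", sales)
margin = metric("margin_pct", sales)
```

Business users consume `metric(...)` results and so share one definition of each number, while power users can still filter and wrangle through `explore(...)`, but only against approved sources.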
33
Self-Service Architecture
34
Logical Data Warehouse Improves Information Agility
Benefits
§ Diverse data spread across the entire
enterprise can now be accessed
instantaneously and securely with a
proper authorization structure
§ Core Business Intelligence logic is
becoming centralized, reducing
duplication of effort and enhancing
development efficiency
§ Searchable data dictionary helps
report writers find the data they
need and helps improve the
self-service experience
35
“Get it Real-time and Get it Fast!”
The Benefits of Data Virtualization
Complete enterprise information, combining Web, cloud,
streaming, and structured data
ROI realization within 6 months, with the flexibility to
adjust to unforeseen changes
An 80% reduction in integration costs, in terms of
resources and technology
Real-time integration and data access, enabling faster
business decisions
36
Denodo
The Leader in Data Virtualization
DENODO OFFICES, CUSTOMERS, PARTNERS
Palo Alto, CA.
Global presence throughout North America,
EMEA, APAC, and Latin America
LEADERSHIP
▪ Longest continuous focus on data virtualization
– since 1999
▪ Leader in 2018 Forrester Wave – Big Data Fabric
▪ Winner of numerous awards
CUSTOMERS
~500 customers, including many F500 and
G2000 companies across every major industry
have gained significant business agility and ROI
DENODO AUSTRALIA
L13 – Macquarie House, 167 Macquarie Street
NSW Sydney 2000

Big Data LDN 2018: CONNECTING SILOS IN REAL-TIME WITH DATA VIRTUALIZATION

  • 1. Connecting Silos in Real-time With Data Virtualization Becky Smith, Sr. Product Marketing Manager November, 2018
  • 2. 1 Data Integration – “The Way We Were…” Operational Data Stores Staging Area Data Warehouse Data Marts Analytics and Reporting ETLETLETL
  • 3. 2 Data Exploration Data Integration – A Modern Data Ecosystem Governance Platforms Security, Compliance & Business Continuity Information Access Actionable Insight Business Outcomes Data Integration Streaming Computing Operational and Analytical Repositories Shared Reference Information Data Sources and Data Acquisition Data Repositories Sandboxes New Insight In-Memory DB/Grid “Fit for Purpose” Data Marts EDW Event Detection and Action CRM Marketing Automation HR Finance ERP Logistics ………… Data reservoir & Refinery Discover data Parse & Refine Transform & Cleanse ODS Reports Dashboards Discovery Visualization Advanced Analytics
  • 4. 3 The Data Integration Challenge Manually access different systems IT responds with point-to-point data integration Takes too long to get answers to business users MarketingSales ExecutiveSupport Database Apps Warehouse Cloud Big Data Documents AppsNo SQL “Data bottlenecks create business bottlenecks.” – Create a Road Map For A Real-time, Agile, Self-Service Data Platform, Forrester Research, Dec 16, 2015
  • 5. 4 The Solution – A Data Abstraction Layer Abstracts access to disparate data sources Acts as a single repository (virtual) Makes data available in real-time to consumers DATA ABSTRACTION LAYER “Enterprise architects must revise their data architecture to meet the demand for fast data.” – Create a Road Map For A Real-time, Agile, Self-Service Data Platform, Forrester Research, Dec 16, 2015
  • 6. 5 Consume in business applications Combine Right information at right time 2 3 DATA CONSUMERS Enterprise Applications, Reporting, BI, Portals, ESB, Mobile, Web, Users, IoT/Streaming Data Connect Any source, any format 1 DISPARATE DATA SOURCES Databases & Warehouses, Cloud/Saas Applications, Big Data, NoSQL, Web, XML, Excel, PDF, Word... Less StructuredMore Structured Multiple protocols, formats Linked data services query, search, browse Request/Reply, event driven Secure delivery Library of wrappers Web automation Any data or content Read & Write DATA VIRTUALIZATION DATA CONSUMERSAnalytical Operational CONNECT COMBINE CONSUME Share, Deliver, Publish, Govern, Collaborate Discover, Transform, Prepare, Improve Quality, Integrate Normalized views of disparate data Agile Development Performance Resource Management Lifecycle Management Data Services Data Catalog Governance & Metadata Security & Data Privacy Denodo Data Virtualization Architecture
  • 7. 6 Modern Data Architectures are much more complex than the architectures of just 10 years ago Replicating (copying) data into a central repository doesn’t work at scale or complexity needed today Data Virtualization can provide access to all of your data, in real-time, and supporting self-service with a common data model (in the context of the business users) Let’s find out how…
  • 8. Logical Data Warehouse “The Logical Data Warehouse (LDW) is a new data management architecture for analytics combining the strengths of traditional repository warehouses with alternative data management and access strategy.” 7 - Gartner Hype Cycle for Enterprise Information Management, 2012
  • 9. 8 Data Warehouse + Cloud Dimensional Data Time Dimension Fact table (sales) Product Dimension Customer Dimension CRM SFDC Customer EDW
  • 10. 9 Multiple Data Warehouse Integration Time Dimension Sales fact Product Dimension Region Finance EDW City Marketing EDW Customer Fidelity facts Product Dimension *Real Examples: Nationwide POC, IBM tests Store
  • 11. 10 Horizontal Partitioning Data Warehouse Historical Offloading Time Dimension Fact table (sales) Product Dimension Retailer Dimension Current Sales Historical Sales EDW
  • 12. 11 Providing access to integrated data in real time Big Data Analytics Framework Benefits § Enhanced insight across the business without physically moving data § Simplified data consumption with a single endpoint for all data access § Faster integration of new data sources § Smarter decision making via additional information-enrichment capabilities § Increased speed and agility of both business and IT, significantly increasing customer satisfaction
  • 13. 12 Logical Data Warehouse at Autodesk Benefits § For the first time, Autodesk can do single-point security enforcement and have uniform data environment for access. § Reduced replication of data with less use of ETL processes § Single point of enforcement for security § Uniform environment for data access in place § Development flexibility to understand what is needed to build before actually building
  • 14. 13 Summary The Logical Data Warehouse (LDW) is an evolution and augmentation of DW practices, not a replacement A repository-only style data warehouse contains a single ontology/ taxonomy, whereas in the LDW a semantic layer can contain many combination of use cases, many business definitions of the same information The LDW permits an IT organization to make a large number of datasets available for analysis via query tools and applications
  • 15. Query Optimization in the Logical Data Warehouse 14
  • 16. 15 - Gartner, Magic Quadrant for Data Integration, 2017 The Denodo Platform ... incorporates dynamic query optimization as a key value point. This capability includes support for cost-based optimization specifically for high data volume and complexity;... it has also added an in-memory data grid with Massively Parallel Processing (MPP) architecture to its platform.
  • 17. 16 Obtain Total Sales By Customer Country in the Last Two Years Query Optimization: Example (1) Naive Strategy (BI Tools, BDI Tools, Simple federation engines): join union group by Customers (3M) Sales previous years (3B) Sales this year (290M) 290M rows 300M rows (sales previous year) 3M rows 593M rows through the network System Execution Time Optimization Technique No Rewriting 20 min None
  • 18. 17 Obtain Total Sales By Customer Country in the Last Two Years Query Optimization: Example (2) Denodo Strategy – Aggregation push-down join union group by Customers (3M) Sales previous years (3B) Sales this year (290M) 3M rows (sales by customer this year) 3M rows (sales by customer previous year) 3M rows 9M rows through the network group by customer group by customer System Execution Time Optimization Technique No Rewriting 20 min None Denodo 6 51 sec Aggregation push-down
  • 19. 18 Obtain Total Sales By Customer Country in the Last Two Years Query Optimization: Example (3) union group by 3M rows (sales by customer this year) 3M rows (sales by customer previous year) 3M rows (customers) Aggregation pushdown group by customer group by customer join Integrated MPP processing System Execution Time Optimization Technique No Rewriting 20 min None Denodo 6 51 sec Aggregation push-down Denodo 7 13 sec Aggregation push-down + MPP integration Customers (3M)
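The aggregation push-down idea in the three example slides above can be sketched in a few lines of Python. This is a toy illustration only, not Denodo's optimizer: the data, the function names, and the "rows transferred" counter are all assumptions made up for the example. It shows that grouping by customer at each source before the join gives the same totals by country while sending fewer rows to the virtualization layer.

```python
from collections import defaultdict

# Hypothetical toy data standing in for three separate sources.
customers = {1: "US", 2: "US", 3: "AU"}  # customer_id -> country
sales_this_year = [(1, 10.0), (1, 5.0), (2, 7.0), (3, 2.0)]
sales_previous = [(1, 1.0), (2, 3.0), (2, 4.0), (3, 6.0), (3, 1.0)]


def total_by_country(rows):
    """Join (customer_id, amount) rows with customers and sum by country."""
    totals = defaultdict(float)
    for customer_id, amount in rows:
        totals[customers[customer_id]] += amount
    return dict(totals)


def group_by_customer(rows):
    """Partial aggregation a source could run locally before sending rows."""
    partial = defaultdict(float)
    for customer_id, amount in rows:
        partial[customer_id] += amount
    return list(partial.items())


def naive_strategy():
    """Pull every sales row over the network, then join and aggregate."""
    transferred = sales_this_year + sales_previous
    return total_by_country(transferred), len(transferred) + len(customers)


def pushdown_strategy():
    """Fetch per-customer subtotals from each source, then join and aggregate."""
    transferred = (group_by_customer(sales_this_year)
                   + group_by_customer(sales_previous))
    return total_by_country(transferred), len(transferred) + len(customers)


naive_totals, naive_rows = naive_strategy()
pushed_totals, pushed_rows = pushdown_strategy()
print(naive_totals, naive_rows)
print(pushed_totals, pushed_rows)
```

With pushed-down grouping, each sales source sends at most one row per customer, which is why the slide's network volume drops from 593M rows to 9M (3M per source plus 3M customers).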
  • 20. 19 You can achieve excellent performance in Logical Analytic Architectures Key techniques needed: Advanced Dynamic Optimization to minimize network traffic and leverage the power of data sources In-memory MPP processing to speed operations at the Data Virtualization layer Advanced incremental caching for reusing commonly used data and complex calculations
  • 21. Security and Governance with a Data Abstraction Layer
  • 22. 21 Different Data Sources – Different Security Models Databases/EDW – Mature RBAC model Hadoop – Kerberos § Cloudera – Apache Sentry and Knox § Hortonworks – Apache Ranger and Atlas Cloud – OAuth 2.0 (?) Files – Binary – Read access or none Web Services – Multiple models In many cases, the consumer has to deal with these different security models and technologies
  • 23. 22 Abstracting Data Source Security Provide single data model to consumers § Role-based Access to data on need basis § Removes data silo security Hide complexity of different security models and maturities Integrate with existing authentication system (e.g. LDAP/AD) Single point for monitoring/auditing § Who, what, when, how, … Ensure compliance with corporate policies Data access and privacy rules enforced ‘on the fly’
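As a rough sketch of the single-point enforcement described above, the following Python shows role-based column masking applied 'on the fly' together with a who/what/when audit record. The policy table, role names, view name, and the `query_view` function are hypothetical and exist only for this illustration; they are not part of any product API.

```python
from datetime import datetime, timezone

# Hypothetical policy: role -> columns that role may see unmasked.
POLICY = {
    "analyst": {"country", "total"},
    "support": {"country", "total", "email"},
}

# Single audit trail for every access made through the abstraction layer.
AUDIT_LOG = []

SAMPLE_ROWS = [
    {"country": "AU", "total": 9.0, "email": "a@example.com"},
    {"country": "US", "total": 30.0, "email": "b@example.com"},
]


def query_view(user, role, rows, view="sales_by_country"):
    """Return rows with disallowed columns masked, and record the access."""
    allowed = POLICY.get(role, set())  # unknown roles see nothing unmasked
    AUDIT_LOG.append({
        "who": user,
        "what": view,
        "when": datetime.now(timezone.utc),
        "role": role,
    })
    return [
        {col: (val if col in allowed else "***") for col, val in row.items()}
        for row in rows
    ]


analyst_rows = query_view("alice", "analyst", SAMPLE_ROWS)
support_rows = query_view("bob", "support", SAMPLE_ROWS)
print(analyst_rows[0])
print(support_rows[0])
```

Because every consumer goes through the same function, the masking rules and the audit trail live in one place regardless of which underlying source holds the data.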
  • 24. 23 Security in a Hybrid Environment Moving data to Cloud can exacerbate security and privacy problems SaaS and Cloud data sources often have different security models Not integrated to corporate authentication mechanisms Potential for recreating authentication model in Cloud Data Virtualization abstraction layer means Cloud sources can use same security mechanism and access controls as on premise sources
  • 25. 24 Customer Use Case - Asurion International Expansion - moving into different privacy and data protection jurisdictions New products – need for different data types and sources § Mixing structured, multi-structured, streaming, text, video, voice, geo-location, etc. Moving to Cloud for increased speed and agility § Easier to spin up new virtual servers for new data sets Competing pressures for securing data and providing access to data sets Security Constraints Geographical Constraints Contractual Client Obligations PII Protection Departmental Restrictions Fast Changing Hadoop & Cloud Technologies Hive, Spark, Redshift Maintaining different code base Discover, Co-relate, Enable Predictive Analytics Text, CSV, Voice, JSON, Streaming, 3rd Party Data 60TB+ structured, 200TB+ telemetry & unstructured data
  • 26. 25 Asurion – Hybrid Architecture After implementing hybrid Data Virtualization layer, Asurion was able to: § Control security across entire infrastructure from a single point § Easily meet regional security and privacy requirements § Keep client data separate as contractually required – but allow analytics over all (anonymized) data § Perform complete audits of data access, as needed § Quickly add new, compliant data sources to system
  • 27. 26 Governance… Governance features are pervasive in Denodo Platform: § Users can inspect catalog of virtualization objects through catalog search to find data combinations for reuse § Data lineage helps users to understand where data has come from and how it has changed from the source § Impact analysis helps architects understand the consequences of changes in the data source schemas § Propagate changes selectively with a single click.
  • 28. 27 Data Lineage Graphical view for showing data lineage for any field in any virtual view. Trace source of any field: • Includes any functions applied to field contents. Trace source of calculated fields: • View calculations used to create new fields.
  • 29. 28 “Used by” “Big picture” view of usage Useful for seeing impact of changes on whole system
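Data lineage and the "Used by" view can both be thought of as walks over one dependency graph, in opposite directions. The sketch below assumes a hypothetical lineage table (the field names and operations are invented for illustration) and shows tracing a field back to its sources, including the operations applied, and listing the derived fields affected by a change.

```python
# Hypothetical lineage metadata: derived field -> (operation, input fields).
LINEAGE = {
    "sales_by_country.total": ("SUM", ["all_sales.amount"]),
    "all_sales.amount": ("UNION", ["sales_this_year.amount",
                                   "sales_previous.amount"]),
    "sales_by_country.country": ("JOIN", ["customers.country"]),
}


def trace(field, depth=0):
    """Return (depth, field, operation) steps from a field back to its sources."""
    operation, inputs = LINEAGE.get(field, ("source", []))
    steps = [(depth, field, operation)]
    for parent in inputs:
        steps.extend(trace(parent, depth + 1))
    return steps


def source_fields(field):
    """Lineage: the underlying source fields a field is built from."""
    return sorted(name for _, name, op in trace(field) if op == "source")


def used_by(field):
    """Impact analysis: every derived field that depends on the given field."""
    return sorted(
        derived for derived in LINEAGE
        if derived != field and field in {name for _, name, _ in trace(derived)}
    )


for depth, name, operation in trace("sales_by_country.total"):
    print("  " * depth + f"{name} [{operation}]")
print(source_fields("sales_by_country.total"))
print(used_by("sales_this_year.amount"))
```

A graphical lineage tool renders the same structure visually; the point of the sketch is only that one set of metadata answers both "where did this come from?" and "what breaks if this changes?".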
  • 30. 29 Single point for security and governance Extends single point of control across Cloud and on premise architectures Catalog search helps users find data combinations for reuse Data lineage helps users to understand where data has come from and how it has changed from the source Impact analysis helps architects understand the consequences of change
  • 32. 31 Self-Service Challenges……. Tools are designed for data analysts (or power users) § Users who are happy finding, wrangling, cleansing data § Creating calculations, aggregations within the data What about the other business users? § People who don’t want to spend hours fighting the spreadsheet… Will they use common definitions for key business entities and metrics? § Or will they pick and choose their own? Ultimately, can you trust the numbers? § Where did the data come from? § How has it been manipulated?
  • 33. 32 Self-Service with Guardrails Don’t build just for the ‘data cowboys’ Create a common and consistent semantic layer § Everyone is using the same definitions and metrics Create pre-integrated, pre-calculated data services § Save the user having to do this themselves § Ensures consistency of calculations, etc. But allow the cowboys to ‘roam and wrangle’ § Even the cowboys can only access ‘approved’ data sources
  • 35. 34 Logical Data Warehouse Improves Information Agility Benefits § Diverse data spread across the entire enterprise can now be accessed instantaneously and securely with a proper authorization structure § Core Business Intelligence logic is becoming centralized, reducing duplication of effort and enhancing development efficiency § Searchable data dictionary helps report writers find the data they need and helps improve the self-service experience
  • 36. 35 “Get it Real-time and Get it Fast!” The Benefits of Data Virtualization Complete enterprise information, combining Web, cloud, streaming, and structured data ROI realization within 6 months, with the flexibility to adjust to unforeseen changes An 80% reduction in integration costs, in terms of resources and technology Real-time integration and data access, enabling faster business decisions
  • 37. 36 Denodo The Leader in Data Virtualization DENODO OFFICES, CUSTOMERS, PARTNERS Palo Alto, CA. Global presence throughout North America, EMEA, APAC, and Latin America LEADERSHIP ▪ Longest continuous focus on data virtualization – since 1999 ▪ Leader in 2018 Forrester Wave – Big Data Fabric ▪ Winner of numerous awards CUSTOMERS ~500 customers, including many F500 and G2000 companies across every major industry have gained significant business agility and ROI DENODO AUSTRALIA L13 – Macquarie House, 167 Macquarie Street NSW Sydney 2000