© Cloudera, Inc. All rights reserved. 1
MODERN DATA WAREHOUSE FUNDAMENTALS
Part I: Introducing the Modern Data Warehouse - Challenges, Use Cases, and Opportunities
December 2018
SPEAKERS
Eva Nahari
Director, Product Management
eva.nahari@cloudera.com
David Dichmann
Director, Product Marketing
ddichmann@cloudera.com
Why Modernize Your Data Warehouse?
The Case for a Modern Data Warehouse
LARGE NORTH AMERICAN BANK
• LoB Data Analysts access all data
• Saved $4M+ in deposit fraud
Terabytes
Users
Databases
Queries / Month
FRAUD PREVENTION
GLOBAL PHARMACEUTICAL
• Curated Use and Agile Discovery with HIPAA compliance
• Accelerated new Drug Development
Use Cases
Users
Fewer Silos
Diverse Data
NEW PRODUCT DEVELOPMENT
MAJOR TELCO MANUFACTURER
• $10M new revenue from optimized marketing
• $30M+ from Price Optimization
• $100K+ from weather correlation
Query Responses
New Sources
Min. Data Sets
Users
BUSINESS OPTIMIZATION
NEW TRENDS IN DATA WAREHOUSING
Deeper Business Insights at Extreme Speed and Scale While Managing Cost
DEEPER business insights
EXTREME speed & scale
CONTROLLED resources & costs
NEW TRENDS IN DATA WAREHOUSING
Deeper Business Insights
Protect
● Proactive Fraud Prevention
● Keep up with Regulatory Compliance
● Preempt Cyberthreats
Real-time response on massive data volume and variety
Optimize
● Improve Operational Efficiency
● Support Internet of Things (IoT)
New analytics techniques democratized to all users
Grow
● Customer Sentiment
● Fault Prevention
● Improve Product Quality
● New Revenue Streams
Experimentation and collaboration at scale
NEW TRENDS IN DATA WAREHOUSING
Extreme Speed and Scale
More Data
● Massive amounts handled faster at scale
● More variety from new sources (social media, IoT)
● Insight within minutes of new data arrival
Performance and flexibility at scale
More Workloads
● 100s of production-grade deployments
● Enterprise-grade dependability
● Strict security and governance
On-demand scale out, discovery, collaboration
More People
● 1,000s of new users and new user types
● 1,000s of new use cases
● All skill levels: Analytics, Data Science, and Machine Learning
All workloads with a shared data experience
NEW TRENDS IN DATA WAREHOUSING
Managing Resources and Costs
Optimize Core Processes
● Automation to reduce pressure on organizational bottlenecks
● Consistent user experience
Broaden data reach without increasing IT burden or costs
Self-Service Everything
● Resource provisioning
● Workload development
● Optimizing and troubleshooting
Deliver on increased SLA pressures without runaway cost
Dynamic Consumption
● Transient Workloads
● Short-lived Workloads
● Permanent Workloads
● Public, Private, Hybrid Cloud
Environmental flexibility and adaptive compute, storage
Quickly enable business analytics by sharing petabytes of verified data across thousands of users while surpassing SLA and cost demands
TRADITIONAL DATA WAREHOUSE
[Diagram, five numbered steps. Labels: Structured Data Sources (ERP, CRM, SCM); Staging; ETL; ODS; Transformations; Master Schema; EDW; Data Marts; Advanced Analytics; Dashboards; Ad Hoc; Canned Reports. Time to deliver: Many Months.]
Callouts:
• Struggle to handle volume and variety
• Limited access
WHAT CONCEPTS SURVIVE?
Data Modeling
Security & Governance
Reports & Dashboards
WHAT HAS CHANGED?
Traditional DW | Modern DW
Supporting Role | Foundational Role
Primarily Internal | Internal & External
Constrained, Structured | Freeform, Multi-Structured
Planned ETLs | On-Demand Pipelines
Dimensions shown on the slide: Users; Data Exploration; Data Curation; Data & Analytics
WHAT IS NEW?
Experimentation & Collaboration
Dynamic Consumption
Self Service Everything
MODERN DATA WAREHOUSE
[Diagram, two numbered steps. Labels: Variety of data sources/types; Data Store; Data Marts; Advanced Analytics; Dashboards; Ad Hoc; Canned Reports. Time to deliver: Within Days.]
Callouts:
• Ingest & Store all data at scale
• Self-serve / On-demand
CLOUDERA MODERN DATA WAREHOUSE
The modern platform for machine learning and analytics optimized for the cloud
[Diagram labels]
Workloads: DATA ENGINEERING; DATA SCIENCE; ANALYTICS; OPERATIONAL DATABASE; EXTENSIBLE SERVICES
Core Services: DATA CATALOG; INGEST & REPLICATION; SECURITY; GOVERNANCE; WORKLOAD MANAGEMENT
Storage Services: Amazon S3; Microsoft ADLS; HDFS; KUDU
A MODERN DATA WAREHOUSE SOLUTION
[Diagram labels]
Tools: Preferred BI & ELT Tools; SQL Workbench
HYBRID Controls: Workload XM; Navigator & Sentry; Cloudera Manager; Altus
HYBRID Compute: Impala (MPP Query Engine); Hive-on-Spark / Spark (MPP ELT Processing); Solr (MPP Search Analytics)
HYBRID Storage: KUDU | HDFS (Local Storage); AWS S3 | ADLS (Object Storage)
Shared Data Experience (SDX)
Optimized File Formats (Parquet, Avro)
Proactively Optimize Workloads
WORKLOAD XM
Self Serve Diagnostics and Optimizations
Self Serve Analytics Workbench
Move faster
Serve more users
Reduce IT pressure
EXTREME SPEED & SCALE
Fastest ELT at Scale for Data Engineers
Fastest Self-Service BI at Scale for Analysts & Developers
Impala
• Flexibility at scale
• 1000s of users
• On-demand scale out
• Speed to insight
ON-DEMAND SCALING & MULTI-TENANCY
[Diagram labels]
Tenants and workloads:
• EXPLORE: Discovery (raw)
• EXPERIMENT: Exploration (curated)
• EMERGING LOB: Prep - New Report
• SALES: BI/New Reporting
• EXPERIMENT: Model Build/Test
• DEV & TEST: Prep – Known
• FINANCE: Regular Reporting
Shared Storage (HDFS, KUDU, S3, ADLS)
Shared Metadata, Security, Governance
Zones: Landing Zone; Experimental Zone; Archived Zone; Refined Zone
Stateful Context, Shared Experience
ENABLES FULL FLEXIBILITY AND DYNAMIC CONSUMPTION
CLOUD NATIVE OPTION - ALTUS DW
● Quick time to value - no software or clusters to manage
● Bring warehouse to the data with zero copy simplicity
● Use your security policies with your data - no proprietary stacks
● Apply enterprise governance to transient workloads
● Shared data experience with SDX
● Optimized for Azure & AWS
[Diagram labels: MULTI-CLOUD PAAS SOLUTION; DATA WAREHOUSE; GOVERNANCE; SECURITY; ALTUS CONTROL PLANE; LIFECYCLE MANAGEMENT; MULTI-CLOUD storage: Amazon S3, Microsoft ADLS]
FROM ANALYTICS TO MACHINE LEARNING
Moving from Known Questions on Known Data to Unknown Questions on Unknown Data
DATA ENGINEERING + DATA WAREHOUSE
● Run ETL with Spark or partner tools to ingest and process data at any scale
● Assign permissions and classifications once
● Data, along with all data context, is immediately available in the data warehouse for analytical processing and BI use cases
DATA WAREHOUSE + DATA SCIENCE
● Run data science and machine learning analysis to blend, augment, and score data
● Blended and augmented data, along with all data context, is immediately available to business teams and analysts with unified security and governance
BETTER TOGETHER
Cloudera SDX makes it easy for administrators, BI users, and data scientists to work together on a common data set, with consistent data context
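As a rough illustration of the "assign permissions and classifications once" idea on this slide, the Python sketch below models a single shared catalog that several workloads consult. It is not Cloudera SDX code and does not reflect its API; the class names (SharedCatalog, TableContext), the table name, and the group names are all hypothetical, chosen only to show one set of metadata serving several consumers.

```python
from dataclasses import dataclass, field


@dataclass
class TableContext:
    """Hypothetical record of the context attached to one data set."""
    classification: str
    allowed_groups: set = field(default_factory=set)


class SharedCatalog:
    """Hypothetical stand-in for a shared metadata and security layer."""

    def __init__(self):
        self._tables = {}

    def register(self, table, classification, allowed_groups):
        # Permissions and classification are assigned once, at registration.
        self._tables[table] = TableContext(classification, set(allowed_groups))

    def can_read(self, table, group):
        # Every workload asks the same catalog, so answers stay consistent.
        context = self._tables.get(table)
        return context is not None and group in context.allowed_groups

    def classification_of(self, table):
        return self._tables[table].classification


def run_workload(name, catalog, table, group):
    """Return a short status string for a workload trying to read a table."""
    if catalog.can_read(table, group):
        return f"{name}: read {table} ({catalog.classification_of(table)})"
    return f"{name}: access to {table} denied"


catalog = SharedCatalog()
catalog.register("customer_events", "PII", {"bi_analysts", "data_scientists"})

bi_result = run_workload("warehouse_query", catalog, "customer_events", "bi_analysts")
ds_result = run_workload("model_training", catalog, "customer_events", "data_scientists")
denied_result = run_workload("ad_hoc_export", catalog, "customer_events", "contractors")
print(bi_result)
print(ds_result)
print(denied_result)
```

The warehouse query and the model-training job never carry their own copy of the policy; both defer to the one catalog entry, which is the property the slide attributes to a shared data context.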
TOOLS & FRAMEWORKS FOR SUCCESS
Evaluate: Identify Use Cases; Impact Analysis; Set Objectives; Prioritized Plan; Initial POC
Plan: Estimate Effort; Risk Analysis; Schema Design; Test & Validate
Offload (Optional): Identify Suitable Workloads; Offload Actions; Capacity Planning
Optimize: Fine Tuning Data Model on Hadoop; Optimize Queries for Performance; Validate ROI, Cost
TD BANK: Delivering “Legendary Customer Experience”
CHALLENGES
Significantly improve customer experience with sentiment analysis, behavioral patterns, and predictive modeling
Current system couldn’t handle:
• Centralizing data from thousands of sources
• Demands from increased users and use cases
• Data cost and manageability at scale
RESULTS
• 30% reduction in repeat customer complaints
• 90% productivity improvement for analytics projects
• 60% decrease in data management costs
• 98% decrease in per TB storage costs
SOLUTION
Modern Data Warehouse for customer marketing, fraud analytics and cybersecurity
• Ingest data from 100+ corporate systems
• Centralized data into “the hands of those that need it much more quickly”
• Significantly reduce storage and management costs
https://www.cloudera.com/more/customers/td-bank.html
DEUTSCHE TELEKOM: Fraud reduction and customer retention
CHALLENGES
Improve fraud detection speed to near-real time and respond to network service quality issues before customers notice
Current system couldn’t handle:
• Massive volumes of network data - at higher granularity
• Enterprise view of data - machine learning at scale
• Near-real time fraud detection on incoming data
RESULTS
• 10-20% reduction in revenue loss by increased fraud detection
• 5-10% decrease in customer churn with increased network quality
• 50% increase in overall operational efficiencies with faster analytics
SOLUTION
Modern Data Warehouse to detect fraud patterns and network problems in real-time before business impact
• Quickly analyze massive streaming data sets
• Enterprise grade reliability and stability with shared data experience (no silos)
• Machine learning and fast analytics - real-time
https://www.cloudera.com/more/customers/deutsche-telekom.html
KOMATSU MINING: Optimize Machine Performance
CHALLENGES
Create an Industrial IoT (IIoT) solution for optimizing mining equipment utility and build better next-generation products
Current system couldn’t handle:
• Scale of IoT data
• Demand for new users and use cases
• 30TB/month data growth
RESULTS
• 2X Increase in production hours on key equipment
• Design next-generation equipment: environmentally smarter, more productive, at lower cost
• Meet or exceed all KPIs: “Deliver all of the data with less complexity and significant cost savings”
SOLUTION
Cloud-based IIoT analytics for a full view of mining operations
• Quickly and easily analyze huge volume and variety (time-series, sensor, event, and more) of data
• More use cases and users: “democratizing analytics for different user groups”
• Scale quickly and easily in the cloud
https://www.cloudera.com/more/news-and-blogs/press-releases/2017-11-15-komatsu-helps-improve-mining-performance.html
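To put the 30TB/month growth figure from this slide in perspective, here is a small Python sketch that projects cumulative volume. It assumes growth stays constant at 30 TB per month and starts from a hypothetical zero baseline; neither assumption comes from the slide, and the function name is illustrative only.

```python
def projected_volume_tb(months, monthly_growth_tb=30, starting_tb=0):
    """Project total data volume in TB, assuming constant monthly growth."""
    if months < 0:
        raise ValueError("months must be non-negative")
    return starting_tb + monthly_growth_tb * months


one_year = projected_volume_tb(12)
three_years = projected_volume_tb(36)
print(f"After 12 months: {one_year} TB")
print(f"After 36 months: {three_years} TB ({three_years / 1000:.2f} PB, decimal units)")
```

Under these assumptions the data set grows by 360 TB a year and passes one petabyte (decimal) within three years, which gives a sense of why the slide lists the growth rate among the things the previous system could not handle.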
CLOUDERA DW - PARTING THOUGHTS
Hybrid Optimized
Shared Data Experience
Performance @Scale
Shared Data
Exponential Use Cases, Successful Outcomes
THANK YOU
https://www.cloudera.com/products/data-warehouse.html