How One Company Offloaded Data Warehouse ETL To Hadoop and Saved $30 Million

© MapR Technologies. All rights reserved.
Rob Rosen
Sr. Director, Americas Systems Engineering
MapR Technologies

MapR Overview
 Enterprise-grade platform for Hadoop
 Deployed at thousands of companies
– Including 12 of the Fortune 100
 MapR is the preferred analytics platform
– Hundreds of billions of events daily
– 90% of the world’s Internet population monthly
– $1 trillion in retail purchases annually

Arrival of Big Data Impacts Data Warehouse
Data Warehouse
Volume: Prohibitively expensive storage costs
Variety: Inability to process unstructured formats
Velocity: Faster arrival and processing needs

Top Concern for Big Data
Multiple data sources
Multiple technologies
Multiple copies of data
“Too many different types, sources, and formats of critical data”

The Hadoop Advantage
 Fueling an industry revolution by
providing infinite capability to
store and process Big Data
 Expanding analytics across
data types
 Compelling economics
– 20 to 100X more cost effective than
alternatives
Pioneered at

Important Drivers for Hadoop
 Data on compute drives efficiencies
and better analytics
 With Hadoop you don’t need to know
what questions to ask beforehand
 Simple algorithms on Big Data
outperform complex models
 Powerful ability to analyze
unstructured data

Hadoop is the Technology of Choice
for Big Data

How Do You Lower and Control Data Warehouse Costs?
Source Data: Social Media, Web Logs; Machine Device, Scientific; Documents and Emails; Transactions, OLTP, OLAP
Batch ETL
Traditional Targets: Enterprise Data Warehouse, Datamarts, ODS
• Raw data or infrequently used data consuming capacity
• Batch windows hitting their limits, putting SLAs at risk
• Databases and data warehouses are exceeding their capacity too quickly

Lower Data Management Costs
Source Data: Social Media, Web Logs; Machine Device, Scientific; Documents and Emails; Transactions, OLTP, OLAP
Traditional Targets: Enterprise Data Warehouse, RDBMS, MDM

Bottom-Line Impact
Sources: Sensor Data, Web Logs, RDBMS
Hadoop: ETL + Long Term Storage
DW: Query + Present
Benefits:
• Both structured and unstructured data
• Expanded analytics with MapReduce, NoSQL, etc.
Solution                       Cost / Terabyte   Hadoop Advantage
Hadoop                         $333
Teradata Warehouse Appliance   $16,500           50x savings
Oracle Exadata                 $14,000           42x savings
IBM Netezza                    $10,000           30x savings
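
The savings multiples quoted above appear to come from dividing each alternative's cost per terabyte by the Hadoop figure and rounding. A minimal Python sketch of that arithmetic, using only the numbers from the slide (the names `ALTERNATIVES` and `savings_multiple` are illustrative and not part of the presentation):

```python
# Cost per terabyte in USD, as quoted on the slide.
HADOOP_COST_PER_TB = 333

ALTERNATIVES = {
    "Teradata Warehouse Appliance": 16_500,
    "Oracle Exadata": 14_000,
    "IBM Netezza": 10_000,
}


def savings_multiple(alternative_cost: float, hadoop_cost: float = HADOOP_COST_PER_TB) -> int:
    """Return how many times cheaper Hadoop is, rounded to the nearest whole number."""
    return round(alternative_cost / hadoop_cost)


# Map each alternative to its rounded savings multiple versus Hadoop.
savings = {name: savings_multiple(cost) for name, cost in ALTERNATIVES.items()}

for name, multiple in savings.items():
    print(f"{name}: {multiple}x savings")
```

Rounded to whole numbers, the ratios match the 50x, 42x, and 30x figures in the Hadoop Advantage column.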

What is the Best Way to Deploy Hadoop?
Transitory Data Store
• No long-term scale advantages
• Unprotected data
• ETL tool focus
vs.
Permanent Data Store (Enterprise Data Hub)
• Highly available and fully protected data
• Works with existing tools
• Real-time ingestion and extraction
• Archive data from data warehouse

An Enterprise Data Hub
 Combine different data sources
 Minimize data movement
 One platform for analytics
Sources feeding the Enterprise Data Hub: Sales, SCM, CRM, Public, Web Logs, Production Data, Sensor Data, Click Streams, Location, Social Media, Billing

Key Elements of Enterprise Data Hub
99.999% HA, Data Protection, Disaster Recovery, Scalability & Performance, Enterprise Integration, Multi-tenancy
Enterprise-grade platform for the long term
• Reliability to support stringent SLAs
• Protection from data loss and user or application errors
• Support business continuity and meet recovery objectives
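
To put the 99.999% availability figure in concrete terms, it can be converted into a yearly downtime budget. The sketch below is a generic calculation, not something shown in the presentation; the function name and the choice of a 365.25-day average year are assumptions made for illustration:

```python
# Assumption: an average year of 365.25 days (accounts for leap years).
MINUTES_PER_YEAR = 365.25 * 24 * 60


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Maximum downtime per year, in minutes, for a given availability percentage."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


five_nines = allowed_downtime_minutes(99.999)
print(f"99.999% availability allows about {five_nines:.2f} minutes of downtime per year")
```

Under that assumption, five nines leaves a budget of a little over five minutes of downtime per year, which is why it is paired with stringent SLAs.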

High Availability and Dependability
Reliable Compute
• Automated stateful failover
• Automated re-replication
• Self-healing from HW and SW failures
• Load balancing
• Rolling upgrades
• No lost jobs or data
• Five 9s (99.999%) of uptime
Dependable Storage
• Business continuity with snapshots and mirrors
• Recover to a point in time
• End-to-end checksumming
• Strong consistency
• Data safe
• Mirror across sites to meet Recovery Time Objectives

Enterprise Data Hub Supports a Range of Applications
Batch, Interactive, Real-time
99.999% HA: Self-healing, instant recovery
Data Protection: Snapshots for point-in-time recovery from user or application errors
Disaster Recovery: Mirroring across clusters and the WAN
Scalability & Performance: Unlimited files & tables, record-setting performance
Enterprise Integration: Direct data ingestion and access, fully compliant ODBC access and SQL-92 support
Multi-tenancy: Secure access to multiple users and groups

Summary
Business Impact
• Saved millions in TCO
• 10x faster, 100x cheaper
• Maintain the same SLAs
• Implemented the change without impacting users

Q & A
Engage with us!
@mapr
mapr-technologies
maprtech
MapR
maprtech
rrosen@maprtech.com
