1©MapR Technologies. All rights reserved.
How One Company Offloaded Data
Warehouse ETL To Hadoop and
Saved $30 Million
Rob Rosen
Sr. Director, Americas Systems Engineering
MapR Technologies
MapR Overview
• Enterprise-grade platform for Hadoop
• Deployed at thousands of companies
– Including 12 of the Fortune 100
• MapR is the preferred analytics platform
– Hundreds of billions of events daily
– 90% of the world’s Internet population monthly
– $1 trillion in retail purchases annually
Arrival of Big Data Impacts Data Warehouse
Data Warehouse
Volume: prohibitively expensive storage costs
Variety: inability to process unstructured formats
Velocity: faster arrival and processing needs
Top Concern for Big Data
Multiple data sources
Multiple technologies
Multiple copies of data
“Too many different types, sources, and formats of critical data”
The Hadoop Advantage
• Fueling an industry revolution by providing infinite capability to store and process Big Data
• Expanding analytics across data types
• Compelling economics
– 20 to 100X more cost-effective than alternatives
Pioneered at
Important Drivers for Hadoop
• Data on compute drives efficiencies and better analytics
• With Hadoop you don’t need to know what questions to ask beforehand
• Simple algorithms on Big Data outperform complex models
• Powerful ability to analyze unstructured data
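The "data on compute" driver refers to running the computation where the data already lives, which is the MapReduce model. Below is a minimal single-process sketch of the map, shuffle, and reduce steps using word count. The function names are illustrative assumptions and are not Hadoop APIs; on a real cluster the map tasks would be scheduled on the nodes that hold the input blocks.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key, as the framework does between phases."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Combine every value seen for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

Because each call to `map_phase` depends only on its own line, the map step can be split across many machines without coordination; only the shuffle moves data between them.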
Hadoop is the Technology of Choice
for Big Data
How Do You Lower and Control Data Warehouse Costs?
Source Data: Social Media, Web Logs; Machine Device, Scientific; Documents and Emails; Transactions, OLTP, OLAP
Batch ETL
Traditional Targets: Enterprise Data Warehouse; Datamarts; ODS
• Raw data or infrequently used data consuming capacity
• Batch windows hitting their limits, putting SLAs at risk
• Databases and data warehouses are exceeding their capacity too quickly
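One way to relieve the batch window and capacity pressure described on this slide is to summarize raw data before it reaches the warehouse, so that only the aggregated rows are loaded. The sketch below illustrates that idea in plain Python. The three-field log format (date, path, status) and the function names are hypothetical and do not come from the deck; a production job would run the same logic as a distributed batch job rather than in one process.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple


def parse_line(line: str) -> Optional[Tuple[str, str]]:
    """Extract (date, path) from a hypothetical 'date path status' log line."""
    parts = line.split()
    if len(parts) != 3 or not parts[2].isdigit():
        return None  # malformed lines are skipped instead of loaded
    date, path, _status = parts
    return date, path


def summarize(lines: Iterable[str]) -> List[Tuple[str, str, int]]:
    """Collapse raw log lines into (date, path, hits) rows for loading."""
    counts: Counter = Counter()
    for line in lines:
        key = parse_line(line)
        if key is not None:
            counts[key] += 1
    return sorted((date, path, hits) for (date, path), hits in counts.items())
```

The warehouse then receives one row per date and path instead of one row per request, which is the kind of low-density data the slide describes as consuming capacity.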
Lower Data Management Costs
Source Data: Social Media, Web Logs; Machine Device, Scientific; Documents and Emails; Transactions, OLTP, OLAP
Traditional Targets: Enterprise Data Warehouse; RDBMS; MDM
Bottom-Line Impact
Sensor Data
Web Logs
Hadoop
RDBMS
Benefits:
• Both structured and unstructured data
• Expanded analytics with MapReduce, NoSQL, etc.
DW
Query + Present
ETL + Long Term Storage
Solution                       Cost / Terabyte   Hadoop Advantage
Hadoop                         $333
Teradata Warehouse Appliance   $16,500           50x savings
Oracle Exadata                 $14,000           42x savings
IBM Netezza                    $10,000           30x savings
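The "Hadoop Advantage" column follows from dividing each platform's cost per terabyte by Hadoop's. The sketch below reproduces those multiples from the figures quoted on the slide, which are taken as given and not independently verified. The `offload_savings` helper and any data volume passed to it are illustrative assumptions, not numbers from the deck.

```python
# Cost per terabyte in USD, as quoted on the slide.
COST_PER_TB = {
    "Hadoop": 333,
    "Teradata Warehouse Appliance": 16_500,
    "Oracle Exadata": 14_000,
    "IBM Netezza": 10_000,
}


def savings_multiple(platform: str, baseline: str = "Hadoop") -> float:
    """How many times more the platform costs per TB than the baseline."""
    return COST_PER_TB[platform] / COST_PER_TB[baseline]


def offload_savings(terabytes: float, platform: str, baseline: str = "Hadoop") -> float:
    """Hypothetical helper: cost difference in USD for storing a given volume."""
    return terabytes * (COST_PER_TB[platform] - COST_PER_TB[baseline])


if __name__ == "__main__":
    for name in COST_PER_TB:
        if name != "Hadoop":
            print(f"{name}: {savings_multiple(name):.1f}x")
```

Rounded to whole numbers, the multiples match the slide: 50x, 42x, and 30x. As a usage example, `offload_savings(100, "IBM Netezza")` returns 966700, the difference for a hypothetical 100 TB at the quoted rates.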
What is the Best Way to Deploy Hadoop?
Permanent Data Store (Enterprise Data Hub)
• Highly available and fully protected data
• Works with existing tools
• Real-time ingestion and extraction
• Archive data from data warehouse
vs.
Transitory Data Store
• No long-term scale advantages
• Unprotected data
• ETL tool focus
An Enterprise Data Hub
• Combine different data sources
• Minimize data movement
• One platform for analytics
Sales
SCM
CRM
Public
Web Logs
Production Data
Sensor Data
Click Streams
Location
Social Media
Billing
Enterprise Data Hub
Key Elements of Enterprise Data Hub
99.999% HA
Data Protection
Disaster Recovery
Scalability & Performance
Enterprise Integration
Multi-tenancy
Enterprise-grade platform for the long term
• Reliability to support stringent SLAs
• Protection from data loss and user or application errors
• Support business continuity and meet recovery objectives
High Availability and Dependability
Reliable Compute
• Automated stateful failover
• Automated re-replication
• Self-healing from HW and SW failures
• Load balancing
• Rolling upgrades
• No lost jobs or data
• Five 9s (99.999%) of uptime
Dependable Storage
• Business continuity with snapshots and mirrors
• Recover to a point in time
• End-to-end checksumming
• Strong consistency
• Data safe
• Mirror across sites to meet Recovery Time Objectives
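To put the uptime figure on this slide in concrete terms, an availability percentage can be converted into the downtime it permits per year. This is a small worked calculation, not a measurement of any product; the function name is an illustrative assumption, and the year is taken as 365 days.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Minutes per year a system may be down at the given availability."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)
```

At 99.999% the budget is about 5.26 minutes of downtime per year, compared with about 526 minutes (nearly nine hours) at 99.9%.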
Enterprise Data Hub Supports
a Range of Applications
Batch, Interactive, Real-time
99.999% HA: self-healing, instant recovery
Data Protection: snapshots for point-in-time recovery from user or application errors
Disaster Recovery: mirroring across clusters and the WAN
Scalability & Performance: unlimited files & tables, record-setting performance
Enterprise Integration: direct data ingestion and access; fully compliant ODBC access and SQL-92 support
Multi-tenancy: secure access to multiple users and groups
Business Impact
• Saved millions in TCO
• 10x faster, 100x cheaper
• Maintain the same SLAs
• Implemented the change without impacting users
Summary
Q & A
Engage with us!
@mapr
mapr-technologies
maprtech
MapR
maprtech
rrosen@maprtech.com


Editor's Notes

  • #3: MapR combines the best of open source technology with our own deep innovations to provide the most advanced distribution for Apache Hadoop. MapR’s team has a deep bench of enterprise software experience with proven success across storage, networking, virtualization, analytics, and open source technologies. Our CEO has driven multiple companies to successful outcomes in the analytics, storage, and virtualization spaces. Our CTO and co-founder M.C. Srivas was most recently at Google, working on BigTable. He understands the challenges of MapReduce at huge scale. Srivas was also the chief software architect at Spinnaker Networks, which came out of stealth with the fastest NAS storage on the market and was quickly acquired by NetApp. The team includes experience with enterprise storage at Cisco, VMware, IBM, and EMC. Our VP of Engineering was a senior vice president at Informatica, where he built and managed a large R&D team of 250 that spanned four geographies with annual revenues of $300M. We also have experience at business intelligence and analytics companies, as well as open source committers in Hadoop, ZooKeeper, and Mahout, including PMC members. MapR is proven technology, deployed at leading Hadoop installations across industries and OEMed by EMC and Cisco.
  • #5: Need a platform that serves the broadest set of use cases.
  • #6: MapReduce is a paradigm shift: it moves the processing to the data. Apache Hadoop is a software framework that supports data-intensive distributed applications. Hadoop was inspired by Google's published MapReduce white paper. Apache Hadoop provides a new platform to analyze and process Big Data. With data growth exploding and new unstructured sources of data expanding, a new approach is required to handle the volume, variety, and velocity of this growing data. Hadoop clustering exploits commodity servers and increasingly less expensive compute, network, and storage. Google is the poster child for the power of MapReduce. They were the 19th search engine to enter the market. There were 18 companies more successful, and within 2 years Google was the dominant player. That’s the power of the MapReduce framework.
---------------------------
Long version
A poster child for this is Google. We now take Google’s dominance for granted, but when Google launched their beta in 1998 they were late. They were at least the 19th search engine on the market. Yahoo was dominant, and there were Infoseek, Excite, Lycos, Ask Jeeves, and AltaVista (which had the technical cred). It wasn’t until Google published a paper in 2003 that we got a glimpse at their back-end architecture. Google was able to reach dominance because they recognized the paradigm shift early on, and they were able to index more data, get better results, and do it much more efficiently and cost-effectively than their competitors. They went from 19th to first in a few short years because of MapReduce. Doug Cutting, who later joined Yahoo, read Google's papers and developed a Java implementation of MapReduce, named after his son’s stuffed elephant, that became the basis for the open source Hadoop project. Now when we say Hadoop we’re talking about a robust ecosystem. There are now multiple commercial versions of Hadoop. There’s a complete stack that includes job management, development tools, schedulers, machine learning libraries, etc. MapR’s co-founder and CTO was at Google, where he was in charge of the BigTable group, and he understands MapReduce at scale. Our charter was to fix the underlying flaws of the Hadoop implementation to make it appropriate for a broader set of applications and work for most organizations.
  • #8: Let’s start with this chart, to reinforce that you’re in the right room and picked the right session. Not only is Hadoop the fastest-growing Big Data technology, it is one of the fastest-growing technologies, period. Hadoop adoption is happening across industries and across a wide range of application areas. What’s driving this adoption?
  • #9:
    – Databases and data warehouses are growing and exceeding capacity too quickly
    – Inactive data consuming storage and degrading performance
    – Low-density and low-priority data disproportionately consuming storage and processing capacity
    – Batch windows hitting their limits, putting SLAs at risk
    – Extracts put too much load on source systems, adding to expense
    – Not all data required is in the data warehouse
  • #15: With MapR, Hadoop is lights-out data center ready. MapR provides five 9s (99.999%) of availability, including support for rolling upgrades, self-healing, and automated stateful failover. MapR is the only distribution that provides these capabilities. MapR also provides dependable data storage with full data protection and business continuity features. MapR provides point-in-time recovery to protect against application and user errors. There is end-to-end checksumming, so data corruption is automatically detected and corrected with MapR’s self-healing capabilities. Mirroring across sites is fully supported. All these features support lights-out data center operations. Every two weeks an administrator can take a MapR report and a shopping cart full of drives and replace failed drives.
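
The checksumming behavior described in the last note can be illustrated with a generic sketch: a checksum is stored when a block is written, verified on every read, and a corrupt copy is skipped in favor of a healthy replica. The deck does not describe how MapR implements this, so the CRC32 choice, the replica list, and the function names below are all assumptions made for illustration.

```python
import zlib
from typing import Iterable, Tuple


def write_block(data: bytes) -> Tuple[bytes, int]:
    """Store a block together with the CRC32 checksum computed at write time."""
    return data, zlib.crc32(data)


def read_first_valid(replicas: Iterable[Tuple[bytes, int]]) -> bytes:
    """Return the first replica whose contents still match its stored checksum."""
    for data, stored_checksum in replicas:
        if zlib.crc32(data) == stored_checksum:
            return data
    raise ValueError("all replicas failed checksum verification")
```

A self-healing system would go one step further and rewrite the corrupt copy from the valid one; that step is omitted here.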