© 2014 MapR Technologies 1
© 2014 MapR Technologies 2
TREND 1: Industry Leaders Compete and Win with Data
More Data Beats Better Algorithms
Collecting interaction data from ecommerce, social media, offline, and call centers
enables a “customer 360 view” and consumer intimacy
Competitive Advantage is Decided by 0.5%
Consumer financial services: a 1% improvement in fraud detection means hundreds of millions of dollars in savings
Advertising and retail: a 0.5% improvement in lift means a multimillion-dollar increase in profitability
© 2014 MapR Technologies 3
Fortune 100 Retailer
© 2014 MapR Technologies 4
Leading Cancer Research Center
© 2014 MapR Technologies 5
© 2014 MapR Technologies 6
Production Hadoop in Waste Management
© 2014 MapR Technologies 7
FINANCIAL SERVICES
RETAIL
SECURITY
INTERNET
MEDIA
INFORMATION TECHNOLOGY
ADVERTISING
HEALTH
TELECOM
GOVERNMENT
Top 10 industries determined by customer bookings
Addressing Diverse Industries
© 2014 MapR Technologies 8
Difficult to Leverage Data with Traditional Systems
• Mission-critical reliability
• Transaction guarantees
• Deep security
• Real-time performance
• Backup and recovery
• Interactive SQL
• Rich analytics
• Workload management
• Data governance
• Backup and recovery
TREND 2: Enterprise Data Architecture
[Diagram labels: ENTERPRISE USERS; OPERATIONAL SYSTEMS and ANALYTICAL SYSTEMS, each with PRODUCTION REQUIREMENTS; OUTSIDE SOURCES]
© 2014 MapR Technologies 9
TREND 3: Hadoop: The Disruptive Technology at the Core of Big Data
[Chart: job trends from Indeed.com, Jan ’06 to Jan ’14]
© 2014 MapR Technologies 10
Hadoop: Distributed Compute on Data
© 2014 MapR Technologies 11
The Hadoop Advantage
BIG DATA
HADOOP
Data on compute
Simple algorithms on Big Data
unstructured data
© 2014 MapR Technologies 12
Economics: Hadoop Just Makes Sense
[Chart: data growing at 40% versus IT budgets growing at 2.5%, 2013 to 2017]
[Chart: $ per terabyte, with values $9,000, $40,000, and <$1,000; labels include enterprise storage and database warehouse]
• Gartner, "Forecast Analysis: Enterprise IT Spending by Vertical Industry Market, Worldwide, 2010-2016, 3Q12 Update."
• Wall Street Journal, “Financial Services Companies Firms See Results from Big Data Push”, Jan. 27, 2014
IT budgets can’t keep up with growing data
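To make the gap between the two growth rates on this slide concrete, here is a minimal sketch of the compounding arithmetic. It assumes both rates are annual, that they stay constant from 2013 to 2017, and that both quantities are indexed to 1.0 in 2013. None of these assumptions is stated on the slide, and the function name `project` is purely illustrative.

```python
def project(start, annual_rate, years):
    """Return the compounded value for each year, starting from `start`."""
    return [start * (1 + annual_rate) ** n for n in range(years + 1)]

# Assumption: 2013 is the baseline year and both series start at 1.0.
years = list(range(2013, 2018))
data_volume = project(1.0, 0.40, 4)    # data growing at 40%
it_budget = project(1.0, 0.025, 4)     # IT budgets growing at 2.5%

for year, data, budget in zip(years, data_volume, it_budget):
    print(f"{year}: data x{data:.2f}, budget x{budget:.2f}, ratio {data / budget:.2f}")

# Ratio of data growth to budget growth at the end of the period.
gap_2017 = data_volume[-1] / it_budget[-1]
```

Under these assumptions, data volume grows to about 3.8 times its 2013 level by 2017 while the budget grows to about 1.1 times, so each budget dollar has to cover roughly 3.5 times as much data.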
© 2014 MapR Technologies 13
REALITY 1: Hadoop Relieves the Pressure on Enterprise Systems
OPERATIONAL SYSTEMS
ANALYTICAL SYSTEMS
ENTERPRISE USERS
• Data staging
• Archive
• Data transformation
• Data exploration
• Streaming, interactions
Keys for Production Success
1 Reliability and DR
2 Interoperability
3 High performance
4 Supports operations and analytics
© 2014 MapR Technologies 14
REALITY 2: Architecture Matters for Success
FOUNDATION
© 2014 MapR Technologies 15
REALITY 2: Architecture Matters for Success
FOUNDATION
Data protection & security
High performance
Multi-tenancy
Workload management
Open standards for integration
NEW APPLICATIONS
SLAs
TRUSTED INFORMATION
LOWER TCO
© 2014 MapR Technologies 16
REALITY 3: Hadoop is Being Used to Drive Small, Rapid Decisions
High Arrival Rate Data
• Clickstream
• Social media
• Sensor data, …
Business Impact
• Revenue optimization
• Risk mitigation
• Operational efficiency
© 2014 MapR Technologies 17
Advertising Automation Cloud
Sellers Cloud
Buyers Cloud
100B AD AUCTIONS per day
© 2014 MapR Technologies 18
Largest Biometric Database in the World
1.2B PEOPLE
© 2014 MapR Technologies 19
50M SET-TOP BOXES
© 2014 MapR Technologies 20
Fortune 100 Financial Services Company
104M CARD MEMBERS
© 2014 MapR Technologies 21
World-Record Performance
NEW MINUTESORT WORLD RECORD: 1.65 TB in 1 minute, 298 nodes
PREVIOUS RECORD: 1.6 TB with 2200 nodes
MapR: With a Fraction of the Hardware
© 2014 MapR Technologies 22
Operations + Analytics
MapR Distribution for Hadoop
[Diagram labels: Fraud model; Recommendations table; Fraud investigator; Interactive marketer; Online transactions; Fraud detection; Personalized offers; Clickstream analysis; Fraud investigation tool; Real-time Operational Applications; Analytics]
© 2014 MapR Technologies 23
Data Warehouse Optimization Using Hadoop
ADVANTAGES:
• Multi-million dollar cost savings year over year
• Long term data offload with HA, data protection and disaster recovery
• Streaming writes to existing EDW using NFS
• 1T files
[Diagram labels: Data Sources; Hadoop (EDW ETL and Long Term Storage); Data Warehouse (Query and Report)]
© 2014 MapR Technologies 24
From Redundant Processing Silos and Data Science Experiments…
Opportunity to Revolutionize Enterprise Data Architecture
© 2014 MapR Technologies 25
… to Consolidated Operational and Analytical Workloads
The Production Enterprise Data Hub
© 2014 MapR Technologies 26
Q&A
@mapr maprtech
jnorris@mapr.com
Engage with us!
MapR
maprtech
mapr-technologies

Hadoop: Revolutionizing Analytics AND Operations


Editor's Notes

  • #2: Hadoop: Revolutionizing Analytics AND Operations. Hadoop revolutionizes how data is stored, processed, and analyzed. Hadoop represents a new data and compute stack that provides huge operational advantages and is being used to change how organizations compete. This session will provide an overview of how customers are using Hadoop today through details on initial uses and a glimpse of how this new platform is providing organizations 10X performance at 1/10 the cost. Overview of Big Data. Data-driven companies. Use cases… examples of data-driven companies, 2 to 3. Show importance of leveraging data… Existing systems getting overrun. Examples of what this means… size of data, Oracle hitting the wall… analytic speed… Hadoop is at the center. What is Hadoop. Additional proof points??? 3 Realities. Relieves the pressure. Processing example in terms of how it scales. Cost example… You don’t need to know the questions you’re going to ask ahead of time… Small Rapid Decisions. Examples of operational Hadoop. Rubicon, 3 to 4. Follow up with the use case… Architecture Matters. Why is this the case. The Results. Where do you start… Offloading examples… Cisco – DW. IRI – Mainframe offload.
  • #3: The first trend is that the industry leaders have shown how to use big data to compete and win in their markets. It’s no longer a nice-to-have: you need big data to compete. Google pioneered MapReduce processing on commodity hardware and used that to catapult themselves into the leading search engine position even though they were 19th in the market. Yahoo! leveraged these ideas to create Hadoop to keep up with Google, and many mainstream companies have followed with new data-driven applications such as “people you may know” (started by LinkedIn and now used by Facebook, Twitter, and every social application), product recommendation engines, contextual and personalized music services (Beats), measuring digital media effectiveness (comScore), serving more relevant/targeted ads (Comcast, Rubicon Project), fraud and risk detection, healthcare efficacy, and more. What makes the difference? A lot of attention is given to data science and developing sophisticated new algorithms, but in many cases just having more data beats better algorithms (for example, collecting more consumer interaction data as well as transaction data). In addition, competitive advantage is decided by very small percentages. Just a 1% improvement in fraud detection can mean hundreds of millions of dollars in savings. A 0.5% lift in advertising effectiveness means millions in new product sales and profitability. The same can be applied to customer churn, disease diagnosis, and more.
  • #5: Doctors, particularly oncologists, are faced with an enormous amount of data regarding patient treatments, outcomes, and disease states. Hadoop is having an impact across the health care industry, but for this minute we will focus on its use for developing better treatments. In one minute Hadoop can analyze more than 20,000 genes across hundreds of thousands of patients. The goal of this analysis is to better understand genomic factors and to integrate imaging and clinical analytics to better understand, predict, and impact survival. Our cluster is sequencing 422,000 genes per minute.
  • #6: Beats headphones by Dr. Dre have swept the audio market. Beats has launched a new Beats Music service that is able to personalize music selections and select the perfect song in a minute from over 20 million songs. It joins a crowded space for online music, but by using MapR, Beats is able to provide a completely new personalized service drawing on the more than 20 million songs in their library. It’s not about delivering 20 million songs, but about providing a continuously updating, personalized, and tailored experience to users.
  • #9: A second trend in enterprise architecture has been big data overwhelming the existing workload-specific systems which are in production (the requirements for each are listed in the text on the side of the slide). People started with mainframes or operational systems which run ERP, finance, CRM, and other mission-critical applications. They require… (pick out the attributes you want to stress on the left). You also have data warehouses, marts, data mining, and other analytical systems which pull data from these operational and other systems to provide insights to the business for decision making. The amount and variety of data has been overloading these systems. As you try to ingest new types of data, you reach a point where these systems are not cost-effective to scale to terabytes or petabytes of data.
  • #10: Hadoop has become the de facto big data platform which allows organizations to keep up with big data and feed data-driven applications and processes. This chart shows the percentage growth of jobs from Indeed.com. Compared to other popular technologies such as MongoDB and Cassandra, Hadoop is not only the fastest growing big data technology, it’s one of the fastest growing technologies, period. Hadoop has the most robust ecosystem and momentum and is the big data platform of choice for industry-leading, data-driven companies. (Also of interest: Indeed.com, which is a subsidiary of a Japanese-owned company, is a customer of MapR; they harness and analyze all of the job trends data using MapR.)
  • #11: As implemented, MapReduce is actually a collection of complementary techniques and strategies that include employing commoditized hardware and software, specialized underlying file systems, and parallel processing methodologies. Many of the benefits arise from the fact that computation can be done on the same machines where data resides and from the fact that individual pieces of the overall computation can be recomputed if necessary due to hardware failure or other delays. This is a revolutionary architectural philosophy that shelters the average developer from the overwhelming complexity that had formerly been required to properly carry out parallel processing. But as we’ll see later, the implementation of MapReduce laid the foundation for significant problems now being experienced by many enterprises that are seeking to put it to work.
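The two ideas in this note (run each piece of the computation next to its own slice of data, and recompute an individual piece if it fails) can be sketched in a few lines. This is a toy, single-process illustration in Python, not Hadoop's API: the function names, the retry count, and the word-count job are all assumptions chosen for the example, and real clusters schedule these tasks across machines.

```python
from collections import defaultdict
from itertools import chain

def map_phase(partition):
    """Map task: would run on the node that stores `partition`.
    Emits one (word, 1) pair per word."""
    return [(word.lower(), 1) for line in partition for word in line.split()]

def shuffle(pairs):
    """Group all intermediate values by key before reducing."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce task: combine the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}

def run_job(partitions, max_attempts=2):
    """Run one map task per partition, retrying only the task that failed."""
    mapped = []
    for partition in partitions:
        for attempt in range(max_attempts):
            try:
                mapped.append(map_phase(partition))
                break
            except Exception:
                # Hypothetical failure handling: recompute just this piece.
                if attempt == max_attempts - 1:
                    raise
    return reduce_phase(shuffle(chain.from_iterable(mapped)))

# Each inner list stands in for a block of data held on a different machine.
partitions = [
    ["big data beats better algorithms"],
    ["more data beats more tuning"],
]
counts = run_job(partitions)
print(counts["data"], counts["beats"], counts["more"])  # prints: 2 2 2
```

Because each map task depends only on its own partition, a failed task can be rerun without touching the others, which is the property the note credits for sheltering developers from most of the complexity of parallel processing.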
  • #12: MapReduce is a paradigm shift. It’s moving the processing to the data. Apache Hadoop is a software framework that supports data-intensive distributed applications. Hadoop was inspired by a published Google MapReduce whitepaper. Apache Hadoop provides a new platform to analyze and process Big Data. With data growth exploding and new unstructured sources of data expanding, a new approach is required to handle the volume, variety, and velocity of this growing data. Hadoop clustering exploits commodity servers and increasingly less expensive compute, network, and storage. Google is the poster child for the power of MapReduce. They were the 19th search engine to enter the market. There were 18 companies more successful, and within 2 years Google was the dominant player. That’s the power of the MapReduce framework.
    Long version: A poster child for this is Google. We now take Google’s dominance for granted, but when Google launched their beta in 1998 they were late. They were at least the 19th search engine on the market. Yahoo was dominant, and there were Infoseek, Excite, Lycos, Ask Jeeves, and AltaVista (which had the technical cred). It wasn’t until Google published a paper in 2003 that we got a glimpse of their back-end architecture. Google was able to reach dominance because they recognized the paradigm shift early on, and they were able to index more data, get better results, and do it much more efficiently and cost-effectively than their competitors. They went from 19th to first in a few short years because of MapReduce. A Yahoo engineer by the name of Doug Cutting read that same paper in 2003 and developed a Java implementation of MapReduce, named after his son’s stuffed elephant, that became the basis for the open source Hadoop project. Now when we say Hadoop we’re talking about a robust ecosystem. There are now multiple commercial versions of Hadoop. There’s a complete stack that includes job management, development tools, schedulers, machine learning libraries, etc. MapR’s co-founder and CTO was at Google, where he was in charge of the BigTable group, and he understands MapReduce at scale. Our charter was to fix the underlying flaws of the Hadoop implementation to make it appropriate for a broader set of applications and work for most organizations.
  • #13: Need a platform that serves the broadest set of use cases…
  • #14: The first reality is that as people put Hadoop into production to relieve the pressure on other systems in their enterprise architecture, it needs to be reliable. Hadoop needs to be held to the same enterprise standards as your Oracle, SAP, Teradata, NetApp storage, or any other enterprise system. Many organizations are putting Hadoop into their data center to provide (list of use cases underneath)… It can do all of this and more, but for Hadoop to act as a system of record, it must provide the same guarantees for SLAs, performance, data protection, and more. Most importantly, Hadoop has the potential for both analytics AND operations. It can be used to optimize the data warehouse and provide batch data refining or storage. But Hadoop can also provide many operational analytics or database operations/jobs when done right.
  • #15: Choosing the right big data architecture is critical for success with your Hadoop projects and business applications. One analogy is building a skyscraper. Before you can start building up, you have to lay a rock-solid foundation. This building is the new Wilshire Grand project in Los Angeles. In February of this year they set a Guinness World Record for pouring a 21,000 cubic yard (16,000 cubic meter) foundation over 26 hours (http://www.theguardian.com/cities/2014/feb/14/world-largest-concrete-pour-la-trucks-los-angeles). When completed in 2017, the building will be the tallest in the US outside of New York and Chicago.
  • #16: This analogy applies as well to building a data platform: you have to architect for the future. This allows you to build higher, stronger, and faster, without retrofitting later down the road (anyone who has added a second story to their house can attest to the additional cost and construction delays if you have to reinforce a foundation which wasn’t designed to hold the stress). For business-critical applications you must have data protection and security (availability, data protection, and recovery), high performance (with a random read-write system), multi-tenancy (to support multiple business units, isolate applications or user data, …), good resource and workload management to support multiple applications, and open standards to integrate with the rest of the enterprise data architecture. This data foundation allows you to support new data-driven applications (both operational and analytical), maintain service level agreements with the business, provide information you can trust and count on being there when you need it, and ultimately deliver the best TCO for the long run. It supports enterprise systems without retrofits or multiple clusters to work around platform deficiencies (e.g., to support operational/online applications in Hadoop today, you need a separate HBase cluster, separate from the rest of your Hadoop cluster/investment).
  • #17: In a recent article (http://www.cmswire.com/cms/big-data/5-things-to-lessen-your-anxiety-about-big-data-024382.php), Tom Davenport says that big data’s biggest wins come from making many small decisions rather than one huge one. The majority of big-data-driven decisions will be recurring, made at speed (in milliseconds), and at scale; actions will be taken automatically rather than reviewed and approved by an individual. Examples include ad platforms making constant adjustments, fraud detection on millions of transactions based on individual patterns, and fleet management and routing that take current conditions into account. This requires a Hadoop platform that can go beyond batch and support streaming writes, so data can be written to the system continuously while analysis is being conducted. It also requires high performance to meet business needs, and real-time operations: the ability to perform online database operations so you can react to the business situation and affect the business as it happens, not report on it a week, month, or quarter later. Doing this requires THE RIGHT ARCHITECTURE.
  • #18: One great example is the Rubicon Project, which recently filed its S-1 to go public. They bet their business on data, with Hadoop as the cornerstone, and developed pioneering technology that created a new model for the advertising industry, similar to what NASDAQ did for stock trading. Rubicon Project uses MapR for its automated advertising platform, which processes over 100B ad auctions a day and provides the most extensive ad reach in the industry, touching 96% of internet users in the US. They use MapR because of its superior system reliability and performance and its ability to run in their “lights out datacenters”. They switched from one of our competitors after experiencing a NameNode failure and constant up-and-down availability. That was fine in development, but by 2011 Hadoop needed to be a production system, which is when they switched to MapR.
  • #19: In India, there is no social security card. It’s difficult for the average citizen to set up a bank account, access benefit programs, and enjoy economic mobility. It’s difficult for the government as well, with over $1B of government aid classified as leakage, the result of fraud and corruption. The Aadhaar program is poised to change all that by leveraging the unique IDs that all people are born with to create the largest biometric database in the world. The program aims to collect fingerprints and iris scans for all 1.2 billion citizens. The scale of this project required MapR’s in-Hadoop database, which is capable of 200 millisecond response times while supporting millions of concurrent look-ups.
  • #22: They ran the MinuteSort benchmark, a test that shows how much data you can sort in 1 minute. The previous MinuteSort world record was set by Yahoo, which sorted 1.6TB with 2,200 nodes. This MapR customer broke the record by sorting 1.65TB with 298 nodes. That’s roughly 1/7th the hardware, which translates into tremendous cost, space, and management savings.
  • #23: Because only MapR can reliably run both operational and analytical applications on one platform/cluster, MapR enables a faster closed-loop process between operational applications and analytics. This means: interactive marketers and algorithms can update the rules engines more quickly and provide more real-time targeting of offers and relevant content to consumers; and fraud models are kept more up to date with the latest patterns, to better detect anomalies and take action more quickly on bad actors.
  • #25: MapR creates a new opportunity for enterprises: the opportunity to revolutionize the enterprise data architecture. From… ‘redundant processing silos’ and ‘data science experiments’, where you need separate Hadoop clusters for streaming, HDFS/Hive, HBase, and more. To…
  • #26: To… a ‘converged data & processing hub’ that provides a TRUE PRODUCTION enterprise data hub. This allows you to consolidate operational and analytical workloads, not only across Hadoop use cases and applications but also to optimize your enterprise data architecture.