© Hortonworks Inc. 2013
Modern Data Architecture
…for Predictive Analytics
David Smith
VP Marketing and Community - Revolution Analytics
John Kreisa
VP Strategic Marketing - Hortonworks
Page 1
Your Presenters
• David Smith (@revodavid)
–VP Marketing and Community at Revolution
Analytics
–Data Scientist, Blogger and co-author of An
Introduction to R
• John Kreisa (@marked_man)
–VP Strategic Marketing, Hortonworks
–Over 20 years in data management as a
developer and a marketer
–Avid camper
Page 2
Today’s Topics
• Introduction
• Drivers for the Modern Data Architecture (MDA)
• Apache Hadoop in the MDA
• R’s role in the MDA
• Q&A
Page 3
Poll #1: What stage are you at in looking into Hadoop?
•Research
•Evaluation
•Trial
•Haven’t started research
Page 4
Existing Data Architecture
Page 5
[Diagram: existing data architecture]
APPLICATIONS: Business Analytics, Custom Applications, Packaged Applications
DATA SYSTEM: Repositories (RDBMS, EDW, MPP)
SOURCES: Existing Sources (CRM, ERP, Clickstream, Logs)
OPERATIONAL TOOLS: Manage & Monitor
DEV & DATA TOOLS: Build & Test
Existing Data Architecture
Page 6
[Diagram: existing data architecture]
APPLICATIONS: Business Analytics, Custom Applications, Packaged Applications
DATA SYSTEM: Repositories (RDBMS, EDW, MPP)
SOURCES: Existing Sources (CRM, ERP, Clickstream, Logs)
Source: IDC
2.8 ZB in 2012
85% from New Data Types
15x Machine Data by 2020
40 ZB by 2020
Modern Data Architecture Enabled
Page 7
[Diagram: modern data architecture]
APPLICATIONS: Business Analytics, Custom Applications, Packaged Applications
DATA SYSTEM: Repositories (RDBMS, EDW, MPP)
SOURCES: Existing Sources (CRM, ERP, Clickstream, Logs); Emerging Sources (Sensor, Sentiment, Geo, Unstructured)
OPERATIONAL TOOLS: Manage & Monitor
DEV & DATA TOOLS: Build & Test
Hadoop Powers Modern Data Architecture
Page 8
Apache Hadoop is an open source project
governed by the Apache Software Foundation
(ASF) that allows you to gain insight from massive
amounts of structured and unstructured data
quickly and without significant investment.
[Figure: a Hadoop cluster made up of many nodes, each providing compute & storage]
Hadoop clusters provide
scale-out storage and
distributed data processing
on commodity hardware
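The idea of scale-out storage combined with distributed processing can be sketched in a few lines. The following is only an illustration in plain Python, not Hadoop code: the function names (`split_into_blocks`, `map_block`, `word_count`) and the tiny block size are assumptions made for this example. Data is partitioned into blocks, each block is processed independently (as it would be on the node that stores it), and the partial results are merged.

```python
from collections import Counter
from functools import reduce


def split_into_blocks(records, block_size):
    """Partition records into fixed-size blocks, loosely mimicking how a
    distributed file system spreads a data set across nodes."""
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]


def map_block(block):
    """Work done independently per block: count the words it contains."""
    counts = Counter()
    for line in block:
        counts.update(line.lower().split())
    return counts


def merge_counts(left, right):
    """Combine two partial results into one."""
    return left + right


def word_count(records, block_size=2):
    """Count words by processing each block separately, then merging."""
    blocks = split_into_blocks(records, block_size)
    # On a real cluster these per-block calls would run in parallel on
    # different machines; here they run one after another.
    partials = [map_block(block) for block in blocks]
    return reduce(merge_counts, partials, Counter())
```

Calling `word_count(["error disk full", "login ok", "error timeout"])` processes two blocks and merges their counts. Adding more blocks (and, on a cluster, more nodes) does not change the logic, which is the property the slide refers to as scale-out.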
Drivers for Hadoop Adoption
Driving Efficiency: Modern Data Architecture
Hadoop has a central role in next-generation data architectures while
integrating with existing data systems
Driving Opportunity: Business Applications
Use Hadoop to extract insights that enable new customer value and
competitive edge
Existing
Traditional
Server log
Clickstream
Big Data Sets
Emerging
Sentiment/Social
Machine/Sensor
Geo-locations
Opportunity in types of data
1. Sentiment
Understand how your customers feel about your brand and
products – right now
2. Clickstream
Capture and analyze website visitors’ data trails and
optimize your website
3. Sensor/Machine
Discover patterns in data streaming automatically from
remote sensors and machines
4. Geographic
Analyze location-based data to manage operations where
they occur
5. Server Logs
Research logs to diagnose process failures and prevent
security breaches
6. Unstructured (text, video, pictures, etc.)
Understand patterns in files across millions of web pages,
emails, and documents
Value
Page 10
Efficiency in the Modern Data Architecture
Page 11
[Diagram: modern data architecture]
APPLICATIONS: Business Analytics, Custom Applications, Packaged Applications
DATA SYSTEM: Repositories (RDBMS, EDW, MPP)
SOURCES: Existing Sources (CRM, ERP, Clickstream, Logs); Emerging Sources (Sensor, Sentiment, Geo, Unstructured)
• Drive efficiency via modern data architecture
• Store data once and access it in many ways
• Often referred to as a data lake or data repository
• Infrastructure platform driven
• IT-oriented, TCO based
Engineered for Interoperability
Page 12
[Diagram: interoperability with existing systems]
APPLICATIONS: BusinessObjects BI
DATA SYSTEM: RDBMS, EDW, MPP, HANA
SOURCES: Existing Sources (CRM, ERP, Clickstream, Logs); Emerging Sources (Sensor, Sentiment, Geo, Unstructured)
OPERATIONAL TOOLS
DEV & DATA TOOLS
INFRASTRUCTURE
Requirements for Hadoop Adoption
Page 13
• Integrated: interoperable with existing data center investments
• Skills: leverage your existing skills in development, operations, and analytics
• Key Services: platform, operational and data services essential for the enterprise
Requirements for Hadoop’s Role in the Modern Data Architecture
Revolution R Enterprise Architecture
Page 14
[Diagram: modern data architecture; highlighted components = Revolution R Enterprise]
APPLICATIONS: Business Analytics, Custom Applications, Packaged Applications
DATA SYSTEM: Repositories (RDBMS, EDW, MPP)
SOURCES: Existing Sources (CRM, ERP, Clickstream, Logs); Emerging Sources (Sensor, Sentiment, Geo, Unstructured)
OPERATIONAL TOOLS: Manage & Monitor
DEV & DATA TOOLS: Build & Test
Today’s Topics
• Introduction
• Drivers for the Modern Data Architecture (MDA)
• Apache Hadoop’s role in the MDA
• R’s role in the MDA
• Q&A
Page 15
Poll #2: Which of the following best describes
your use of R and Hadoop?
•We have R + Hadoop in production
•We are testing R + Hadoop
•We have started to investigate but
nothing is implemented
•No current plans
Page 16
Revolution Confidential
What is the Open Source R Project?
• The R Language:
–Object-Oriented Language for Stats, Math and Data Science
–Comprehensive data visualization and statistical modeling capabilities
• The R Community:
–2M+ Users with the Skill to Tackle Big Data Statistical and Numerical Analysis and Machine Learning Projects
–New graduates with data skills learn R
• The R Ecosystem:
–5000+ Freely Available Algorithms in CRAN
–Specialized methods for finance, economics, genomics, linguistics, and every data-driven domain
17
R is open source and drives analytic innovation but has
some limitations for Enterprises
(Each row: enterprise need: open source R → Revolution R Enterprise)
Bigger data sizes: Memory Bound → Big Data
Speed of analysis: Single Threaded → Scale out, parallel processing, high speed
Production support: Community Support → Commercial production support
Innovation and scale: Innovative, 5000+ packages, exponential growth → Combines with open source R packages where needed
Revolution R Enterprise
19
Enterprise-Ready
Revolution R Enterprise is the only commercial big data analytics platform
based on the open source R statistical computing language
Cross-Platform
Big Data Analytics
High Performance Analytics
Easier Build & Deploy
Modern Data Architecture
Extract and Analyze
• Ad-hoc Data Distillation
• Exploratory Data Analysis / Data Visualization
• Model Development
[Diagram: data flow from source data through Hadoop to analytical tools]
SOURCE DATA: Sensor Logs, Clickstream, Flat Files, Unstructured, Sentiment, Customer, Inventory
DBs, JMS Queues, Files
LOAD: SQOOP, FLUME, WebHDFS, NFS
REST, HTTP, STREAM
AMBARI, MAPREDUCE, YARN, HDFS
DATA REFINEMENT: HIVE, PIG, CUSTOM
STRUCTURE: HCATALOG (metadata services)
LOAD: SQOOP/Hive, WebHDFS
Data Sources: CSV, DATABASES
INTERACTIVE: HIVE Server2
ANALYTICAL: rHadoop
Analytical Tools: Query/Visualization/Reporting/Analytical Tools and Apps
The Data Scientist’s Big Data Toolkit
21
Statistical Tests
Machine Learning
Simulation
Descriptive Statistics
Data Visualization
R Data Step
Predictive Models
Sampling
Parallel External-Memory Algorithms
22
[Figure: multiple CPUs working in parallel within a single SMP server]
Parallel External-Memory Algorithms
23
[Figure: multiple Hadoop nodes working in parallel within a Hadoop cluster]
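A parallel external-memory algorithm works on data too large for memory by processing it one chunk at a time, keeping only a small summary per chunk, and merging the summaries. The sketch below illustrates that pattern for mean and variance in Python. It is an assumption-laden illustration, not Revolution R Enterprise code: the names (`Summary`, `summarize_chunk`, `merge`) are hypothetical, the chunks come from an in-memory list rather than from disk, and the thread pool only stands in for CPUs or cluster nodes.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Summary:
    """Small, mergeable summary of a chunk: count, mean, and the sum of
    squared deviations from the mean (m2)."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0


def summarize_chunk(chunk):
    """Single pass over one chunk (Welford's update); the chunk itself can
    be discarded afterwards."""
    s = Summary()
    for x in chunk:
        s.n += 1
        delta = x - s.mean
        s.mean += delta / s.n
        s.m2 += delta * (x - s.mean)
    return s


def merge(a, b):
    """Combine two chunk summaries without revisiting the raw data."""
    if a.n == 0:
        return b
    if b.n == 0:
        return a
    n = a.n + b.n
    delta = b.mean - a.mean
    mean = a.mean + delta * b.n / n
    m2 = a.m2 + b.m2 + delta * delta * a.n * b.n / n
    return Summary(n, mean, m2)


def chunks(values, size):
    """Yield consecutive slices; a real implementation would read from disk."""
    for i in range(0, len(values), size):
        yield values[i:i + size]


def parallel_mean_variance(values, chunk_size=1000, workers=4):
    """Return (mean, sample variance) computed chunk by chunk."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(summarize_chunk, chunks(values, chunk_size)))
    total = Summary()
    for partial in partials:
        total = merge(total, partial)
    variance = total.m2 / (total.n - 1) if total.n > 1 else float("nan")
    return total.mean, variance
```

The same `summarize_chunk` and `merge` steps apply whether the workers are cores in one server or nodes in a cluster, which is the point of the two figures above. Python threads do not give real CPU parallelism for this kind of loop, so the example shows the structure of the algorithm rather than a speedup.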
Modern Data Architecture with RRE7
In-Hadoop Predictive Analytics
• Production Data Distillation (e.g. Semantic Analysis)
• Production Model Processing / Re-Estimation
• Production Model Scoring
[Diagram: data flow from source data through Hadoop to analytical tools]
SOURCE DATA: Sensor Logs, Clickstream, Flat Files, Unstructured, Sentiment, Customer, Inventory
DBs, JMS Queues, Files
LOAD: SQOOP, FLUME, WebHDFS, NFS
REST, HTTP, STREAM
AMBARI, MAPREDUCE, YARN, HDFS
DATA REFINEMENT: HIVE, PIG, CUSTOM
DISTILLED DATA FILES
STRUCTURE: HCATALOG (metadata services)
LOAD: SQOOP/Hive, WebHDFS
Data Sources: CSV, DATABASES
INTERACTIVE: HIVE Server2
ANALYTICAL: Revolution R Enterprise
Analytical Tools: Query/Visualization/Reporting/Analytical Tools and Apps
Hadoop As An R Engine
• Use Revolution R Enterprise PEMAs in Hadoop
• No need to change existing R code
• Simple R programming
• No need to “Think In MapReduce”
• Eliminate data movement to slash latencies
• Use Hadoop nodes as parallel R computation engines
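The claim that an analyst need not "think in MapReduce" can be illustrated with a small model-fitting sketch. It is written in Python purely for illustration and does not show the Revolution R Enterprise API; `chunk_statistics`, `fit_line`, and `score` are hypothetical names. The analyst calls one fitting function, while the per-chunk work (which could run on the node holding each chunk) and the combining step stay hidden inside it. Only small summaries leave each chunk, not the raw data.

```python
def chunk_statistics(chunk):
    """Sufficient statistics for the line y = a + b*x over one chunk of
    (x, y) pairs. This is the part that can run next to the data."""
    n = sx = sy = sxx = sxy = 0.0
    for x, y in chunk:
        n += 1
        sx += x
        sy += y
        sxx += x * x
        sxy += x * y
    return (n, sx, sy, sxx, sxy)


def combine(a, b):
    """Add two tuples of statistics element by element."""
    return tuple(u + v for u, v in zip(a, b))


def fit_line(chunks):
    """Fit y = intercept + slope * x by least squares across all chunks."""
    total = (0.0, 0.0, 0.0, 0.0, 0.0)
    for chunk in chunks:
        total = combine(total, chunk_statistics(chunk))
    n, sx, sy, sxx, sxy = total
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("need at least two distinct x values")
    slope = (n * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / n
    return intercept, slope


def score(model, x):
    """Apply a fitted model to a new observation."""
    intercept, slope = model
    return intercept + slope * x
```

For example, `fit_line([[(0, 1), (1, 3)], [(2, 5), (3, 7)]])` fits the two chunks as one data set, and `score` then applies the result to new values. Re-estimating the model when new chunks arrive only requires adding their statistics to the running total.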
25
Requirements for Hadoop Adoption
Page 26
• Integrated: interoperable with existing data center investments
• Skills: leverage your existing skills in development, operations, and analytics
• Key Services: platform, operational and data services essential for the enterprise
Requirements for Hadoop’s Role in the Modern Data Architecture
Poll #3: Which of the following would you
most like to accomplish with R + Hadoop?
•Build a model to be put into production in Hadoop
•Build a model to be put into production elsewhere
•Create new data from Hadoop to
supplement an existing analytics process
•Something else
Page 27
Next Steps:
Page 28
More about Revolution Analytics and Hadoop
http://www.revolutionanalytics.com/products/r-for-hadoop.php
Get started on Hadoop with Hortonworks Sandbox
http://hortonworks.com/sandbox
Follow us:
@hortonworks
@RevolutionR

The Modern Data Architecture for Predictive Analytics with Hortonworks and Revolution Analytics


Editor's Notes

  • #18: Remember that CRAN is a new term to IT professionals, and to anyone who hasn’t learned much about R. Spend some time on it. The acronym stands for Comprehensive R Archive Network: a single repository of R algorithms, test data, and evaluations, used by nearly all R programmers.