Quick Housekeeping 
Q&A box is available for your questions 
Webinar will be recorded 
Thank You for joining! 
© Hortonworks Inc. 2014
Hadoop 2.0: YARN to Further Optimize Data Processing
Your Speakers 
John Kreisa, VP Strategic Marketing, Hortonworks
Imad Birouty, Director, Technical Product Marketing, Teradata
John Haddad, Senior Director, Product Marketing, Informatica
John Kreisa, VP Strategic Marketing, Hortonworks 
@marked_man 
Big Data Market Trends and Predictions 
Big Data Explosion
% by which orgs leveraging modern info management systems outperform peers by 2015
85% from new data types
Hadoop-enabled DBMSs
50x data growth 2010 to 2020
1 Zettabyte (ZB) = 1 Billion TBs
15x growth rate of machine generated data by 2020
The US has 1/3 of the world’s data 
Big Data is 1 of 5 US GDP Game Changers 
$325 billion incremental annual GDP from big data analytics 
in retail and manufacturing by 2020
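The unit conversion and growth multiple on this slide can be checked with a few lines of arithmetic. This is a minimal sketch, assuming decimal (SI) units and, purely for illustration, steady year-over-year compounding; the slide itself gives only the total 50x multiple, not an annual rate. The function names are hypothetical.

```python
# Decimal (SI) storage units, expressed in bytes.
TERABYTE = 10**12
ZETTABYTE = 10**21


def terabytes_per_zettabyte() -> int:
    """Number of terabytes in one zettabyte."""
    return ZETTABYTE // TERABYTE


def implied_annual_growth(multiple: float, years: int) -> float:
    """Annual growth rate implied by a total multiple.

    Assumes steady compounding, which is an illustrative assumption only.
    """
    return multiple ** (1 / years) - 1


print(terabytes_per_zettabyte())  # 1000000000, i.e. 1 billion TBs
print(f"{implied_annual_growth(50, 10):.1%}")
```

Under that compounding assumption, 50x growth over the ten years from 2010 to 2020 works out to roughly 48% per year.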
Existing systems under pressure 
Business Analytics
RDBMS 
EDW 
NoSQL 
DATA 
SYSTEM 
APPLICATIONS 
REPOSITORIES 
SOURCES 
Existing Sources (CRM, ERP, Clickstream, Logs)
Custom Applications
Packaged Applications
2.8 ZB in 2012
85% from New Data Types
15x Machine Data by 2020
40 ZB by 2020
Source: IDC
OLTP, ERP, CRM Systems
Unstructured documents, emails
Server logs
Clickstream
Sentiment, Web Data
Sensor, Machine Data
Geolocation
Hadoop with YARN Complements Existing Architecture
DEV & DATA TOOLS
Build & Test
OPERATIONS TOOLS
Provision, Manage & Monitor
DATA SYSTEM
REPOSITORIES
SOURCES
RDBMS
EDW
NoSQL
OLTP, ERP, CRM Systems
Documents, Emails
Web Logs, Click Streams
Batch Interactive Real-Time
Social Networks
HDFS (Hadoop Distributed File System)
Machine Generated
Sensor Data
Geolocation Data
APPLICATIONS
Business Analytics
Custom Applications
Packaged Applications
YARN: Data Operating System 
1 ° ° ° ° ° ° ° ° ° 
° ° ° ° ° ° ° ° ° N
Hadoop: Typically used for new analytic apps 
SCALE SCOPE 
New Analytic Apps 
New types of data 
LOB-driven
Unlock Value in New Types of Data 
1. Social
Understand how people are feeling and interacting, right now
2. Clickstream
Capture and analyze website visitors’ data trails and optimize your website
3. Sensor/Machine
Discover patterns in data streaming from remote sensors and machines
4. Geographic
Analyze location-based data to manage operations where they occur
5. Server Logs
Diagnose process failures and prevent security breaches
6. Unstructured (text, video, pictures, etc.)
Understand patterns in files across millions of web pages, emails, and documents
Value 
+ Online archive 
Data that was once purged or moved to tape can be stored in Hadoop to discover long term trends and previously hidden value
New Analytic Applications on Hadoop 
Industry | Use Case | Type of Data
Financial Services | New Account Risk Screens | Text, Server Logs
Financial Services | Trading Risk | Server Logs
Financial Services | Insurance Underwriting | Geographic, Sensor, Text
Telecom | Call Detail Records (CDRs) | Machine, Geographic
Telecom | Infrastructure Investment | Machine, Server Logs
Telecom | Real-time Bandwidth Allocation | Server Logs, Text, Social
Retail | 360° View of the Customer | Clickstream, Text
Retail | Localized, Personalized Promotions | Geographic
Retail | Website Optimization | Clickstream
Manufacturing | Supply Chain and Logistics | Sensor
Manufacturing | Assembly Line Quality Assurance | Sensor
Manufacturing | Crowdsourced Quality Assurance | Social
Healthcare | Use Genomic Data in Medical Trials | Structured
Healthcare | Monitor Patient Vitals in Real-Time | Sensor
Pharmaceuticals | Recruit and Retain Patients for Drug Trials | Social, Clickstream
Pharmaceuticals | Improve Prescription Adherence | Social, Unstructured, Geographic
Oil & Gas | Unify Exploration & Production Data | Sensor, Geographic & Unstructured
Oil & Gas | Monitor Rig Safety in Real-Time | Sensor, Unstructured
Government | ETL Offload in Response to Federal Budgetary Pressures | Structured
Government | Sentiment Analysis for Government Programs | Social
Hadoop: YARN Driven MDA Leads to a Data Lake 
SCALE SCOPE 
A Modern Data Architecture/Data Lake 
RDBMS 
MPP 
EDW 
New Analytic Apps 
New types of data 
LOB-driven 
Batch Interactive Real-Time 
YARN: Data Operating System 
1 ° ° ° ° ° ° ° ° ° 
HDFS 
(Hadoop Distributed File System) 
° ° ° ° ° ° ° ° ° N 
Data Lake 
An architectural shift in the data center that uses Hadoop to deliver deeper insight across a large, broad, diverse set of data at efficient scale
Integrating with Existing Investments 
DATA 
SYSTEM 
APPLICATIONS 
SOURCES 
RDBMS 
EDW 
MPP 
Batch Interactive Real-Time 
HDFS 
(Hadoop Distributed File System) 
Emerging Sources (Sensor, Sentiment, Geo, Unstructured)
BusinessObjects BI
DEV & DATA TOOLS
OPERATIONAL TOOLS
Existing Sources (CRM, ERP, Clickstream, Logs)
INFRASTRUCTURE 
YARN: Data Operating System 
1 ° ° ° ° ° ° ° ° ° 
° ° ° ° ° ° ° ° ° N 
SOURCES 
OLTP, ERP, CRM Systems
Documents, Emails
Web Logs, Click Streams
Social Networks
Machine Generated
Sensor Data
Geolocation Data
Viewpoint
Imad Birouty, Director, Technical Product Marketing, Teradata
Analysts Recommend: 
Shift from a Single Platform to an Ecosystem 
“We will abandon the old models 
based on the desire to implement 
for high-value analytic 
applications.” 
"Logical" Data Warehouse
Marketing Applications
Business Intelligence
Data Mining
Math and Stats
Languages
ANALYTIC TOOLS & APPS
Customers
Partners
Business Analysts
Data Scientists
USERS
UNIFIED DATA ARCHITECTURE
MOVE MANAGE ACCESS
DATA WAREHOUSE
DISCOVERY PLATFORM
ERP
SCM
CRM
Images
Audio and Video
Machine Logs
Text
Web and Social
SOURCES
DATA PLATFORM
Marketing
Executives
Operational Systems
Frontline Workers
Engineers
Fast Loading
Filtering and Processing
Online Archival
Business Intelligence
Predictive Analytics
Operational Intelligence
Data Discovery
Path, graph, time-series analysis
Pattern Detection
Data Lake Overview 
• The single source of raw, historical, and real-time operational data
• The ability to cost-effectively explore data sets of unknown, under-appreciated, or unrecognized value
• The reduction of LOB-specific big data environments, which reduces costs and analytical discrepancies
• The co-location of data sets to enable light, on-the-fly integration
Approaches to Data Integration 
Schema on Write (Data Warehouse)
• Well understood data
• Relational integrity
• Storage efficiency
Schema on Read (Data Lake)
• Dynamic data
• Reduced coordination
• Human readable
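The contrast between schema on write and schema on read can be sketched in a few lines. This is an illustrative sketch only, not how Teradata or Hadoop implement either approach: the `SCHEMA` fields, the function names, and the sample records are all hypothetical, and plain Python lists stand in for the warehouse table and the data lake.

```python
import json

# Hypothetical schema for illustration: field name -> required Python type.
SCHEMA = {"vin": str, "battery_temp_c": float}


def write_with_schema(table: list, record: dict) -> None:
    """Schema on write: validate the record before it is stored."""
    if set(record) != set(SCHEMA):
        raise ValueError(f"unexpected fields: {sorted(record)}")
    for field, expected in SCHEMA.items():
        if not isinstance(record[field], expected):
            raise TypeError(f"{field} must be {expected.__name__}")
    table.append(record)


def write_raw(lake: list, line: str) -> None:
    """Schema on read: store the raw text untouched."""
    lake.append(line)


def read_with_schema(lake: list, fields: list) -> list:
    """Apply a schema at query time, skipping lines that do not fit it."""
    rows = []
    for line in lake:
        try:
            doc = json.loads(line)
        except json.JSONDecodeError:
            continue  # raw data may contain lines this query cannot use
        if isinstance(doc, dict) and all(f in doc for f in fields):
            rows.append({f: doc[f] for f in fields})
    return rows


warehouse, lake = [], []
write_with_schema(warehouse, {"vin": "VIN001", "battery_temp_c": 41.5})
write_raw(lake, '{"vin": "VIN001", "battery_temp_c": 41.5, "air_temp_c": 30.0}')
write_raw(lake, "not json at all")
print(read_with_schema(lake, ["vin", "air_temp_c"]))
# [{'vin': 'VIN001', 'air_temp_c': 30.0}]
```

The write path rejects anything outside the agreed schema, which is what buys relational integrity. The read path accepts everything and lets each query decide which fields matter, which is what reduces up-front coordination.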
The “Capture Everything” Approach 
Classic Method: “Capture only what’s needed”
Structured & Repeatable Analysis
Business determines what questions to ask
IT structures the data to answer those questions
Big Data Method: “Capture in case it’s needed”
Multi-structured & Iterative Analysis
IT delivers a platform for storing, refining, and analyzing all data sources
Business explores data for questions worth answering
Automobile Sensor Data 
Use Case 
Value from combining business data with detail data 
• Determine which cars to recall for bad battery lot 
> Business data held in data warehouse 
> Detailed sensor data held in data lake 
> Query combines data 
> Determine which cars to repair 
TERADATA PRODUCTION DATA
• VINs
• Service records
• Warranty data
• DTC descriptions
HADOOP RAW MULTI-STRUCTURED DATA
• Battery Temperature Sensor data
Battery Temperature vs. Air Temperature
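The combined query in this use case can be sketched as a join between warehouse records and raw sensor readings. This is a minimal sketch under stated assumptions: the VINs, lot names, readings, the 20-degree threshold, and the `cars_to_repair` function are all hypothetical, and in-memory Python structures stand in for the Teradata and Hadoop sides.

```python
from collections import defaultdict

# Hypothetical warehouse rows (business data): VIN -> battery lot.
warehouse_battery_lot = {"VIN001": "LOT-A", "VIN002": "LOT-B", "VIN003": "LOT-A"}

# Hypothetical raw sensor readings (detail data):
# (vin, battery_temp_c, air_temp_c).
sensor_readings = [
    ("VIN001", 62.0, 30.0),
    ("VIN001", 65.0, 31.0),
    ("VIN002", 70.0, 30.0),
    ("VIN003", 35.0, 30.0),
]


def cars_to_repair(lots, readings, bad_lot, max_delta_c):
    """Return VINs from bad_lot whose average battery-minus-air
    temperature exceeds max_delta_c."""
    deltas = defaultdict(list)
    for vin, battery_c, air_c in readings:
        # Join step: keep only readings for cars built with the bad lot.
        if lots.get(vin) == bad_lot:
            deltas[vin].append(battery_c - air_c)
    return sorted(
        vin for vin, d in deltas.items() if sum(d) / len(d) > max_delta_c
    )


print(cars_to_repair(warehouse_battery_lot, sensor_readings, "LOT-A", 20.0))
# ['VIN001']
```

Neither data set answers the question alone: the lot assignment narrows the recall to the affected cars, and the sensor detail separates batteries that are actually running hot from those that are not.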
Customer Value Based on Social Influence 
Use Case 
HADOOP
TERADATA ASTER DATABASE
TERADATA DATABASE
• Determine high value customers based on history
• Determine customer value based on social influence
• Determine customer sentiment
• Determine customer sphere of influence
Data Optimization for the Modern Data Architecture
John Haddad, Senior Director, Product Marketing, Informatica
The Big Data Journey 
Optimize infrastructure for performance, cost, & scalability
A single place to manage the supply and demand of data
Real-time proactive customer engagement
Data Warehouse Optimization
Real-Time Customer Analytics
Managed Data Lake
Big Data business initiatives
IT driven Business driven
Proactive Customer Engagement 
Web Logs
Clickstream Data
Streaming
Big Data Integration / Analytics
Master Data Mgmt
Financial Advisors
Integration & Quality
Customer / Product Master
Customer
Customer
Smartphone
Real-Time Event Processing
Visualization
Social Data / Signals
Social Data Connector
FIX, SWIFT, Market Data
Customer Portal
DATA PLATFORM
DISCOVERY PLATFORM
DATA WAREHOUSE
Proactive Patient Member Engagement 
Web Logs
Clickstream Data
Streaming
Big Data Integration / Analytics
Care Providers
Integration & Quality
Patient Member
Patient Member
Smartphone
Real-Time Event Processing
Visualization
Social Data / Signals
Social Data Connector
RFID, Patient Monitoring
Healthcare & Patient Forums
Master Data Mgmt
Member / Provider Master
DATA PLATFORM
DISCOVERY PLATFORM
DATA WAREHOUSE
Unified Data Architecture
DATA PLATFORM
DISCOVERY PLATFORM
DATA WAREHOUSE
The Intelligent Data Platform
Role-Based Data Management Tools
Infrastructure Services
Data Intelligence
Metadata Meets Machine Learning
Data Infrastructure
Vibe™ Virtual Data Machine
New Industry-Leading
Data Lake Infrastructure
Data Lake Architecture 
Informatica Developers are Now Hadoop Developers 
Visual Development Environment 
Enterprise Repositories
MDM
DATA REFINEMENT
Profile
Parse
ETL
Cleanse
Match
LOAD
SOURCE DATA
Batch
Replicate
Stream
Archive
Databases
Files
Servers & Mainframe
JMS Queues
Social
Sensor data
SQL
Apache Hive
Apache MapReduce
Apache Tez
Apache YARN
1 ° ° ° ° ° ° ° ° ° ° ° ° ° N
HDFS (Hadoop Distributed File System)
DELIVER
Batch
Services
Events
Topics
DATA WAREHOUSE
How do you plan to staff your Big Data projects? 
4 weeks 
4 days! 
2X performance! 
Vs. 
Hadoop 
Hand-coders 
Informatica developers 
Choose tools that leverage existing skills so you can quickly staff Big Data projects
How do you adopt and minimize the impact of new 
and rapidly changing technologies? 
Hadoop 
Development 
Deployment 
Cloud DI Servers Data 
Warehouse 
Choose a platform and tools that minimize the need to 
rebuild your data pipeline as technologies change
How long does it take you to deploy Big Data 
projects to production? 
Time to Deploy
Available 24x7
Scale
Maximize Reuse
Performance
Automatically Deploy
Easy to Maintain
Flexible to Change
Everything you build in the sandbox should be immediately 
deployed as enterprise ready production
Next Steps 
Try the free Informatica Big Data Edition 60-Day Trial
http://marketplace.informatica.com/bdehortonworks
Download the Hortonworks Sandbox
http://hortonworks.com/products/hortonworks-sandbox/
Download Teradata Express
http://downloads.teradata.com/download/database
Download Aster Express
http://downloads.teradata.com/download/aster/aster-express

Hadoop 2.0: YARN to Further Optimize Data Processing

  • 1. Quick Housekeeping Q&A box is available for your questions Webinar will be recorded Thank You for joining! © Hortonworks Inc. 2014
  • 2. Hadoop 2.0: YARN to Further Optimize Data Processing © Hortonworks Inc. 2014
  • 3. Your Speakers John Kreisa, VP Strategic Marketing, Hortonworks Imad Birouty, Director, Technical Product Marketing, Teradata John Haddad, Senior Director, Product Marketing, Informatica © Hortonworks Inc. 2014
  • 4. John Kreisa, VP Strategic Marketing, Hortonworks @marked_man © Hortonworks Inc. 2014
  • 5. Big Data Market Trends and Predictions Big Data Explosion © Hortonworks Inc. 2014 % by which org’s leveraging modern info management systems outperform peers by 2015 85% from new data types ñ Hadoop enabled DBMS’s 50x data growth 2010 to 2020 1 Zettabyte (ZB) = 1 Billion TBs 15x growth rate of machine generated data by 2020 The US has 1/3 of the world’s data Big Data is 1 of 5 US GDP Game Changers $325 billion incremental annual GDP from big data analytics in retail and manufacturing by 2020
  • 6. Existing systems under pressure Business Analy4cs RDBMS EDW NoSQL © Hortonworks Inc. 2014 DATA SYSTEM APPLICATIONS REPOSITORIES SOURCES Exis4ng Sources Custom Applica4ons (CRM, ERP, Clickstream, Logs) Packaged Applica4ons 2.8 ZB in 2012 85% from New Data Types 15x Machine Data by 2020 40 ZB by 2020 Source: IDC OLTP, ERP, CRM Systems Unstructured documents, emails Server logs Clickstream Sen>ment, Web Data Sensor. Machine Data Geoloca>on
  • 7. Hadoop with YARN Compliments Existing Architecture © Hortonworks Inc. 2014 DEV & DATA TOOLS Build & Test OPERATIONS TOOLS Provision, Manage & Monitor DATA SYSTEM REPOSITORIES SOURCES RDBMS EDW NoSQL OLTP, ERP, CRM Systems Documents, Emails Web Logs, Click Streams Batch Interactive Real-Time Social Networks (Hadoop Distributed File System) Machine Generated HDFS Sensor Data Geoloca>on Data APPLICATIONS Business Analy4cs Custom Applica4ons Packaged Applica4ons YARN: Data Operating System 1 ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° N
  • 8. Hadoop: Typically used for new analytic apps SCALE SCOPE © Hortonworks Inc. 2014 New Analytic Apps New types of data LOB-driven
  • 9. Unlock Value in New Types of Data 1. Social Understand how people are feeling and interacting – right now 2. Clickstream Capture and analyze website visitors’ data trails and optimize your website 3. Sensor/Machine Discover patterns in data streaming from remote sensors and machines 4. Geographic Analyze location-based data to manage operations where they occur 5. Server Logs Diagnose process failures and prevent security breaches 6. Unstructured (txt, video, pictures, etc..) Understand patterns in files across millions of web pages, emails, and documents © Hortonworks Inc. 2014 Value + Online archive Data that was once purged or moved to tape can be stored in Hadoop to discover long term trends and previously hidden value
  • 10. New Analytic Applications on Hadoop Industry Use Case Type of Data Financial Services © Hortonworks Inc. 2014 New Account Risk Screens Text, Server Logs Trading Risk Server Logs Insurance Underwriting Geographic, Sensor, Text Telecom Call Detail Records (CDRs) Machine, Geographic Infrastructure Investment Machine, Server Logs Real-time Bandwidth Allocation Server Logs, Text, Social Retail 360° View of the Customer Clickstream, Text Localized, Personalized Promotions Geographic Website Optimization Clickstream Manufacturing Supply Chain and Logistics Sensor Assembly Line Quality Assurance Sensor Crowdsourced Quality Assurance Social Healthcare Use Genomic Data in Medical Trials Structured Monitor Patient Vitals in Real-Time Sensor Pharmaceuticals Recruit and Retain Patients for Drug Trials Social, Clickstream Improve Prescription Adherence Social, Unstructured, Geographic Oil & Gas Unify Exploration & Production Data Sensor, Geographic & Unstructured Monitor Rig Safety in Real-Time Sensor, Unstructured Government ETL Offload in Response to Federal Budgetary Pressures Structured Sentiment Analysis for Government Programs Social
  • 11. Hadoop: YARN Driven MDA Leads to a Data Lake SCALE SCOPE © Hortonworks Inc. 2014 A Modern Data Architecture/Data Lake RDBMS MPP EDW New Analytic Apps New types of data LOB-driven Batch Interactive Real-Time YARN: Data Operating System 1 ° ° ° ° ° ° ° ° ° HDFS (Hadoop Distributed File System) ° ° ° ° ° ° ° ° ° N Data Lake An architectural shift in the data center that uses Hadoop to deliver deeper insight across a large, broad, diverse set of data at efficient scale
  • 12. Integrating with Existing Investments © Hortonworks Inc. 2014 DATA SYSTEM APPLICATIONS SOURCES RDBMS EDW MPP Batch Interactive Real-Time HDFS (Hadoop Distributed File System) Emerging Sources (Sensor, Sen4ment, Geo, Unstructured) BusinessObjects BI DEV & DATA TOOLS OPERATIONAL TOOLS Exis4ng Sources (CRM, ERP, Clickstream, Logs) INFRASTRUCTURE YARN: Data Operating System 1 ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° N SOURCES OLTP, ERP, CRM Systems Documents, Emails Web Logs, Click Streams Social Networks Machine Generated Sensor Data Geoloca>on Data Viewpoint
  • 13. Imad Birouty, Director, Technical Product Marketing, Teradata
  • 14. Analysts Recommend: Shift from a Single Platform to an Ecosystem “We will abandon the old models based on the desire to implement for high-value analytic applications.” "Logical" Data Warehouse
  • 15. Marketing Applications Business Intelligence Data Mining Math and Stats Languages ANALYTIC TOOLS & APPS Customers Partners Business Analysts Data Scientists USERS UNIFIED DATA ARCHITECTURE MOVE MANAGE ACCESS DATA WAREHOUSE DISCOVERY PLATFORM ERP SCM CRM Images Audio and Video Machine Logs Text Web and Social SOURCES DATA PLATFORM Marketing Executives Operational Systems Frontline Workers Engineers Fast Loading Filtering and Processing Online Archival Business Intelligence Predictive Analytics Operational Intelligence Data Discovery Path, graph, time-series analysis Pattern Detection
  • 16. Marketing Applications Business Intelligence Data Mining Math and Stats Languages ANALYTIC TOOLS & APPS Customers Partners Business Analysts Data Scientists USERS UNIFIED DATA ARCHITECTURE MOVE MANAGE ACCESS DATA WAREHOUSE DISCOVERY PLATFORM ERP SCM CRM Images Audio and Video Machine Logs Text Web and Social SOURCES DATA PLATFORM Marketing Executives Operational Systems Frontline Workers Engineers
  • 17. Data Lake Overview
      • The single source of raw, historical, and real-time operational data
      • The ability to cost-effectively explore data sets of unknown, under-appreciated, or unrecognized value
      • The reduction of LOB-specific big data environments, which reduces costs and analytical discrepancies
      • The co-location of data sets to enable light, on-the-fly integration
  • 18. Approaches to Data Integration
      Schema on Write (Data Warehouse): • Well-understood data • Relational integrity • Storage efficiency
      Schema on Read (Data Lake): • Dynamic data • Reduced coordination • Human readable
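The schema-on-write versus schema-on-read contrast on this slide can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the field names, the `SCHEMA` mapping, and both helper functions are hypothetical and do not come from the webinar or from any Teradata, Hortonworks, or Informatica product.

```python
import json

# Hypothetical warehouse schema: field name -> type enforced at load time.
SCHEMA = {"vin": str, "battery_temp_c": float}

def write_with_schema(record: dict) -> tuple:
    """Schema on write: validate and coerce before storing; reject incomplete rows."""
    missing = [name for name in SCHEMA if name not in record]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    # Only the declared columns are kept; anything else is dropped at load time.
    return tuple(cast(record[name]) for name, cast in SCHEMA.items())

def read_with_schema(raw_line: str, fields: dict) -> dict:
    """Schema on read: keep the raw text as stored; apply a schema per query."""
    data = json.loads(raw_line)
    return {name: (cast(data[name]) if name in data else None)
            for name, cast in fields.items()}

# A raw event as it might land in the data lake (illustrative values).
raw = '{"vin": "VIN001", "battery_temp_c": "41.5", "firmware": "2.1"}'

# Warehouse path: typed, fixed columns; the extra "firmware" field is lost.
print(write_with_schema(json.loads(raw)))

# Lake path: a later query can ask for fields nobody planned for.
print(read_with_schema(raw, {"firmware": str, "odometer_km": int}))
```

The write path pays the validation cost once and yields compact, typed rows, while the read path keeps the human-readable original and defers interpretation to each query.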
  • 19. The “Capture Everything” Approach
      Classic Method (“Capture only what’s needed”): Business determines what questions to ask; IT structures the data to answer those questions. Structured & Repeatable Analysis.
      Big Data Method (“Capture in case it’s needed”): IT delivers a platform for storing, refining, and analyzing all data sources; Business explores data for questions worth answering. Multi-structured & Iterative Analysis.
  • 20. Automobile Sensor Data Use Case: Value from combining business data with detail data
      • Determine which cars to recall for bad battery lot > Business data held in data warehouse > Detailed sensor data held in data lake > Query combines data > Determine which cars to repair
      TERADATA (production data): • VINs • Service records • Warranty data • DTC descriptions
      HADOOP (raw multi-structured data): • Battery Temperature Sensor data
      Chart: Battery Temperature vs. Air Temperature
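The "query combines data" step in this use case can be sketched in plain Python. This is only an illustration under stated assumptions: the VINs, lot names, readings, the `cars_to_repair` helper, and the 20-degree threshold are all hypothetical. In the architecture the slide describes, the business data would sit in the warehouse and the sensor data in Hadoop.

```python
from collections import defaultdict

# Hypothetical warehouse rows (business data): VIN -> battery lot.
warehouse = {
    "VIN001": {"battery_lot": "LOT-A"},
    "VIN002": {"battery_lot": "LOT-B"},
    "VIN003": {"battery_lot": "LOT-B"},
}

# Hypothetical data lake readings (detail data):
# (VIN, battery temperature in C, air temperature in C).
sensor_readings = [
    ("VIN001", 30.0, 25.0),
    ("VIN002", 58.0, 24.0),
    ("VIN002", 61.0, 26.0),
    ("VIN003", 31.0, 27.0),
]

def cars_to_repair(warehouse, readings, bad_lot, max_delta_c=20.0):
    """Return VINs from bad_lot whose battery ran far hotter than the ambient air.

    max_delta_c is an assumed threshold chosen for this sketch.
    """
    # Largest battery-minus-air gap seen per car (starts at 0.0).
    worst_delta = defaultdict(float)
    for vin, battery_c, air_c in readings:
        worst_delta[vin] = max(worst_delta[vin], battery_c - air_c)
    # Combine: lot membership from the warehouse, overheating from the lake.
    return sorted(
        vin for vin, row in warehouse.items()
        if row["battery_lot"] == bad_lot and worst_delta.get(vin, 0.0) > max_delta_c
    )

print(cars_to_repair(warehouse, sensor_readings, bad_lot="LOT-B"))  # prints ['VIN002']
```

Here the warehouse alone would flag every car in the suspect lot, and the sensor detail narrows that set to the cars that actually show the symptom.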
  • 21. Customer Value Based on Social Influence Use Case HADOOP TERADATA ASTER DATABASE TERADATA DATABASE • Determine high value customers based on history • Determine customer value based on social influence • Determine customer sentiment • Determine customer sphere of influence
  • 22. Data Optimization for the Modern Data Architecture John Haddad, Senior Director, Product Marketing, Informatica
  • 23. The Big Data Journey Optimize infrastructure for performance, cost, & scalability A single place to manage the supply and demand of data Real-time proactive customer engagement Data Warehouse Optimization Real-Time Customer Analytics Managed Data Lake Big Data business initiatives IT driven Business driven
  • 24. Proactive Customer Engagement Web Logs Clickstream Data Streaming Big Data Integration / Analytics Master Data Mgmt Financial Advisors Integration & Quality Customer / Product Master Customer Customer Smartphone Real-Time Event Processing Visualization Social Data / Signals Social Data Connector FIX, SWIFT, Market Data Customer Portal DATA PLATFORM DISCOVERY PLATFORM DATA WAREHOUSE
  • 25. Proactive Patient Member Engagement Web Logs Clickstream Data Streaming Big Data Integration / Analytics Care Providers Integration & Quality Patient Member Patient Member Smartphone Real-Time Event Processing Visualization Social Data / Signals Social Data Connector RFID, Patient Monitoring Healthcare & Patient Forums Master Data Mgmt Member / Provider Master DATA PLATFORM DISCOVERY PLATFORM DATA WAREHOUSE
  • 26. Unified Data Architecture DATA PLATFORM DISCOVERY PLATFORM DATA WAREHOUSE The Intelligent Data Platform Role-Based Data Management Tools Infrastructure Services Data Intelligence Metadata Meets Machine Learning Data Infrastructure Vibe™ Virtual Data Machine New Industry-Leading Data Lake Infrastructure
  • 27. Data Lake Architecture Informatica Developers are Now Hadoop Developers Visual Development Environment Enterprise Repositories MDM DATA REFINEMENT Profile Parse ETL Cleanse Match LOAD SOURCE DATA Batch Replicate Stream Archive Databases Files Servers & Mainframe JMS Queues Social Sensor data SQL Apache Hive Apache MapReduce Apache Tez Apache YARN 1 … N HDFS (Hadoop Distributed File System) DELIVER Batch Services Events Topics DATA WAREHOUSE
  • 28. How do you plan to staff your Big Data projects? 4 weeks 4 days! 2X performance! Vs. Hadoop Hand-coders Informatica developers Choose tools that leverage existing skills so you can quickly staff Big Data projects
  • 29. How do you adopt and minimize the impact of new and rapidly changing technologies? Hadoop Development Deployment Cloud DI Servers Data Warehouse Choose a platform and tools that minimize the need to rebuild your data pipeline as technologies change
  • 30. How long does it take you to deploy Big Data projects to production? Time to Deploy Available 24x7 Scale Maximize Reuse Performance Automatically Deploy Easy to Maintain Flexible to Change Time to Deploy Everything you build in the sandbox should be immediately deployed as enterprise ready production
  • 31. Next Steps
      Try the free Informatica Big Data Edition 60-Day Trial: http://marketplace.informatica.com/bdehortonworks
      Download the Hortonworks Sandbox: http://hortonworks.com/products/hortonworks-sandbox/
      Download Teradata Express: http://downloads.teradata.com/download/database
      Download Aster Express: http://downloads.teradata.com/download/aster/aster-express