Eliminating the Challenges of Big Data
Management Inside Hadoop
2 © RedPoint Global Inc. 2015 Confidential
Today’s Speakers
Justin Sears, Senior Manager, Product Marketing, Hortonworks
Jamie Keeffe, Product Marketing Manager, RedPoint Global
Kris Tomes, Solutions Director, RedPoint Global
Page 3 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Hortonworks: Hadoop for the Enterprise
We Do Hadoop
Spring 2015
Hadoop for the Enterprise:
Implement a Modern Data Architecture with HDP
Customer Momentum
•  330+ customers (as of year-end 2014)
Hortonworks Data Platform
•  Completely open multi-tenant platform for any app & any data.
•  A centralized architecture of consistent enterprise services for
resource management, security, operations, and governance.
Partner for Customer Success
•  Open source community leadership focus on enterprise needs
•  Unrivaled world class support
•  Founded in 2011
•  Original 24 architects, developers,
operators of Hadoop from Yahoo!
•  600+ Employees
•  1000+ Ecosystem Partners
Hadoop for the Enterprise: Implement a
Modern Data Architecture with HDP
Spring 2015
Hortonworks. We do Hadoop.
Traditional systems under pressure
Challenges
•  Constrains data to app
•  Can’t manage new data
•  Costly to Scale
New data sources: clickstream, geolocation, web data, Internet of Things, docs and emails, server logs
Traditional data sources: ERP, CRM, SCM
Data volume: 2.8 zettabytes (2012) to 40 zettabytes (2020)
[Chart: business value of traditional vs. new data, industry leaders vs. laggards]
Hadoop emerged as foundation of new data architecture
Apache Hadoop is an open source data platform for
managing large volumes of high-velocity, high-variety data
•  Built by Yahoo! to be the heartbeat of its ad & search business
•  Donated to Apache Software Foundation in 2005 with rapid adoption by
large web properties & early adopter enterprises
•  Incredibly disruptive to current platform economics
Traditional Hadoop Advantages
•  Manages new data paradigm
•  Handles data at scale
•  Cost effective
•  Open source
Traditional Hadoop Had Limitations
•  Batch-only architecture
•  Single purpose clusters, specific data sets
•  Difficult to integrate with existing investments
•  Not enterprise-grade
[Diagram: a single application stack, with batch processing (MapReduce) on top of storage (HDFS)]
Modern Data Architecture emerges to unify data & processing
Modern Data Architecture
•  Enable applications to have access to
all your enterprise data through an
efficient centralized platform
•  Supported with a centralized approach to
governance, security and operations
•  Versatile to handle any applications
and datasets no matter the size or type
[Diagram: Modern Data Architecture]
SOURCES: existing systems (ERP, CRM, SCM); clickstream; web & social; geolocation; sensor & machine; server logs; unstructured
DATA SYSTEM: HDFS (Hadoop Distributed File System) and YARN: Data Operating System, running batch, interactive, real-time, and partner ISV workloads alongside MPP and EDW
ANALYTICS: data marts; applications; business analytics; visualization & dashboards
RedPoint Global is a Hortonworks Partner, certified on HDP and YARN.

With RedPoint, your existing data analysts and database administrators can easily work with data stored in Hadoop. No new skills are required.
Hadoop adoption follows a predictable journey
From cost optimization, to new analytic apps, and ultimately to a “data lake”
Hadoop Driver: Cost optimization
Archive Data off EDW
Move rarely used data to Hadoop as active
archive, store more data longer
Offload costly ETL process
Free your EDW to perform high-value functions
like analytics & operations, not ETL
Enrich the value of your EDW
Use Hadoop to refine new data sources, such as
web and machine data for new analytical context
HDP helps you reduce costs and optimize the value associated with your EDW
[Diagram: EDW optimization with HDP 2.2]
SOURCES: existing systems (ERP, CRM, SCM); clickstream; web & social; geolocation; sensor & machine; server logs; unstructured
DATA SYSTEMS: HDP 2.2 (ELT; cold data, deeper archive & new sources); Enterprise Data Warehouse (hot); MPP; in-memory
ANALYTICS: data marts; business analytics; visualization & dashboards
Hadoop Driver: Advanced analytic applications
Each row lists three use cases, one per category, in this order:
Single View (improve acquisition and retention); Predictive Analytics (identify your next best action); Data Discovery (uncover new findings)
Financial Services
•  New Account Risk Screens; Trading Risk; Insurance Underwriting
•  Improved Customer Service; Insurance Underwriting; Aggregate Banking Data as a Service
•  Cross-sell & Upsell of Financial Products; Risk Analysis for Usage-Based Car Insurance; Identify Claims Errors for Reimbursement
Telecom
•  Unified Household View of the Customer; Searchable Data for NPTB Recommendations; Protect Customer Data from Employee Misuse
•  Analyze Call Center Contacts Records; Network Infrastructure Capacity Planning; Call Detail Records (CDR) Analysis
•  Inferred Demographics for Improved Targeting; Proactive Maintenance on Transmission Equipment; Tiered Service for High-Value Customers
Retail
•  360° View of the Customer; Supply Chain Optimization; Website Optimization for Path to Purchase
•  Localized, Personalized Promotions; A/B Testing for Online Advertisements; Data-Driven Pricing, improved loyalty programs
•  Customer Segmentation; Personalized, Real-time Offers; In-Store Shopper Behavior
Manufacturing
•  Supply Chain and Logistics; Optimize Warehouse Inventory Levels; Product Insight from Electronic Usage Data
•  Assembly Line Quality Assurance; Proactive Equipment Maintenance; Crowdsource Quality Assurance
•  Single View of a Product Throughout Lifecycle; Connected Car Data for Ongoing Innovation; Improve Manufacturing Yields
Healthcare
•  Electronic Medical Records; Monitor Patient Vitals in Real-Time; Use Genomic Data in Medical Trials
•  Improving Lifelong Care for Epilepsy; Rapid Stroke Detection and Intervention; Monitor Medical Supply Chain to Reduce Waste
•  Reduce Patient Re-Admittance Rates; Video Analysis for Surgical Decision Support; Healthcare Analytics as a Service
Oil & Gas
•  Unify Exploration & Production Data; Monitor Rig Safety in Real-Time; Geographic exploration
•  DCA to Slow Well Decline Curves; Proactive Maintenance for Oil Field Equipment; Define Operational Set Points for Wells
Government
•  Single View of Entity; CBM & Autonomic Logistic Analysis; Sentiment Analysis on Program Effectiveness
•  Prevent Fraud, Waste and Abuse; Proactive Maintenance for Public Infrastructure; Meet Deadlines for Government Reporting
Hadoop Driver: Enabling the data lake
[Chart axes: scale vs. scope]
Data Lake Definition
•  Centralized Architecture
Multiple applications on a shared data set
with consistent levels of service
•  Any App, Any Data
Multiple applications accessing all data
affording new insights and opportunities.
•  Unlocks ‘Systems of Insight’
Advanced algorithms and applications
used to derive new value and optimize
existing value.
Drivers:
1.  Cost Optimization
2.  Advanced Analytic Apps
Goal:
•  Centralized Architecture
•  Data-driven Business
DATA LAKE
Journey to the Data Lake with Hadoop
Systems of Insight
Case Study: 12-month Hadoop evolution at TrueCar
12-month execution plan (chart axis: data platform capabilities)
•  June 2013: Begin Hadoop execution
•  July 2013: Hortonworks partnership
•  Aug 2013: Training & dev begins
•  Nov 2013: Production cluster, 60 nodes, 2 PB
•  Dec 2013: Three production apps (3 total)
•  Jan 2014: 40% Dev Staff Perficient
•  Feb 2014: Three more production apps (6 total)
•  May 2014: IPO
12-Month Results at TrueCar
•  Six production Hadoop applications
•  Sixty nodes / 2 PB of data
•  Storage and compute costs reduced
from $19/GB to $0.23/GB
“We addressed our data platform capabilities
strategically as a pre-cursor to IPO.”
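The size of the quoted cost drop can be checked with simple arithmetic. Only the two per-GB figures come from the slide; the ratio and percentage below are derived from them and are not stated in the deck.

```latex
\[
\frac{\$19/\text{GB}}{\$0.23/\text{GB}} \approx 82.6
\qquad\text{and}\qquad
1 - \frac{0.23}{19} \approx 0.988
\]
```

In other words, the quoted figures correspond to roughly an 83-fold drop in cost per GB, or a reduction of about 98.8%.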
Hortonworks Data Platform
Hadoop for the Enterprise
Only HDP delivers a Centralized Architecture
HDP is uniquely built around YARN serving as a data operating system that provides multi-tenant Resource
Management, consistent Governance & Security and efficient Operations services across Hadoop applications.
Hortonworks Data Platform
YARN
Data Operating System
•  A centralized architecture of
consistent enterprise services for
resource management, security,
operations, and governance.
•  The versatility to support multiple
applications and diverse workloads
from batch to interactive to real-time,
open source and commercial.
Key Benefits
•  Multiple applications on a shared data set
with consistent levels of service: a
multitenant data platform.
•  Provides a shared platform to enable new
analytic applications.
•  Delivers maximum cost efficiency for
cluster resource management. Fewer
servers, fewer nodes.
[Diagram: existing applications, new analytics, and partner applications use data access (batch, interactive & real-time) on YARN: Data Operating System and storage, with governance, security, operations, and resource management as shared services]
HDP delivers a completely open data platform
Hortonworks Data Platform 2.2
Hortonworks Data Platform provides Hadoop for the Enterprise: a centralized architecture
of core enterprise services, for any application and any data.
Completely Open
•  HDP incorporates every element
required of an enterprise data
platform: data storage, data access,
governance, security, operations
•  All components are developed in
open source and then rigorously
tested, certified, and delivered as an
integrated open source platform
that’s easy to consume and use by
the enterprise and ecosystem.
[Diagram: HDP 2.2 components]
DATA MANAGEMENT: HDFS (Hadoop Distributed File System); YARN: Data Operating System (cluster resource management)
GOVERNANCE: Apache Falcon, Apache Sqoop, Apache Flume, Apache Kafka
BATCH, INTERACTIVE & REAL-TIME DATA ACCESS: Apache Pig, Apache Hive, Cascading, Apache HBase, Apache Accumulo, Apache Solr, Apache Spark, Apache Storm
SECURITY: Apache Ranger, Apache Knox, Apache Falcon
OPERATIONS: Apache Ambari, Apache ZooKeeper, Apache Oozie
HDP: Any Data, Any Application, Anywhere
Any Application
•  Deep integration with ecosystem
partners to extend existing
investments and skills
•  Broadest set of applications through
the stable of YARN-Ready applications
Any Data
Deploy applications fueled by clickstream, sensor,
social, mobile, geo-location, server log, and other new
paradigm datasets with existing legacy datasets.
Anywhere
Implement HDP naturally across the complete
range of deployment options
Data sources: clickstream; web & social; geolocation; Internet of Things; server logs; files, emails; ERP, CRM, SCM
Deployment options: commodity, appliance, cloud, hybrid
Over 70 Hortonworks Certified YARN Apps
Hortonworks supports the full application lifecycle
Hadoop usage follows a consistent lifecycle: Architecture & Development, Implementation, Production, Expansion
From architecture to expansion, all with a consistent support experience
Most Common Support Issues by Project Phase
Issues addressed by Hortonworks Support, by type, for the past year
Issue Type: share of issues
•  Architecture: 7%
•  Application Development: 10%
•  Installation: 10%
•  Performance: 5%
•  Configuration: 25%
•  Executing Jobs: 20%
•  Cluster Administration: 18%
•  HDP Upgrades: 3%
•  Enhancement Requests: 3%
TOTAL: 100%
Hortonworks Support
Full Lifecycle Subscription Support
Support through EVERY phase of adoption of
your Hadoop project to ensure your success
[Chart: number of support tickets across the lifecycle, repeated for Project 2, Project 3, ... Project N]
“Hortonworks loves and lives
open source innovation”
World Class Support and Services.
Hortonworks' Customer Support received a
maximum score and was significantly higher
than both Cloudera and MapR
A Leader in Hadoop
The Forrester Wave™
Big Data Hadoop Solutions
Q1 2014
Eliminating the Challenges of Big Data
Management Inside Hadoop
Overview of RedPoint Global
•  Launched 2006
•  Founded and staffed by industry veterans
•  Headquarters: Wellesley, Massachusetts
•  Offices in US, UK, Australia, Philippines
•  Global customer base
•  Serves most major industries
MAGIC QUADRANT: Data Quality
MAGIC QUADRANT: Multichannel Campaign Management
MAGIC QUADRANT: Integrated Marketing Management
Andrew Brust, GigaOm Research
New Data Straining Current Architectures
Data types:
•  Unstructured documents, emails
•  Transactional data
•  Server logs
•  Sentiment, web data
•  Geolocation
•  Sensor, machine data
•  Clickstream
•  Hierarchical data
•  OLTP, ERP, CRM
•  Master data
Growth:
•  2.8 ZB in 2013
•  85% from new data types
•  15x machine data by 2020
•  40 ZB by 2020
Source: IDC
Key Functions for Data Management
ETL & ELT
•  Profiling, reads/writes, transformations
•  Single project for all jobs
Data Quality
•  Cleanse data
•  Parsing, correction
•  Geo-spatial analysis
Integration & Matching
•  Grouping
•  Fuzzy match
Master Key Management
•  Create keys
•  Track changes
•  Maintain matches over time
Web Services Integration
•  Consume and publish
•  HTTP/HTTPS protocols
•  XML/JSON/SOAP formats
Process Automation & Operations
•  Job scheduling, monitoring, notifications
•  Central point of control
•  Meta Data Management
Overview - What is Hadoop?
Hadoop 1.0
•  All operations based on MapReduce
•  Intrinsic inconsistency of code-based solutions
•  Highly skilled and expensive resources needed
•  3rd party applications constrained by the need to generate code
Hadoop 2.0
•  Introduction of YARN: “a general-purpose, distributed, application management framework that supersedes the classic Apache Hadoop MapReduce framework for processing data in Hadoop clusters.”
•  Mature applications can now operate directly on Hadoop
•  Reduced skill requirements and increased consistency
[Diagram: Hadoop 2.0 stack, in API, engine, and system layers]
System: HDFS (Hadoop Distributed File System); YARN: Data Operating System
Engines: MapReduce (batch); Tez (batch & interactive); Slider (real-time); Spark; Storm (stream); HBase and Accumulo (NoSQL); other ISV
APIs: Cascading (Scala, Java); Hive (SQL); Pig (scripting); direct (Java, .NET)
RedPoint Data Management on Hadoop
[Diagram: the parallel section of a RedPoint job on Hadoop. Labels: partitioning (AM / tasks); execution (AM / tasks); data I/O; key / split analysis; YARN; MapReduce]
RedPoint DM for Hadoop: Processing Flow
[Diagram components: Data Management Designer; DM Execution Server; Resource Manager; three Node Managers hosting one DM App Master and five DM Tasks]
Labels on the numbered steps (1 to 3): launches DM App Master; launches tasks; parallel section running DM task
Benchmarks – Project Gutenberg
•  MapReduce: >150 lines of MR code; 6 hours of development; 6 minutes runtime; extensive optimization needed
•  Pig: ~50 lines of script code; 3 hours of development; 15 minutes runtime; User Defined Functions required prior to running script
•  RedPoint: 0 lines of code; 15 min. of development; 3 minutes runtime; no tuning or optimization required
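Sample MapReduce (small subset of the entire code, which totals nearly 150 lines):

// Imports this excerpt relies on (they belong at the top of the enclosing source file).
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Mapper that emits (word, 1) for every token found in a line of text.
// WordOffset is the input key type defined elsewhere in the full listing.
public static class MapClass
        extends Mapper<WordOffset, Text, Text, IntWritable> {

    // Characters treated as word separators; the embedded double quote is escaped.
    private final static String delimiters =
            "',./<>?;:\"[]{}-=_+()&*%^#$!@`~ |«»¡¢£¤¥¦©¬®¯±¶·¿";
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    @Override
    public void map(WordOffset key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        StringTokenizer itr = new StringTokenizer(line, delimiters);
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);
        }
    }
}
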
Sample Pig script without the UDF:

SET pig.maxCombinedSplitSize 67108864
SET pig.splitCombination true
-- Load every file under the Project Gutenberg test data directory
A = LOAD '/testdata/pg/*/*/*';
-- Split each line into words, one word per record
B = FOREACH A GENERATE FLATTEN(TOKENIZE((chararray)$0)) AS word;
C = FOREACH B GENERATE UPPER(word) AS word;
-- Count occurrences per word and sort by frequency
D = GROUP C BY word;
E = FOREACH D GENERATE COUNT(C) AS occurrences, group;
F = ORDER E BY occurrences DESC;
STORE F INTO '/user/cleonardi/pg/pig-count';
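
For orientation, the computation behind the benchmark (tokenize, upper-case, count, sort by frequency) can be sketched in plain single-machine Java. This is only an illustrative sketch that mirrors the steps of the Pig script above; it is not RedPoint's or Hortonworks' code and says nothing about how any of the three benchmarked tools run in a cluster. The class name WordCountSketch, its method names, and the shortened delimiter set are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.StringTokenizer;

// Hypothetical single-machine sketch of the word count the benchmark jobs compute.
class WordCountSketch {

    // Shortened delimiter set (an assumption; the slide's mapper uses a longer list).
    private static final String DELIMITERS =
            " \t\n\r',./<>?;:\"[]{}-=_+()&*%^#$!@`~|";

    // Tokenize each line, upper-case each token, and tally occurrences per word.
    static Map<String, Integer> countWords(List<String> lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            StringTokenizer itr = new StringTokenizer(line, DELIMITERS);
            while (itr.hasMoreTokens()) {
                String word = itr.nextToken().toUpperCase(Locale.ROOT);
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Counterpart of the Pig script's ORDER ... BY occurrences DESC.
    static List<Map.Entry<String, Integer>> sortByCountDesc(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> b.getValue().compareTo(a.getValue()));
        return entries;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "It was the best of times,",
                "it was the worst of times.");
        for (Map.Entry<String, Integer> e : sortByCountDesc(countWords(lines))) {
            System.out.println(e.getValue() + "\t" + e.getKey());
        }
    }
}
```

The in-memory map stands in for the shuffle and reduce phases: on a cluster, grouping by word is what distributes the work across nodes, which is the part the MapReduce and Pig versions have to express explicitly.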
RedPoint Ranks #1
Consistent High Rankings
Data Lake Architecture for MDM
Kris Tomes
Solution Director at RedPoint
Demonstration Introduction
Key Factors to Consider
•  Traditional data architectures are challenged
•  Maximize the scale & cost optimization of the Hortonworks Modern Data Architecture
•  Leverage your DBAs to control development / implementation / production costs and schedules
•  Smooth out your journey to a data lake
•  Expedite the speed for getting business applications into production
•  Insist on Any Data, Any Application, Any Environment
•  Do your data quality and data integration in the cluster
Thank You & Please Visit Us at www.RedPoint.net
Jamie Keeffe
Product Marketing Manager
RedPoint Global Inc.
Jamie.Keeffe@RedPoint.Net
+1 978-764-3839

More Related Content

PDF
Actian forrester- hortonworks
Hortonworks
 
PDF
Hortonworks and Red Hat Webinar_Sept.3rd_Part 1
Hortonworks
 
PDF
IDC Retail Insights - What's Possible with a Modern Data Architecture?
Hortonworks
 
PDF
Hadoop 2.0: YARN to Further Optimize Data Processing
Hortonworks
 
PDF
Optimizing your Modern Data Architecture - with Attunity, RCG Global Services...
Hortonworks
 
PDF
The Next Generation of Big Data Analytics
Hortonworks
 
PDF
2015 02 12 talend hortonworks webinar challenges to hadoop adoption
Hortonworks
 
PDF
The Value of the Modern Data Architecture with Apache Hadoop and Teradata
Hortonworks
 
Actian forrester- hortonworks
Hortonworks
 
Hortonworks and Red Hat Webinar_Sept.3rd_Part 1
Hortonworks
 
IDC Retail Insights - What's Possible with a Modern Data Architecture?
Hortonworks
 
Hadoop 2.0: YARN to Further Optimize Data Processing
Hortonworks
 
Optimizing your Modern Data Architecture - with Attunity, RCG Global Services...
Hortonworks
 
The Next Generation of Big Data Analytics
Hortonworks
 
2015 02 12 talend hortonworks webinar challenges to hadoop adoption
Hortonworks
 
The Value of the Modern Data Architecture with Apache Hadoop and Teradata
Hortonworks
 

What's hot (19)

PPTX
Swimming Across the Data Lake, Lessons learned and keys to success
DataWorks Summit/Hadoop Summit
 
PDF
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...
Hortonworks
 
PPTX
How Universities Use Big Data to Transform Education
Hortonworks
 
PPTX
Hortonworks Oracle Big Data Integration
Hortonworks
 
PDF
The Modern Data Architecture for Advanced Business Intelligence with Hortonwo...
Hortonworks
 
PDF
Accelerate Big Data Application Development with Cascading and HDP, Hortonwor...
Hortonworks
 
PDF
Hortonworks sqrrl webinar v5.pptx
Hortonworks
 
PDF
Getting to What Matters: Accelerating Your Path Through the Big Data Lifecycl...
Hortonworks
 
PPTX
Trucking demo w Spark ML - Paul Hargis - Hortonworks
Kelly Kohlleffel
 
PDF
Hortonworks and Platfora in Financial Services - Webinar
Hortonworks
 
PDF
Accelerating the Value of Big Data Analytics for P&C Insurers with Hortonwork...
Hortonworks
 
PDF
Webinar turbo charging_data_science_hawq_on_hdp_final
Hortonworks
 
PPTX
YARN Ready: Integrating to YARN with Tez
Hortonworks
 
PPTX
HPE and Hortonworks join forces to Deliver Healthcare Transformation
Hortonworks
 
PDF
Solving Big Data Problems using Hortonworks
DataWorks Summit/Hadoop Summit
 
PDF
Data Lake for the Cloud: Extending your Hadoop Implementation
Hortonworks
 
PPTX
Yahoo! Hack Europe
Hortonworks
 
PPTX
Create a Smarter Data Lake with HP Haven and Apache Hadoop
Hortonworks
 
PPTX
Big Data Expo 2015 - Hortonworks Common Hadoop Use Cases
BigDataExpo
 
Swimming Across the Data Lake, Lessons learned and keys to success
DataWorks Summit/Hadoop Summit
 
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...
Hortonworks
 
How Universities Use Big Data to Transform Education
Hortonworks
 
Hortonworks Oracle Big Data Integration
Hortonworks
 
The Modern Data Architecture for Advanced Business Intelligence with Hortonwo...
Hortonworks
 
Accelerate Big Data Application Development with Cascading and HDP, Hortonwor...
Hortonworks
 
Hortonworks sqrrl webinar v5.pptx
Hortonworks
 
Getting to What Matters: Accelerating Your Path Through the Big Data Lifecycl...
Hortonworks
 
Trucking demo w Spark ML - Paul Hargis - Hortonworks
Kelly Kohlleffel
 
Hortonworks and Platfora in Financial Services - Webinar
Hortonworks
 
Accelerating the Value of Big Data Analytics for P&C Insurers with Hortonwork...
Hortonworks
 
Webinar turbo charging_data_science_hawq_on_hdp_final
Hortonworks
 
YARN Ready: Integrating to YARN with Tez
Hortonworks
 
HPE and Hortonworks join forces to Deliver Healthcare Transformation
Hortonworks
 
Solving Big Data Problems using Hortonworks
DataWorks Summit/Hadoop Summit
 
Data Lake for the Cloud: Extending your Hadoop Implementation
Hortonworks
 
Yahoo! Hack Europe
Hortonworks
 
Create a Smarter Data Lake with HP Haven and Apache Hadoop
Hortonworks
 
Big Data Expo 2015 - Hortonworks Common Hadoop Use Cases
BigDataExpo
 
Ad

Viewers also liked (9)

PPTX
Hadoop and Spark – Perfect Together
Hortonworks
 
PPTX
Apache Hive on ACID
Hortonworks
 
PDF
Eliminating the Challenges of Big Data Management Inside Hadoop
Hortonworks
 
PPTX
Big Data Day LA 2016/ Big Data Track - Building scalable enterprise data flow...
Data Con LA
 
PDF
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
Hortonworks
 
PDF
Leverage Big Data to Enhance Customer Experience in Telecommunications – with...
Hortonworks
 
PDF
Hortonworks Technical Workshop: What's New in HDP 2.3
Hortonworks
 
PDF
Hortonworks Technical Workshop: HDP everywhere - cloud considerations using...
Hortonworks
 
PDF
Hortonworks technical workshop operations with ambari
Hortonworks
 
Hadoop and Spark – Perfect Together
Hortonworks
 
Apache Hive on ACID
Hortonworks
 
Eliminating the Challenges of Big Data Management Inside Hadoop
Hortonworks
 
Big Data Day LA 2016/ Big Data Track - Building scalable enterprise data flow...
Data Con LA
 
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
Hortonworks
 
Leverage Big Data to Enhance Customer Experience in Telecommunications – with...
Hortonworks
 
Hortonworks Technical Workshop: What's New in HDP 2.3
Hortonworks
 
Hortonworks Technical Workshop: HDP everywhere - cloud considerations using...
Hortonworks
 
Hortonworks technical workshop operations with ambari
Hortonworks
 
Ad

Similar to Eliminating the Challenges of Big Data Management Inside Hadoop (20)

PDF
Webinar turbo charging_data_science_hawq_on_hdp_final
Hortonworks
 
PDF
Hortonworks and HP Vertica Webinar
Hortonworks
 
PDF
IoT Crash Course Hadoop Summit SJ
Daniel Madrigal
 
PDF
Hortonworks & Bilot Data Driven Transformations with Hadoop
Mats Johansson
 
PDF
The Big Data Gusher: Big Data Analytics, the Internet of Things and the Oil B...
Platfora
 
PPTX
Supporting Financial Services with a More Flexible Approach to Big Data
WANdisco Plc
 
PDF
Introduction to Hadoop
POSSCON
 
PDF
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
Hortonworks
 
PDF
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Innovative Management Services
 
PDF
Splunk-hortonworks-risk-management-oct-2014
Hortonworks
 
PDF
Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...
Hortonworks
 
PDF
Bridging the Big Data Gap in the Software-Driven World
CA Technologies
 
PDF
Building a Modern Data Architecture with Enterprise Hadoop
Slim Baltagi
 
PDF
Meetup oslo hortonworks HDP
Alexander Bakos Leirvåg
 
PDF
Hortonworks Hadoop @ Oslo Hadoop User Group
Mats Johansson
 
PPTX
Hadoop in 2015: Keys to Achieving Operational Excellence for the Real-Time En...
MapR Technologies
 
PPTX
4. Big data & analytics HP
MITEF México
 
PDF
Predicting Customer Experience through Hadoop and Customer Behavior Graphs
Hortonworks
 
PDF
Hadoop and Your Enterprise Data Warehouse
Edgar Alejandro Villegas
 
PDF
Apache Hadoop and its role in Big Data architecture - Himanshu Bari
jaxconf
 
Webinar turbo charging_data_science_hawq_on_hdp_final
Hortonworks
 
Hortonworks and HP Vertica Webinar
Hortonworks
 
IoT Crash Course Hadoop Summit SJ
Daniel Madrigal
 
Hortonworks & Bilot Data Driven Transformations with Hadoop
Mats Johansson
 
The Big Data Gusher: Big Data Analytics, the Internet of Things and the Oil B...
Platfora
 
Supporting Financial Services with a More Flexible Approach to Big Data
WANdisco Plc
 
Introduction to Hadoop
POSSCON
 
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
Hortonworks
 
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Innovative Management Services
 
Splunk-hortonworks-risk-management-oct-2014
Hortonworks
 
Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...
Hortonworks
 
Bridging the Big Data Gap in the Software-Driven World
CA Technologies
 
Building a Modern Data Architecture with Enterprise Hadoop
Slim Baltagi
 
Meetup oslo hortonworks HDP
Alexander Bakos Leirvåg
 
Hortonworks Hadoop @ Oslo Hadoop User Group
Mats Johansson
 
Hadoop in 2015: Keys to Achieving Operational Excellence for the Real-Time En...
MapR Technologies
 
4. Big data & analytics HP
MITEF México
 
Predicting Customer Experience through Hadoop and Customer Behavior Graphs
Hortonworks
 
Hadoop and Your Enterprise Data Warehouse
Edgar Alejandro Villegas
 
Apache Hadoop and its role in Big Data architecture - Himanshu Bari
jaxconf
 

More from Hortonworks (20)

PDF
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks
 
PDF
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
Hortonworks
 
PDF
Getting the Most Out of Your Data in the Cloud with Cloudbreak
Hortonworks
 
PDF
Johns Hopkins - Using Hadoop to Secure Access Log Events
Hortonworks
 
PDF
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Hortonworks
 
PDF
HDF 3.2 - What's New
Hortonworks
 
PPTX
Curing Kafka Blindness with Hortonworks Streams Messaging Manager
Hortonworks
 
PDF
Interpretation Tool for Genomic Sequencing Data in Clinical Environments
Hortonworks
 
PDF
IBM+Hortonworks = Transformation of the Big Data Landscape
Hortonworks
 
PDF
Premier Inside-Out: Apache Druid
Hortonworks
 
PDF


Eliminating the Challenges of Big Data Management Inside Hadoop

  • 1. Eliminating the Challenges of Big Data Management Inside Hadoop
  • 2. Today’s Speakers: Justin Sears, Senior Manager, Product Marketing, Hortonworks; Jamie Keeffe, Product Marketing Manager, RedPoint Global; Kris Tomes, Solutions Director, RedPoint Global
  • 3. Hortonworks: Hadoop for the Enterprise. We Do Hadoop. Spring 2015
  • 4. Hadoop for the Enterprise: Implement a Modern Data Architecture with HDP. Customer Momentum: 330+ customers (as of year-end 2014). Hortonworks Data Platform: completely open multi-tenant platform for any app & any data; a centralized architecture of consistent enterprise services for resource management, security, operations, and governance. Partner for Customer Success: open source community leadership focused on enterprise needs; unrivaled world-class support. Founded in 2011; original 24 architects, developers, and operators of Hadoop from Yahoo!; 600+ employees; 1000+ ecosystem partners.
  • 5. Hadoop for the Enterprise: Implement a Modern Data Architecture with HDP. Spring 2015. Hortonworks. We do Hadoop.
  • 6. Traditional systems under pressure. Challenges: constrains data to app; can’t manage new data; costly to scale. Figure labels: Business Value; New Data (clickstream, geolocation, web data, Internet of Things, docs and emails, server logs); Traditional (ERP, CRM, SCM); 2012: 2.8 zettabytes; 2020: 40 zettabytes; Industry Leaders; Laggards.
  • 7. Hadoop emerged as foundation of new data architecture. Apache Hadoop is an open source data platform for managing large volumes of high velocity and variety of data. Built by Yahoo! to be the heartbeat of its ad & search business; donated to Apache Software Foundation in 2005 with rapid adoption by large web properties & early adopter enterprises; incredibly disruptive to current platform economics. Traditional Hadoop Advantages: manages new data paradigm; handles data at scale; cost effective; open source. Traditional Hadoop Had Limitations: batch-only architecture; single-purpose clusters, specific data sets; difficult to integrate with existing investments; not enterprise-grade. Diagram labels: Application; Storage (HDFS); Batch Processing (MapReduce).
  • 8. Modern Data Architecture emerges to unify data & processing. Modern Data Architecture: enable applications to have access to all your enterprise data through an efficient centralized platform; supported with a centralized approach to governance, security and operations; versatile enough to handle any applications and datasets, no matter the size or type. Diagram labels: Sources (clickstream, web & social, geolocation, sensor & machine, server logs, unstructured); Existing Systems (ERP, CRM, SCM); Analytics (data marts, applications, business analytics, visualization & dashboards); HDFS (Hadoop Distributed File System); YARN: Data Operating System (batch, interactive, real-time, partner ISV); MPP; EDW.
  • 9. Modern Data Architecture emerges to unify data & processing. RedPoint Global is a Hortonworks Partner, certified on HDP and YARN. With RedPoint, your existing data analysts and database administrators can easily work with data stored in Hadoop. No new skills are required. Diagram labels: Sources (clickstream, web & social, geolocation, sensor & machine, server logs, unstructured); Existing Systems (ERP, CRM, SCM); Analytics (data marts, applications, business analytics, visualization & dashboards); HDFS (Hadoop Distributed File System); YARN: Data Operating System (batch, interactive, real-time, partner ISV); MPP; EDW.
  • 10. Hadoop adoption follows a predictable journey: from cost optimization, to new analytic apps, and ultimately to a “data lake”
  • 11. Hadoop Driver: Cost optimization. HDP helps you reduce costs and optimize the value associated with your EDW. Archive data off EDW: move rarely used data to Hadoop as active archive, store more data longer. Offload costly ETL process: free your EDW to perform high-value functions like analytics & operations, not ETL. Enrich the value of your EDW: use Hadoop to refine new data sources, such as web and machine data, for new analytical context. Diagram labels: Sources (clickstream, web & social, geolocation, sensor & machine, server logs, unstructured); Existing Systems (ERP, CRM, SCM); Data Systems (HDP 2.2: ELT; cold data, deeper archive & new sources; Enterprise Data Warehouse: hot; MPP; in-memory); Analytics (data marts, business analytics, visualization & dashboards).
  • 12. Hadoop Driver: Advanced analytic applications
      Column headings: Single View (improve acquisition and retention); Predictive Analytics (identify your next best action); Data Discovery (uncover new findings)
      Financial Services: New Account Risk Screens; Trading Risk; Insurance Underwriting; Improved Customer Service; Insurance Underwriting; Aggregate Banking Data as a Service; Cross-sell & Upsell of Financial Products; Risk Analysis for Usage-Based Car Insurance; Identify Claims Errors for Reimbursement
      Telecom: Unified Household View of the Customer; Searchable Data for NPTB Recommendations; Protect Customer Data from Employee Misuse; Analyze Call Center Contacts Records; Network Infrastructure Capacity Planning; Call Detail Records (CDR) Analysis; Inferred Demographics for Improved Targeting; Proactive Maintenance on Transmission Equipment; Tiered Service for High-Value Customers
      Retail: 360° View of the Customer; Supply Chain Optimization; Website Optimization for Path to Purchase; Localized, Personalized Promotions; A/B Testing for Online Advertisements; Data-Driven Pricing, improved loyalty programs; Customer Segmentation; Personalized, Real-time Offers; In-Store Shopper Behavior
      Manufacturing: Supply Chain and Logistics; Optimize Warehouse Inventory Levels; Product Insight from Electronic Usage Data; Assembly Line Quality Assurance; Proactive Equipment Maintenance; Crowdsource Quality Assurance; Single View of a Product Throughout Lifecycle; Connected Car Data for Ongoing Innovation; Improve Manufacturing Yields
      Healthcare: Electronic Medical Records; Monitor Patient Vitals in Real-Time; Use Genomic Data in Medical Trials; Improving Lifelong Care for Epilepsy; Rapid Stroke Detection and Intervention; Monitor Medical Supply Chain to Reduce Waste; Reduce Patient Re-Admittance Rates; Video Analysis for Surgical Decision Support; Healthcare Analytics as a Service
      Oil & Gas: Unify Exploration & Production Data; Monitor Rig Safety in Real-Time; Geographic exploration; DCA to Slow Well Declines Curves; Proactive Maintenance for Oil Field Equipment; Define Operational Set Points for Wells
      Government: Single View of Entity; CBM & Autonomic Logistic Analysis; Sentiment Analysis on Program Effectiveness; Prevent Fraud, Waste and Abuse; Proactive Maintenance for Public Infrastructure; Meet Deadlines for Government Reporting
  • 13. Hadoop Driver: Enabling the data lake. Data Lake Definition: Centralized Architecture (multiple applications on a shared data set with consistent levels of service); Any App, Any Data (multiple applications accessing all data, affording new insights and opportunities); Unlocks ‘Systems of Insight’ (advanced algorithms and applications used to derive new value and optimize existing value). Drivers: 1. Cost Optimization; 2. Advanced Analytic Apps. Goal: Centralized Architecture; Data-driven Business. Diagram labels: Scale; Scope; Data Lake; Journey to the Data Lake with Hadoop; Systems of Insight.
  • 14. Case Study: 12 month Hadoop evolution at TrueCar. “We addressed our data platform capabilities strategically as a pre-cursor to IPO.” 12 months execution plan (data platform capabilities over time): June 2013: begin Hadoop execution; July 2013: Hortonworks partnership; Aug 2013: training & dev begins; Nov 2013: production cluster, 60 nodes, 2 PB; Dec 2013: three production apps (3 total); Jan 2014: 40% Dev Staff Perficient; Feb 2014: three more production apps (6 total); May ’14: IPO. 12 Month Results at TrueCar: six production Hadoop applications; sixty nodes/2 PB data; storage costs/compute costs from $19/GB to $0.23/GB.
  • 15. Hortonworks Data Platform: Hadoop for the Enterprise
  • 16. Only HDP delivers a Centralized Architecture. HDP is uniquely built around YARN serving as a data operating system that provides multi-tenant Resource Management, consistent Governance & Security, and efficient Operations services across Hadoop applications. Hortonworks Data Platform, YARN Data Operating System: a centralized architecture of consistent enterprise services for resource management, security, operations, and governance; the versatility to support multiple applications and diverse workloads from batch to interactive to real-time, open source and commercial. Key Benefits: multiple applications on a shared data set with consistent levels of service (a multitenant data platform); provides a shared platform to enable new analytic applications; delivers maximum cost efficiency for cluster resource management (fewer servers, fewer nodes). Diagram labels: Storage; YARN: Data Operating System; Governance; Security; Operations; Resource Management; Existing Applications; New Analytics; Partner Applications; Data Access: Batch, Interactive & Real-time.
  • 17. HDP delivers a completely open data platform. Hortonworks Data Platform 2.2 provides Hadoop for the Enterprise: a centralized architecture of core enterprise services, for any application and any data. Completely Open: HDP incorporates every element required of an enterprise data platform (data storage, data access, governance, security, operations); all components are developed in open source and then rigorously tested, certified, and delivered as an integrated open source platform that’s easy to consume and use by the enterprise and ecosystem. Diagram labels: YARN: Data Operating System (Cluster Resource Management); HDFS (Hadoop Distributed File System). Governance: Apache Falcon. Batch, interactive & real-time data access: Apache Pig, Apache Hive, Cascading, Apache HBase, Apache Accumulo, Apache Solr, Apache Spark, Apache Storm. Other components shown: Apache Sqoop, Apache Flume, Apache Kafka. Security: Apache Ranger, Apache Knox, Apache Falcon. Operations: Apache Ambari, Apache Zookeeper, Apache Oozie.
  • 18. HDP: Any Data, Any Application, Anywhere. Any Application: deep integration with ecosystem partners to extend existing investments and skills; broadest set of applications through the stable of YARN-Ready applications (over 70 Hortonworks Certified YARN Apps). Any Data: deploy applications fueled by clickstream, sensor, social, mobile, geo-location, server log, and other new paradigm datasets with existing legacy datasets. Anywhere: implement HDP naturally across the complete range of deployment options (hybrid, commodity, appliance, cloud). Diagram labels: clickstream; web & social; geolocation; Internet of Things; server logs; files, emails; ERP; CRM; SCM.
  • 19. Hortonworks supports the full application lifecycle. Hadoop usage follows a consistent lifecycle: from architecture to expansion, all with a consistent support experience. Lifecycle phases: Architecture & Development; Implementation; Production; Expansion. Most Common Support Issues by Project Phase (issues addressed by Hortonworks Support, by type, for the past year): Architecture 7%; Application Development 10%; Installation 10%; Performance 5%; Configuration 25%; Executing Jobs 20%; Cluster Administration 18%; HDP Upgrades 3%; Enhancement Requests 3%; Total 100%. Hortonworks Support, Full Lifecycle Subscription Support: support through EVERY phase of adoption of your Hadoop project to ensure your success. Chart labels: # tickets; Project 2, Project 3, ... Project N.
  • 20. A Leader in Hadoop: The Forrester Wave™: Big Data Hadoop Solutions, Q1 2014. “Hortonworks loves and lives open source innovation.” World Class Support and Services: Hortonworks' Customer Support received a maximum score and was significantly higher than both Cloudera and MapR.
  • 21. Eliminating the Challenges of Big Data Management Inside Hadoop
  • 22. Overview of RedPoint Global. Launched 2006; founded and staffed by industry veterans; headquarters: Wellesley, Massachusetts; offices in US, UK, Australia, Philippines; global customer base; serves most major industries. Magic Quadrant: Data Quality; Multichannel Campaign Management; Integrated Marketing Management.
  • 23. Andrew Brust, GigaOm Research
  • 24. New Data Straining Current Architectures. Data types: unstructured documents, emails; transactional data; server logs; sentiment, web data; geolocation; sensor, machine data; clickstream; hierarchical data; OLTP, ERP, CRM; master data. 2.8 ZB in 2013; 85% from new data types; 15x machine data by 2020; 40 ZB by 2020. Source: IDC
  • 25. Key Functions for Data Management. ETL & ELT: profiling, reads/writes, transformations; single project for all jobs. Data Quality: cleanse data; parsing, correction; geo-spatial analysis. Integration & Matching: grouping; fuzzy match. Master Key Management: create keys; track changes; maintain matches over time. Web Services Integration: consume and publish; HTTP/HTTPS protocols; XML/JSON/SOAP formats. Process Automation & Operations: job scheduling, monitoring, notifications; central point of control; metadata management.
  • 26. Overview - What is Hadoop? Hadoop 1.0: all operations based on MapReduce; intrinsic inconsistency of code-based solutions; highly skilled and expensive resources needed; 3rd party applications constrained by the need to generate code. Hadoop 2.0: introduction of YARN, “a general-purpose, distributed, application management framework that supersedes the classic Apache Hadoop MapReduce framework for processing data in Hadoop clusters”; mature applications can now operate directly on Hadoop; reduced skill requirements and increased consistency. Hadoop 2.0 diagram labels: System: HDFS (Hadoop Distributed File System); YARN: Data Operating System. Engine: Batch (MapReduce); Batch & Interactive (Tez); Real-Time (Slider); Spark; Stream (Storm); NoSQL (HBase, Accumulo); Other ISV. API: Cascading (Scala, Java); SQL (Hive); Scripting (Pig); Direct (Java, .NET).
  • 27. RedPoint Data Management on Hadoop. Diagram labels: Partitioning (AM / Tasks); Execution (AM / Tasks); Data I/O; Key / Split Analysis; Parallel Section; YARN; MapReduce.
  • 28. RedPoint DM for Hadoop: Processing Flow. Diagram labels: Data Management Designer; DM Execution Server; Parallel Section; Resource Manager; Node Managers hosting the DM App Master and DM Tasks; Running DM Task; arrows labeled “Launches DM App Master” and “Launches Tasks”; numbered steps 1 to 3.
  • 29. Benchmarks – Project Gutenberg
      Approach: MapReduce | Pig | RedPoint
      Code: >150 lines of MR code | ~50 lines of script code | 0 lines of code
      Development time: 6 hours | 3 hours | 15 min.
      Runtime: 6 minutes | 15 minutes | 3 minutes
      Tuning: Extensive optimization needed | User Defined Functions required prior to running script | No tuning or optimization required
      Sample MapReduce (small subset of the entire code, which totals nearly 150 lines):
          public static class MapClass extends Mapper<WordOffset, Text, Text, IntWritable> {
              // Characters that separate words; the embedded double quote must be escaped.
              private final static String delimiters = "',./<>?;:\"[]{}-=_+()&*%^#$!@`~ |«»¡¢£¤¥¦©¬®¯±¶·¿";
              private final static IntWritable one = new IntWritable(1);
              private Text word = new Text();

              // Emit (word, 1) for every token found in the input line.
              public void map(WordOffset key, Text value, Context context) throws IOException, InterruptedException {
                  String line = value.toString();
                  StringTokenizer itr = new StringTokenizer(line, delimiters);
                  while (itr.hasMoreTokens()) {
                      word.set(itr.nextToken());
                      context.write(word, one);
                  }
              }
          }
      Sample Pig script without the UDF:
          SET pig.maxCombinedSplitSize 67108864
          SET pig.splitCombination true
          A = LOAD '/testdata/pg/*/*/*';
          B = FOREACH A GENERATE FLATTEN(TOKENIZE((chararray)$0)) AS word;
          C = FOREACH B GENERATE UPPER(word) AS word;
          D = GROUP C BY word;
          E = FOREACH D GENERATE COUNT(C) AS occurrences, group;
          F = ORDER E BY occurrences DESC;
          STORE F INTO '/user/cleonardi/pg/pig-count';
  • 30. RedPoint Ranks #1
  • 31. Consistent High Rankings
  • 32. Data Lake Architecture for MDM
  • 33. Demonstration Introduction: Kris Tomes, Solution Director at RedPoint
  • 34. Key Factors to Consider: traditional data architectures are challenged; maximize the scale & cost optimization of the Hortonworks Modern Data Architecture; leverage your DBAs to control development / implementation / production costs and schedules; smooth out your journey to a data lake; expedite the speed for getting business applications into production; insist on Any Data, Any Application, Any Environment; do your data quality and data integration in the cluster.
  • 35. Thank You & Please Visit Us at www.RedPoint.net. Jamie Keeffe, Product Marketing Manager, RedPoint Global Inc. Jamie.Keeff[email protected] +1 978-764-3839
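
The word-count benchmark on slide 29 is easier to follow with the logic written out in one place. The following is a minimal single-machine sketch in plain Java, assuming the benchmark simply tokenizes text, upper-cases each word, counts occurrences, and orders by count (the steps visible in the Pig script). The class name `WordCountSketch`, its method names, and the shortened delimiter string are hypothetical and not taken from the deck; the sketch does not use Hadoop, Pig, or RedPoint and says nothing about their performance.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.StringTokenizer;

public class WordCountSketch {
    // Hypothetical delimiter set modeled on the slide's MapClass (ASCII subset),
    // with whitespace added so that words also split on spaces and line breaks.
    private static final String DELIMITERS = " \t\n\r',./<>?;:\"[]{}-=_+()&*%^#$!@`~|";

    /** Tokenizes each line, upper-cases every token, and counts occurrences per word. */
    public static Map<String, Long> countWords(List<String> lines) {
        Map<String, Long> counts = new HashMap<>();
        for (String line : lines) {
            StringTokenizer itr = new StringTokenizer(line, DELIMITERS);
            while (itr.hasMoreTokens()) {
                String word = itr.nextToken().toUpperCase(Locale.ROOT);
                counts.merge(word, 1L, Long::sum);
            }
        }
        return counts;
    }

    /** Orders entries by count, highest first; ties are broken alphabetically by word. */
    public static List<Map.Entry<String, Long>> sortByCountDesc(Map<String, Long> counts) {
        List<Map.Entry<String, Long>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(
            Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder())
                .thenComparing(Map.Entry.<String, Long>comparingByKey()));
        return entries;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("The cat, the dog.", "A dog!");
        for (Map.Entry<String, Long> e : sortByCountDesc(countWords(lines))) {
            System.out.println(e.getValue() + "\t" + e.getKey());
        }
    }
}
```

On a cluster the same three steps are distributed: the map phase does the tokenizing, the shuffle groups identical words, and the reduce phase sums the counts, which is where most of the extra code in the MapReduce version comes from.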