© Hortonworks Inc. 2013
Hortonworks
Community Driven
Enterprise Apache Hadoop
Mrinal Devadas
Systems Architect
mdevadas@hortonworks.com
Hortonworks
• Who is Hortonworks
• Our Approach
• Patterns of Use
A Brief History of Apache Hadoop
Timeline axis: 2004, 2006, 2008, 2010, 2012, 2013
Focus on INNOVATION
2005: Yahoo! creates team under E14 to work on Hadoop
Focus on OPERATIONS
2008: Yahoo! team extends focus to operations to support multiple projects & growing clusters
STABILITY
2011: Hortonworks created to focus on “Enterprise Hadoop”. Starts with 24 key Hadoop engineers from Yahoo!
Milestones: Apache Project Established; Yahoo! begins to Operate at scale; Enterprise Hadoop; Hortonworks Data Platform
Hortonworks Snapshot
• We distribute the only 100%
Open Source Enterprise
Hadoop Distribution:
Hortonworks Data
Platform
• We engineer, test & certify
HDP for enterprise usage
• We employ the core
architects, builders and
operators of Apache Hadoop
• We drive innovation within
Apache Software
Foundation projects
• We are uniquely positioned
to deliver the highest quality
of Hadoop support
• We enable the ecosystem to
work better with Hadoop
Develop Distribute Support
We develop, distribute and support
the ONLY 100% open source
Enterprise Hadoop distribution
Endorsed by Strategic Partners
Headquarters: Palo Alto, CA
Employees: 200+ and growing
Investors: Benchmark, Index, Yahoo
Hortonworks
• Who is Hortonworks
• Our approach
– Leading Open Source Hadoop innovation
– Addressing “Enterprise Hadoop” Requirements
– Enabling Interoperability of the Ecosystem
– Ensuring No Lock-In: 100% Open Source
• Patterns of Use
Apache Software Foundation
Guiding Principles
• Release early & often
• Transparency, respect, meritocracy
Key Roles held by Hortonworkers
• PMC Members
– Managing community projects
– Mentoring new incubator projects
– Over 20 Hortonworkers managing community
• Committers
– Authoring, reviewing & editing code
– Over 50 Hortonworkers across projects
• Release Managers
– Testing & releasing projects
– Hortonworkers across key projects like Hadoop,
Hive, Pig, HCatalog, Ambari, HBase
Diagram: Design & Develop, Test & Patch and Release cycle across Apache Hadoop, Apache Pig, Apache HCatalog, Apache HBase, Apache Hive, Apache Ambari and Other Apache Projects
“We have noticed more activity over the last year
from Hortonworks’ engineers on building out
Apache Hadoop’s more innovative features. These
include YARN, Ambari and HCatalog.”
- Jeff Kelly: Wikibon
Apache Community Leadership
Leadership that Starts at the Core
• Driving next generation Hadoop
– YARN, MapReduce2, HDFS2, High
Availability, Disaster Recovery
• 420k+ lines authored since 2006
– More than twice nearest contributor
• Deeply integrating w/ecosystem
– Enabling new deployment platforms
– (ex. Windows & Azure, Linux & VMware HA)
– Creating deeply engineered solutions
– (ex. Teradata big data appliance)
• All Apache, NO holdbacks
– 100% of code contributed to Apache
Driving Enterprise Hadoop Innovation
Committers by project, Hortonworks vs. Cloudera (pairs in chart order): 19 / 8, 6 / 1, 5 / 0, 5 / 9, 16 / 0
Projects charted: AMBARI, HBASE, HIVE/HCATALOG, PIG, HADOOP CORE
Lines Of Code By Company: Hortonworks, Yahoo!, Cloudera, Other
Source: Apache Software Foundation
Hortonworks Process for Enterprise Hadoop
Upstream Community Projects: Apache Hadoop, Apache Pig, Apache HCatalog, Apache HBase, Apache Hive, Apache Ambari, Other Apache Projects (Design & Develop, Test & Patch, Release)
Downstream Enterprise Product: Hortonworks Data Platform (Design & Develop, Integrate & Test, Package & Certify, Distribute)
Virtuous cycle when development & fixed issues done upstream & stable project releases flow downstream
No Lock-in: Integrated, tested & certified distribution lowers risk by ensuring close alignment with Apache projects
Stable Project
Releases
Fixed Issues
“We have noticed more activity over the last year from Hortonworks’ engineers on building out Apache Hadoop’s
more innovative features. These include YARN, Ambari and HCatalog.” - Jeff Kelly: Wikibon
Hortonworks
• Who is Hortonworks
• Our approach
– Leading Open Source Hadoop Innovation
– Addressing “Enterprise Hadoop” Requirements
– Enabling Interoperability of the Ecosystem
– Ensuring NO LOCK-IN: 100% Open Source
• Patterns of use
Enhancing the Core of Apache Hadoop
Deliver high-scale
storage & processing
with enterprise-ready
platform services
Unique Focus Areas:
• Bigger, faster, more flexible
Continued focus on speed & scale and
enabling near-real-time apps
• Tested & certified at scale
Run ~1300 system tests on large Yahoo
clusters for every release
• Enterprise-ready services
High availability, disaster
recovery, snapshots, security, …
HADOOP CORE
Hortonworkers are the
architects, operators, and builders of
core Hadoop
Distributed
Storage & Processing
PLATFORM SERVICES Enterprise Readiness
HADOOP CORE
DATA
SERVICES
Provide data services to
store, process & access
data in many ways
Unique Focus Areas:
• Apache HCatalog
Metadata services for consistent table
access to Hadoop data
• Apache Hive
Explore & process Hadoop data via SQL &
ODBC-compliant BI tools
Distributed
Storage & Processing
Hortonworks enables Hadoop data to be
accessed via existing tools & systems
Store, Process and Access Data
PLATFORM SERVICES Enterprise Readiness
Data Services for Full Data Lifecycle
Operational Services for Ease of Use
OPERATIONAL
SERVICES
Include complete
operational services for
productive operations
& management
Unique Focus Area:
• Apache Ambari:
Provision, manage & monitor a cluster;
complete REST APIs to integrate with
existing operational tools; job & task
visualizer to diagnose issues
Only Hortonworks provides a complete
open source Hadoop management tool
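The REST integration point mentioned for Apache Ambari might look roughly like the sketch below. It assumes an Ambari server reachable over HTTP that accepts basic authentication and lists clusters under /api/v1/clusters; the host, port, credentials and the helper name build_clusters_request are hypothetical placeholders, not values from this deck. The request is only constructed here, not sent.

```python
import base64
import urllib.request


def build_clusters_request(host, port, user, password):
    """Build (but do not send) a GET request that lists clusters.

    Assumption: the Ambari server exposes its REST API under /api/v1
    and accepts HTTP basic authentication.
    """
    url = f"http://{host}:{port}/api/v1/clusters"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url, method="GET")
    request.add_header("Authorization", f"Basic {token}")
    return request


# Hypothetical values, for illustration only.
req = build_clusters_request("ambari.example.com", 8080, "admin", "secret")
print(req.get_method(), req.full_url)
```

An existing operational tool could pass such a request to urllib.request.urlopen and parse the JSON response to feed its own dashboards; that step is left out because it needs a live server.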
Manage &
Operate at
Scale
DATA
SERVICES
Store, Process and Access Data
HADOOP CORE
Distributed
Storage & Processing
PLATFORM SERVICES Enterprise Readiness
OS Cloud VM Appliance
PLATFORM SERVICES
HADOOP CORE
DATA
SERVICES
OPERATIONAL
SERVICES
Manage &
Operate at
Scale
Store, Process and Access Data
Enterprise Readiness
Only Hortonworks
allows you to deploy
seamlessly across any
deployment option
• Linux & Windows
• Azure, Rackspace & other clouds
• Virtual platforms
• Big data appliances
HORTONWORKS
DATA PLATFORM (HDP)
Distributed
Storage & Processing
Deployable Across a Range of Options
OS Cloud VM Appliance
HDP: Enterprise Hadoop Distribution
PLATFORM SERVICES
HADOOP CORE
DATA
SERVICES
OPERATIONAL
SERVICES
Manage &
Operate at
Scale
Store, Process and Access Data
HORTONWORKS
DATA PLATFORM (HDP)
Distributed
Storage & Processing
Hortonworks
Data Platform (HDP)
Enterprise Hadoop
• The ONLY 100% open source
and complete distribution
• Enterprise grade, proven and
tested at scale
• Ecosystem endorsed to
ensure interoperability
Enterprise Readiness
Hortonworks
• Who is Hortonworks
• Our approach
– Leading Open Source Hadoop Innovation
– Addressing “Enterprise Hadoop” Requirements
– Enabling Interoperability of the Ecosystem
– Ensuring No Lock-in: 100% Open Source
• Patterns of use
Existing Data Architecture
APPLICATIONS: Business Analytics, Custom Applications, Enterprise Applications
DATA SYSTEMS: TRADITIONAL REPOS (RDBMS, EDW, MPP)
DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP); OLTP, POS SYSTEMS
OPERATIONAL TOOLS: MANAGE & MONITOR
DEV & DATA TOOLS: BUILD & TEST
Next-Generation Data Architecture
APPLICATIONS: Business Analytics, Custom Applications, Enterprise Applications
DATA SYSTEMS: TRADITIONAL REPOS (RDBMS, EDW, MPP); ENTERPRISE HADOOP PLATFORM
DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP); OLTP, POS SYSTEMS; New Sources (web logs, email, sensors, social media)
OPERATIONAL TOOLS: MANAGE & MONITOR
DEV & DATA TOOLS: BUILD & TEST
Interoperating With Your Tools
APPLICATIONS
DATA SYSTEMS: TRADITIONAL REPOS; HORTONWORKS DATA PLATFORM
DEV & DATA TOOLS
OPERATIONAL TOOLS
Labels shown: Viewpoint, Microsoft Applications
DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP); New Sources (web logs, email, sensors, social media)
Hortonworks
• Who is Hortonworks
• Our approach
– Leading Open Source Hadoop Innovation
– Addressing “Enterprise Hadoop” Requirements
– Enabling Interoperability of the Ecosystem
– Ensuring No Lock-In: 100% Open Source
• Patterns of use
True Enterprise Class Open Source
• Community-driven Approach Mitigates Lock-In
–Identify & introduce enterprise requirements into public domain
–Work with community to advance & incubate open source projects
–Apply Enterprise Rigor for the most stable and reliable distribution
• 100% Open Source. No Holdbacks.
–Only true implementation of OSS Apache Hadoop
–Preferred by the software vendors that you rely on
–Proprietary Open Source = Lock-In
–Open communities always trump “open source”
• Flexible Deployment
–No License Fee for usage
Hortonworks
• Who is Hortonworks
• Our approach
• Patterns of use
Big Data
Transactions, Interactions, Observations
Hadoop Common Patterns of Use
Business Cases
HORTONWORKS
DATA PLATFORM
Refine Explore Enrich
Batch Interactive Online
“Right-time” Access to Data
Operational Data Refinery
Transform & refine ALL sources of data
Also known as Data Reservoir or Catch Basin
1 Capture
2 Process
3 Distribute & Retain
Refine, Explore, Enrich
APPLICATIONS: Business Analytics, Custom Applications, Enterprise Applications
DATA SYSTEMS: HORTONWORKS DATA PLATFORM; TRADITIONAL REPOS (RDBMS, EDW, MPP)
DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP); New Sources (web logs, email, sensor data, social media)
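The three refinery steps (Capture, Process, Distribute & Retain) can be illustrated with a toy, in-memory sketch. It is not Hadoop code: the "timestamp status path" record layout, the function names and the sample records are all hypothetical, chosen only to show raw data being kept as-is, refined in a separate step, and then shaped for a downstream repository.

```python
from collections import Counter


def capture(raw_lines):
    """Step 1 (Capture): keep every non-empty raw record unchanged."""
    return [line.strip() for line in raw_lines if line.strip()]


def process(records):
    """Step 2 (Process): parse 'timestamp status path' and count successful hits per path."""
    counts = Counter()
    for record in records:
        parts = record.split()
        if len(parts) != 3:
            continue  # skip malformed records instead of failing the batch
        _, status, path = parts
        if status == "200":
            counts[path] += 1
    return counts


def distribute(counts):
    """Step 3 (Distribute & Retain): emit sorted rows a downstream store could load."""
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical web-log records, for illustration only.
raw = [
    "2013-03-01T10:00 200 /home",
    "",
    "2013-03-01T10:01 404 /missing",
    "2013-03-01T10:02 200 /home",
    "2013-03-01T10:03 200 /cart",
    "garbage",
]
rows = distribute(process(capture(raw)))
print(rows)
```

Keeping capture separate from process mirrors the slide's point that all source data is landed first and refined afterwards, so the raw records stay available for other uses.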
Big Data Exploration & Visualization
Leverage “data lake” to perform iterative investigation for value
1 Capture
2 Process
3 Explore & Visualize
Refine, Explore, Enrich
APPLICATIONS: Business Analytics, Custom Applications, Enterprise Applications
DATA SYSTEMS: HORTONWORKS DATA PLATFORM; TRADITIONAL REPOS (RDBMS, EDW, MPP)
DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP); New Sources (web logs, email, sensor data, social media)
Application Enrichment
Create intelligent applications
Collect data, create analytical models and deliver to online apps
1 Capture
2 Process & Compute
3 Deliver Model
Refine, Explore, Enrich
APPLICATIONS: Custom Applications, Enterprise Applications
DATA SYSTEMS: HORTONWORKS DATA PLATFORM; NOSQL; TRADITIONAL REPOS (RDBMS, EDW, MPP)
DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP); New Sources (web logs, email, sensor data, social media)
Flexible Support Subscription Programs
Leverage Hortonworks Expertise: Subscription and Support delivered and backed by Hadoop experts; subscriptions based on nodes or storage
Developer Support: “How to” guidance for developers and architects. 12 x 5, web only. All Sev: 1 business day. Application Design Advice, Code Review. 1 seat.
Essential Support*: Operations support for small research clusters. 12 x 5, web only. All Sev: 1 business day. Cluster Design, Install, Maintain, Performance. 3 Contacts. Patches & Updates.
Standard Support: Operations support for dev & test clusters. 12 x 5, web only. All Sev: 1 business day. Cluster Design, Install, Maintain, Performance. 3 Contacts. Patches & Updates.
Enterprise Support: Operations support for critical clusters. 24 x 7, phone & web. Sev 1: 1 hour; Sev 2: 4 business hours. Cluster Design, Install, Maintain, Performance. 5 Contacts. Patches & Updates.
* Limited in size and no expansion
Additional Options
Hortonworks: Best In Class Hadoop Support
• Experienced enterprise support team
– Experience supporting enterprise clients in production
– Core engineers have real operational
experience: built and supported 44+K nodes in production
– Extensive experience in commercial big data offerings
including HDP, MapR, Karmasphere
• Global 24x7 operation – support based in Sunnyvale, UK & India
• Stringent case management processes ensure high-quality customer
service & responsiveness
Transferring Our Hadoop Expertise to You
The expert source for
Apache Hadoop training & certification
• World class training programs designed to
help you learn fast
– Role-based hands-on classes with 50% lab time
• Expert consulting services
– Programs designed to transfer knowledge
• Industry leading Hadoop Sandbox program
– Fastest way to learn Apache Hadoop
– Multi-level tutorials for wide applicability
– Customizable and updateable
Introducing Hortonworks Data Platform for Windows
Enterprise Apache Hadoop
March 2013
Why Apache Hadoop on Windows?
• According to IDC, Windows Server held 73% market share in 2012
– Hadoop was traditionally built for Linux servers, so there are a large number of underserved
organizations
• According to a 2012 Barclays CIO study, big data outranks
virtualization as the #1 trend driving spending initiatives
– Unstructured data growth exceeds 80% year/year in most enterprises
• Apache Hadoop is the de facto big data platform
for processing massive amounts of unstructured data
– Complementary to existing Microsoft technologies
– There is a huge untapped community of Windows developers and ecosystem partners
• A strong Microsoft-Hortonworks partnership and 18 months of
development make this a natural next step
Hortonworks Data Platform for Windows
• Enterprise-grade Apache Hadoop on Windows
– Enables same experience for Hadoop on Windows & Linux
• More partners, more developers for Hadoop
– Makes native Apache Hadoop available to Windows ecosystem
– More options for Windows focused organizations
• Hortonworks focus: Enterprise Apache Hadoop for all platforms
– Trusted, reliable, production-ready distribution for on-premises Hadoop on Windows
deployments
• Built with joint investment and contributions from Microsoft
– Deep engineering relationship ensures tight integration and maximum performance
HDP is the first and only distribution available on Windows & Linux
Seamless Interoperability with Your Microsoft Tools
• Integrated with Microsoft tools
for native big data analysis
– Bi-directional connectors for SQL
Server and SQL Azure through SQOOP
– Excel ODBC integration through Hive
• Addressing demand for Hadoop
on Windows
– Ideal for Windows customers with
Hadoop operational experience
• Enables most common Hadoop
workloads in the Enterprise
– Data refinement and ETL offload for
high-volume data landing
– Data exploration for discovery of new
business opportunities
– Data enrichment for finely tuned delivery
and recommendation engines
APPLICATIONS: Microsoft Applications
DATA SYSTEMS: HORTONWORKS DATA PLATFORM For Windows
DATA SOURCES: MOBILE DATA; OLTP, POS SYSTEMS; Traditional Sources (RDBMS, OLTP, OLAP); New Sources (web logs, email, sensor data, social media)
Inside HDP for Windows
HORTONWORKS
DATA PLATFORM (HDP)
For Windows
Hortonworks
Data Platform (HDP)
For Windows
• 100% Open Source
Enterprise Hadoop
• Component and version
compatible with HDInsight
• Availability
• Beta release available now
PLATFORM SERVICES
HADOOP CORE (Distributed Storage & Processing): HDFS, WEBHDFS, MAP REDUCE
DATA SERVICES (Store, Process and Access Data): HCATALOG, HIVE, PIG, SQOOP
OPERATIONAL SERVICES (Manage & Operate at Scale): OOZIE
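WEBHDFS appears among the core components above. As a rough illustration of why it matters for interoperability, the sketch below builds WebHDFS REST URLs of the form /webhdfs/v1/<path>?op=<OPERATION>, which any HTTP client on Windows or Linux can call. The host name, file paths, user name and the helper webhdfs_url are hypothetical, and port 50070 is an assumption about the NameNode's HTTP port that should be checked against the actual cluster configuration.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, path, op, port=50070, **params):
    """Build a WebHDFS REST URL for one file-system operation.

    Assumption: the NameNode serves WebHDFS over HTTP on `port`.
    Extra keyword arguments become additional query parameters.
    """
    query = urlencode({"op": op, **params})
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


# Hypothetical host, paths and user, for illustration only.
listing = webhdfs_url("namenode.example.com", "/user/demo/logs", "LISTSTATUS")
reading = webhdfs_url(
    "namenode.example.com",
    "/user/demo/logs/part-00000",
    "OPEN",
    **{"user.name": "demo"},
)
print(listing)
print(reading)
```

Only the URLs are constructed here; issuing the requests needs a running cluster.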
Maximize Your Hadoop Deployment Choice
• Use HDP for Windows for on-premises deployment on Windows Server
– Ideal for Windows users with Hadoop experience
– Perfect next step for those who are ready to move from POC to production
• Use HDInsight for Microsoft tooling and Management and Provisioning
– HDInsight Service that offers full benefit of Windows Azure (e.g. elasticity & low cost) –
available in Preview today
– HDInsight Server for full integration of Hadoop with Microsoft tools on premises –
Developer Preview available today
• Full interoperability and deployment choice across platforms
– Implement big data applications that run on-premises & in the cloud
– Leveraging open source HDP enables seamless interoperability across
environments: Linux, Windows, Windows Azure
Summary
• Leading the Innovation in Core Hadoop
• Addressing the requirements for Enterprise usage
• Enabling interoperability of the ecosystem
• No lock-in. 100% Open Source.
• Best in industry support with flexible pricing model
• Find out more
–www.hortonworks.com
–http://hortonworks.com/hadoop-training/
Mrinal devadas, Hortonworks Making Sense Of Big Data

  • 1. © Hortonworks Inc. 2013 Hortonworks Community Driven Enterprise Apache Hadoop Mrinal Devadas Systems Architect [email protected] Page 1
  • 2. © Hortonworks Inc. 2013 Hortonworks • Who is Hortonworks • Our Approach • Patterns of Use Page 2
  • 3. © Hortonworks Inc. 2013 A Brief History of Apache Hadoop Page 3 2013 Focus on INNOVATION 2005: Yahoo! creates team under E14 to work on Hadoop Focus on OPERATIONS 2008: Yahoo team extends focus to operations to support multiple projects & growing clusters Yahoo! begins to Operate at scale Enterprise Hadoop Apache Project Established Hortonworks Data Platform 2004 2008 2010 20122006 STABILITY 2011: Hortonworks created to focus on “Enterprise Hadoop“. Starts with 24 key Hadoop engineers from Yahoo
  • 4. © Hortonworks Inc. 2013 Hortonworks Snapshot Page 4 • We distribute the only 100% Open Source Enterprise Hadoop Distribution: Hortonworks Data Platform • We engineer, test & certify HDP for enterprise usage • We employ the core architects, builders and operators of Apache Hadoop • We drive innovation within Apache Software Foundation projects • We are uniquely positioned to deliver the highest quality of Hadoop support • We enable the ecosystem to work better with Hadoop Develop Distribute Support We develop, distribute and support the ONLY 100% open source Enterprise Hadoop distribution Endorsed by Strategic Partners Headquarters: Palo Alto, CA Employees: 200+ and growing Investors: Benchmark, Index, Yahoo
  • 5. © Hortonworks Inc. 2013 Hortonworks • Who is Hortonworks • Our approach – Leading Open Source Hadoop innovation – Addressing “Enterprise Hadoop” Requirements – Enabling Interoperability of the Ecosystem – Ensuring No Lock-In: 100% Open Source • Patterns of Use Page 5
  • 6. © Hortonworks Inc. 2013 Page 6 Apache Software Foundation Guiding Principles • Release early & often • Transparency, respect, meritocracy Key Roles held by Hortonworkers • PMC Members – Managing community projects – Mentoring new incubator projects – Over 20 Hortonworkers managing community • Committers – Authoring, reviewing & editing code – Over 50 Hortonworkers across projects • Release Managers – Testing & releasing projects – Hortonworkers across key projects like Hadoop, Hive, Pig, HCatalog, Ambari, HBase Apache Hadoop Test & Patch Design & Develop Release Apache Pig Apache HCatalo g Apache HBase Other Apache Projects Apache Hive Apache Ambari “We have noticed more activity over the last year from Hortonworks’ engineers on building out Apache Hadoop’s more innovative features. These include YARN, Ambari and HCatalog..” - Jeff Kelly: Wikibon Apache Community Leadership
  • 7. © Hortonworks Inc. 2013 Leadership that Starts at the Core Page 7 • Driving next generation Hadoop – YARN, MapReduce2, HDFS2, High Availability, Disaster Recovery • 420k+ lines authored since 2006 – More than twice nearest contributor • Deeply integrating w/ecosystem – Enabling new deployment platforms – (ex. Windows & Azure, Linux & VMware HA) – Creating deeply engineered solutions – (ex. Teradata big data appliance) • All Apache, NO holdbacks – 100% of code contributed to Apache
  • 8. © Hortonworks Inc. 2013 Driving Enterprise Hadoop Innovation Page 8 Hortonworks Committers Cloudera Committers 19 8 6 1 5 0 5 9 16 0 0% 20% 40% 60% 80% 100% AMBARI HBASE HIVE/HCATAL OG PIG HADOOP CORE Lines Of Code By Company Source: Apache Software Fundation Hortonworks Yahoo! Cloudera Other
  • 9. © Hortonworks Inc. 2013 Hortonworks Process for Enterprise Hadoop Page 9 Upstream Community Projects Downstream Enterprise Product Hortonworks Data Platform Design & Develop Distribute Integrate & Test Package & Certify Apache HCatalo g Apache Pig Apache HBase Other Apache Projects Apache Hive Apache Ambari Apache Hadoop Test & Patch Design & Develop Release Virtuous cycle when development & fixed issues done upstream & stable project releases flow downstream No Lock-in: Integrated, tested & certified distribution lowers risk by ensuring close alignment with Apache projects Stable Project Releases Fixed Issues “We have noticed more activity over the last year from Hortonworks’ engineers on building out Apache Hadoop’s more innovative features. These include YARN, Ambari and HCatalog.” - Jeff Kelly: Wikibon
  • 10. © Hortonworks Inc. 2013 Hortonworks • Who is Hortonworks • Our approach – Leading Open Source Hadoop Innovation – Addressing “Enterprise Hadoop” Requirements – Enabling Interoperability of the Ecosystem – Ensuring NO LOCK-IN: 100% Open Source • Patterns of use Page 10
  • 11. © Hortonworks Inc. 2013 Enhancing the Core of Apache Hadoop Deliver high-scale storage & processing with enterprise-ready platform services Unique Focus Areas: • Bigger, faster, more flexible Continued focus on speed & scale and enabling near-real-time apps • Tested & certified at scale Run ~1300 system tests on large Yahoo clusters for every release • Enterprise-ready services High availability, disaster recovery, snapshots, security, … Page 11 HADOOP CORE Hortonworkers are the architects, operators, and builders of core Hadoop Distributed Storage & Processing PLATFORM SERVICES Enterprise Readiness
  • 12. © Hortonworks Inc. 2013 Page 12 HADOOP CORE DATA SERVICES Provide data services to store, process & access data in many ways Unique Focus Areas: • Apache HCatalog Metadata services for consistent table access to Hadoop data • Apache Hive Explore & process Hadoop data via SQL & ODBC-compliant BI tools Distributed Storage & Processing Hortonworks enables Hadoop data to be accessed via existing tools & systems Store, Proces s and Access Data PLATFORM SERVICES Enterprise Readiness Data Services for Full Data Lifecycle
  • 13. Operational Services for Ease of Use. Include complete operational services for productive operations & management. Unique Focus Area: • Apache Ambari: provision, manage & monitor a cluster; complete REST APIs to integrate with existing operational tools; job & task visualizer to diagnose issues. Only Hortonworks provides a complete open source Hadoop management tool. [Diagram: OPERATIONAL SERVICES (Manage & Operate at Scale); DATA SERVICES (Store, Process and Access Data); HADOOP CORE (Distributed Storage & Processing); PLATFORM SERVICES (Enterprise Readiness)]
  • 14. Deployable Across a Range of Options. Only Hortonworks allows you to deploy seamlessly across any deployment option: • Linux & Windows • Azure, Rackspace & other clouds • Virtual platforms • Big data appliances. [Diagram: HORTONWORKS DATA PLATFORM (HDP): OPERATIONAL SERVICES (Manage & Operate at Scale); DATA SERVICES (Store, Process and Access Data); HADOOP CORE (Distributed Storage & Processing); PLATFORM SERVICES (Enterprise Readiness); deployment options: OS, Cloud, VM, Appliance]
  • 15. HDP: Enterprise Hadoop Distribution. Hortonworks Data Platform (HDP), Enterprise Hadoop: • The ONLY 100% open source and complete distribution • Enterprise grade, proven and tested at scale • Ecosystem endorsed to ensure interoperability. [Diagram: HORTONWORKS DATA PLATFORM (HDP): OPERATIONAL SERVICES (Manage & Operate at Scale); DATA SERVICES (Store, Process and Access Data); HADOOP CORE (Distributed Storage & Processing); PLATFORM SERVICES (Enterprise Readiness); deployment options: OS, Cloud, VM, Appliance]
  • 16. Hortonworks • Who is Hortonworks • Our approach – Leading Open Source Hadoop Innovation – Addressing “Enterprise Hadoop” Requirements – Enabling Interoperability of the Ecosystem – Ensuring No Lock-in: 100% Open Source • Patterns of use
  • 17. Existing Data Architecture. [Diagram: APPLICATIONS (Business Analytics, Custom Applications, Enterprise Applications); DATA SYSTEMS: TRADITIONAL REPOS (RDBMS, EDW, MPP); DATA SOURCES: OLTP, POS systems; Traditional Sources (RDBMS, OLTP, OLAP); OPERATIONAL TOOLS (Manage & Monitor); DEV & DATA TOOLS (Build & Test)]
  • 18. Next-Generation Data Architecture. [Diagram: APPLICATIONS (Business Analytics, Custom Applications, Enterprise Applications); DATA SYSTEMS: TRADITIONAL REPOS (RDBMS, EDW, MPP), ENTERPRISE HADOOP PLATFORM; DATA SOURCES: OLTP, POS systems; Traditional Sources (RDBMS, OLTP, OLAP); New Sources (web logs, email, sensors, social media); OPERATIONAL TOOLS (Manage & Monitor); DEV & DATA TOOLS (Build & Test)]
  • 19. Interoperating With Your Tools. [Diagram labels: APPLICATIONS; DATA SYSTEMS; TRADITIONAL REPOS; DEV & DATA TOOLS; OPERATIONAL TOOLS; Viewpoint; Microsoft Applications; HORTONWORKS DATA PLATFORM; DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP), New Sources (web logs, email, sensors, social media)]
  • 20. Hortonworks • Who is Hortonworks • Our approach – Leading Open Source Hadoop Innovation – Addressing “Enterprise Hadoop” Requirements – Enabling Interoperability of the Ecosystem – Ensuring No Lock-In: 100% Open Source • Patterns of use
  • 21. True Enterprise Class Open Source • Community-driven Approach Mitigates Lock-In – Identify & introduce enterprise requirements into public domain – Work with community to advance & incubate open source projects – Apply Enterprise Rigor for the most stable and reliable distribution • 100% Open Source. No Holdbacks. – Only true implementation of OSS Apache Hadoop – Preferred by the software vendors that you rely on – Proprietary Open Source = Lock-In – Open communities always trump “open source” • Flexible Deployment – No License Fee for usage
  • 22. Hortonworks • Who is Hortonworks • Our approach • Patterns of use
  • 23. Hadoop Common Patterns of Use. [Diagram: Big Data (Transactions, Interactions, Observations); Business Cases; HORTONWORKS DATA PLATFORM: Refine, Explore, Enrich; “Right-time” Access to Data: Batch, Interactive, Online]
  • 24. Operational Data Refinery. Transform & refine ALL sources of data. Also known as Data Reservoir or Catch Basin. Steps: 1. Capture; 2. Process; 3. Distribute & Retain. [Diagram: APPLICATIONS (Business Analytics, Custom Applications, Enterprise Applications); DATA SYSTEMS: TRADITIONAL REPOS (RDBMS, EDW, MPP), HORTONWORKS DATA PLATFORM (Refine, Explore, Enrich); DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP), New Sources (web logs, email, sensor data, social media)]
  • 25. Big Data Exploration & Visualization. Leverage “data lake” to perform iterative investigation for value. Steps: 1. Capture; 2. Process; 3. Explore & Visualize. [Diagram: APPLICATIONS (Business Analytics, Custom Applications, Enterprise Applications); DATA SYSTEMS: TRADITIONAL REPOS (RDBMS, EDW, MPP), HORTONWORKS DATA PLATFORM (Refine, Explore, Enrich); DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP), New Sources (web logs, email, sensor data, social media)]
  • 26. Application Enrichment. Create intelligent applications: collect data, create analytical models and deliver to online apps. Steps: 1. Capture; 2. Process & Compute Model; 3. Deliver. [Diagram: APPLICATIONS (Custom Applications, Enterprise Applications); DATA SYSTEMS: TRADITIONAL REPOS (RDBMS, EDW, MPP), NOSQL, HORTONWORKS DATA PLATFORM (Refine, Explore, Enrich); DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP), New Sources (web logs, email, sensor data, social media)]
  • 27. Flexible Support Subscription Programs. Leverage Hortonworks Expertise: subscription and support delivered and backed by Hadoop experts; subscriptions based on nodes or storage. • Developer Support (“how to” guidance for developers and architects): 12 x 5, Web only; All Sev: 1 business day; Application Design Advice, Code Review; 1 seat • Essential Support* (operations support for small research clusters): 12 x 5, Web only; All Sev: 1 business day; Cluster Design, Install, Maintain, Performance; 3 Contacts; Patches & Updates • Standard Support (operations support for dev & test clusters): 12 x 5, Web only; All Sev: 1 business day; Cluster Design, Install, Maintain, Performance; 3 Contacts; Patches & Updates • Enterprise Support (operations support for critical clusters): 24 x 7, Phone & Web; Sev 1: 1 hour; Sev 2: 4 business hours; Cluster Design, Install, Maintain, Performance; 5 Contacts; Patches & Updates; Additional Options. * Limited in size and no expansion
  • 28. Hortonworks: Best In Class Hadoop Support • Experienced enterprise support team – Experience supporting enterprise clients in production – Core engineers have real operational experience: built and supported 44+K nodes in production – Extensive experience in commercial big data offerings including HDP, MapR, Karmasphere • Global 24x7 operation – Support based in Sunnyvale, UK & India • Stringent case management processes ensure high-quality customer service & responsiveness
  • 29. Transferring Our Hadoop Expertise to You. The expert source for Apache Hadoop training & certification • World-class training programs designed to help you learn fast – Role-based, hands-on classes with 50% lab time • Expert consulting services – Programs designed to transfer knowledge • Industry-leading Hadoop Sandbox program – Fastest way to learn Apache Hadoop – Multi-level tutorials for wide applicability – Customizable and updateable
  • 30. Introducing Hortonworks Data Platform for Windows. Enterprise Apache Hadoop. March 2013
  • 31. Why Apache Hadoop on Windows? • According to IDC, Windows Server held 73% market share in 2012 – Hadoop was traditionally built for Linux servers, so there are a large number of underserved organizations • According to a 2012 Barclays CIO study, big data outranks virtualization as the #1 trend driving spending initiatives – Unstructured data growth exceeds 80% year over year in most enterprises • Apache Hadoop is the de facto big data platform for processing massive amounts of unstructured data – Complementary to existing Microsoft technologies – There is a huge untapped community of Windows developers and ecosystem partners • A strong Microsoft-Hortonworks partnership and 18 months of development make this a natural next step
  • 32. Hortonworks Data Platform for Windows • Enterprise-grade Apache Hadoop on Windows – Enables same experience for Hadoop on Windows & Linux • More partners, more developers for Hadoop – Makes native Apache Hadoop available to Windows ecosystem – More options for Windows-focused organizations • Hortonworks focus: Enterprise Apache Hadoop for all platforms – Trusted, reliable, production-ready distribution for on-premises Hadoop on Windows deployments • Built with joint investment and contributions from Microsoft – Deep engineering relationship ensures tight integration and maximum performance. HDP is the first and only distribution available on Windows & Linux
  • 33. Seamless Interoperability with Your Microsoft Tools • Integrated with Microsoft tools for native big data analysis – Bi-directional connectors for SQL Server and SQL Azure through Sqoop – Excel ODBC integration through Hive • Addressing demand for Hadoop on Windows – Ideal for Windows customers with Hadoop operational experience • Enables most common Hadoop workloads in the Enterprise – Data refinement and ETL offload for high-volume data landing – Data exploration for discovery of new business opportunities – Data enrichment for fine-tuned delivery and recommendation engines. [Diagram: APPLICATIONS (Microsoft Applications); DATA SYSTEMS: HORTONWORKS DATA PLATFORM For Windows; DATA SOURCES: Mobile Data; OLTP, POS systems; Traditional Sources (RDBMS, OLTP, OLAP); New Sources (web logs, email, sensor data, social media)]
  • 34. Inside HDP for Windows. Hortonworks Data Platform (HDP) For Windows: • 100% Open Source Enterprise Hadoop • Component and version compatible with HDInsight • Availability: beta release available now. [Diagram: HORTONWORKS DATA PLATFORM (HDP) For Windows: HADOOP CORE (Distributed Storage & Processing): HDFS, WebHDFS, MapReduce; DATA SERVICES (Store, Process and Access Data): HCatalog, Hive, Pig, Sqoop; OPERATIONAL SERVICES (Manage & Operate at Scale): Oozie; PLATFORM SERVICES]
  • 35. Maximize Your Hadoop Deployment Choice • Use HDP for Windows for on-premises deployment on Windows Server – Ideal for Windows users with Hadoop experience – Perfect next step for those who are ready to move from POC to production • Use HDInsight for Microsoft tooling, management and provisioning – HDInsight Service offers the full benefit of Windows Azure (e.g. elasticity & low cost); available in Preview today – HDInsight Server for full integration of Hadoop with Microsoft tools on premises; Developer Preview available today • Full interoperability and deployment choice across platforms – Implement big data applications that run on premises & in the cloud – Open source HDP enables seamless interoperability across environments: Linux, Windows, Windows Azure
  • 36. Summary • Leading the Innovation in Core Hadoop • Addressing the requirements for Enterprise usage • Enabling interoperability of the ecosystem • No lock-in. 100% Open Source. • Best in industry support with flexible pricing model • Find out more – www.hortonworks.com – http://hortonworks.com/hadoop-training/

Editor's Notes

  • #8: In that capacity, Arun allows Hortonworks to be instrumental in working with the community to drive the roadmap for Core Hadoop, where the focus today is on things like YARN, MapReduce2, HDFS2 and more. For Core Hadoop, in absolute terms, Hortonworkers have contributed more than twice as many lines of code as the next closest contributor, and even more if you include Yahoo, our development partner. Taking such a prominent role also enables us to ensure that our distribution integrates deeply with the ecosystem: both in the choice of deployment platforms, such as Windows, Azure and more, and in creating deeply engineered solutions with key partners such as Teradata. And consistent with our approach, all of this is done in 100% open source.