Hadoop’s Opportunity to Power
Next-Generation Architectures
Shaun Connolly, Hortonworks Strategy
June 13, 2012
How many people are lucky enough to say that they were at the forefront of something big?
Transactions

 Interactions

Observations
Big Data = Transactions + Interactions + Observations

[Figure: regions labeled ERP, CRM, WEB, and BIG DATA. Vertical scale: Megabytes, Gigabytes, Terabytes, Petabytes. Horizontal axis: Increasing Data Variety and Complexity.]

Data types shown, from smaller to larger scale:
• Purchase detail, Purchase record, Payment record
• Segmentation, Offer details, Customer Touches, Support Contacts
• Offer history, Web logs, A/B testing, Dynamic Pricing, Affiliate Networks, Search Marketing, Behavioral Targeting, Dynamic Funnels
• User Generated Content, Sensors / RFID / Devices, Mobile Web, Sentiment, Social Interactions & Feeds, User Click Stream, Spatial & GPS Coordinates, External Demographics, Business Data Feeds, HD Video, Audio, Images, Speech to Text, Product/Service Logs, SMS/MMS

  Source: Contents of above graphic created in partnership with Teradata, Inc.
There is still work to be done to ensure HADOOP powers the BIG DATA WAVE
Many Communities Must Work As One

• Be diligent stewards of the open source core
• Be tireless innovators beyond the core
• Provide robust data platform services & open APIs
• Enable ecosystem at each layer of the stack
• Make platform enterprise-ready & easy to use

Communities: Open Source, Vendors, End Users
Top 10 Influencers of the Decade
     1.  Google
     2.  Apple
     3.  Apache Software Foundation
     4.  Microsoft
     5.  Linux Foundation
     6.  Eclipse Foundation
     7.  Twitter
     8.  Free Software Foundation
     9.  Android Project
     10. VMware
Source: SD Times, http://www.sdtimes.com/link/36666
Top 10 Influencers of the Decade
#3
Source: SD Times, http://www.sdtimes.com/link/36666
Diligent Stewards & Tireless Innovators




Pig, Hive, HBase, Zookeeper, HCatalog, Ambari, Sqoop, Oozie, Flume, Mahout
Avro, Cascading, Accumulo, Whirr, Chukwa, Snappy, Spark, HAMA, Giraph, OpenMPI
1.0    2.0    Beyond
[Integrating Hadoop with
existing IT investments is
vitally important.]
                   Larry Feinsmith
Connecting Transactions + Interactions + Observations
[Figure: flow diagram centered on the Big Data Refinery]

Data sources: Audio, Video, Images; Docs, Text, XML; Web Logs, Clicks; Social, Graph, Feeds; Sensors, Devices, RFID; Spatial, GPS; Events, Other

Business Transactions & Interactions: Web, Mobile, CRM, ERP, SCM, …
Business Intelligence & Analytics: Dashboards, Reports, Visualization, …

1   Classic ETL processing
2   Store, aggregate, and transform multi-structured data to unlock value
3   Share refined data and runtime models
4   Retain runtime models and historical data for ongoing refinement & analysis
5   Retain historical data to unlock additional value
Next-Generation Big Data Architecture
[Figure: architecture diagram centered on the Big Data Refinery]

Data sources: Audio, Video, Images; Docs, Text, XML; Web Logs, Clicks; Social, Graph, Feeds; Sensors, Devices, RFID; Spatial, GPS; Events, Other

Business Transactions & Interactions: Web, Mobile, CRM, ERP, SCM, …
Business Intelligence & Analytics: Dashboards, Reports, Visualization, …

Data stores shown: SQL, NoSQL, NewSQL; EDW, MPP, NewSQL

Arrows powered by ETL, data movement, and data integration technologies
Data Services & Open APIs are Vital


Without HCatalog: raw Hadoop data, inconsistent metadata, tool-specific access
With HCatalog: table access, aligned metadata, RESTful API




Apache HCatalog: Hadoop’s centralized metadata service
ü  Provide consistent metadata and data models across tools
ü  Share data as tables in and out of HDFS
ü  Enable flexible, thin-client access via RESTful APIs
Data Services & Open APIs In Action

[Figure: Website Interactions (Web Logs), Order Data (DB), and Customer Data (DB) feed the Big Data Refinery]

1   Web Log files via WebHDFS APIs
2   Customer & Order data via Talend & HCatalog for schema
3   Process, analyze, and join data via Talend, Pig, & HCatalog
4   Analyze website visits by the type of end results
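
The four steps on this slide could be sketched roughly as follows. This is an illustration in plain Python, not the Talend and Pig jobs used in the demo. The WebHDFS URL follows the documented `/webhdfs/v1/<path>?op=OPEN` form. The host name, port, file path, sample records, and the "purchased" / "no_order" labels are all assumptions made for the example.

```python
from collections import Counter
from urllib.parse import urlencode


def webhdfs_open_url(namenode, hdfs_path, user, port=50070):
    """Step 1: build a WebHDFS URL for reading a web log file over HTTP."""
    query = urlencode({"op": "OPEN", "user.name": user})
    return "http://{}:{}/webhdfs/v1{}?{}".format(namenode, port, hdfs_path, query)


# Step 2: hypothetical customer and order data, as if loaded from the DBs.
customers = {1: "Ada", 2: "Grace", 3: "Edsger"}
orders = {1: "purchased"}  # customer_id -> order outcome

# Hypothetical parsed web log records (one dict per page visit).
web_logs = [
    {"customer_id": 1, "page": "/product/42"},
    {"customer_id": 2, "page": "/product/42"},
    {"customer_id": 3, "page": "/cart"},
    {"customer_id": 1, "page": "/checkout"},
]


def visits_by_end_result(logs, customers, orders):
    """Steps 3 and 4: join visits to customers and orders, then count
    visits by the end result associated with the visiting customer."""
    counts = Counter()
    for visit in logs:
        customer_id = visit["customer_id"]
        if customer_id not in customers:
            continue  # skip visits that cannot be joined to a known customer
        counts[orders.get(customer_id, "no_order")] += 1
    return dict(counts)


log_url = webhdfs_open_url("namenode.example.com", "/logs/web/2012-06-13.log", "analyst")
summary = visits_by_end_result(web_logs, customers, orders)
print(log_url)
print(summary)
```

In a real deployment the join would run inside the cluster, with HCatalog supplying the table schemas so that each tool sees the same column names.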
Let’s Head to the Demo Kitchen
Ecosystem Completes the Puzzle
Applications, Business Tools, & Dev Tools




Data Management & Movement




Infrastructure & Systems Management
Solution Architectures:
  Make Hadoop Enterprise-Ready & Easy to Use
Applications, Business Tools, & Dev Tools




Data Management & Movement




Infrastructure & Systems Management
Our Opportunity…and Our Role

            By the end of 2015,
    more than half the world's data will be
      processed by Apache Hadoop.
1   Be diligent stewards of the open source core

2   Be tireless innovators beyond the core

3   Provide robust data platform services & open APIs

4   Enable the ecosystem at each layer of the stack

5   Make the platform enterprise-ready & easy to use
