Hadoop In the Enterprise?


      Sih Lee & Peter Krey, Innovation & Shared Services
             Firmwide Engineering & Architecture


             Hadoop World, New York City, October 2nd, 2009




                                                                         © 2009 JPMorgan Chase & Co.
                                                                               All rights reserved.
                                                              Confidential and proprietary to JPMorgan Chase & Co.
Agenda

                                                                      Page



                                      JPMorgan Chase + Open Source      2


                                      Hadoop In The Enterprise?         3


                                      Active POC Pipeline               6


                                      Hadoop Positioning                7


                                      Cost Comparisons                  8


                                      Hadoop Additions & Must Haves    10


                                      Q&A                              11




                                                                             1
JPMorgan Chase + Open Source




                                  Established Multi-Year Open Source History

                                  Big Supporter of Industry Standards & Open Source Projects

                                  Numerous Production Open Source Implementations
                                   QPID (AMQP) - Top Level Apache Project (http://qpid.apache.org/)
                                   Tyger - Apache + Tomcat + Spring - Fully Integrated App Server Environment, 30+ OS Components
                                   Compute Backbone (CBB) HPC Grid - 1000s of Linux Based Compute Servers
                                   MuleSoft.org (a.k.a. MuleSource) Enterprise Message Bus
                                   others …




                                                                                                      2
Hadoop In The Enterprise – Economics Driven



                                  Many Big Data Lessons Learned From Web 2.0 Community

                                  Potential For Large Capex and Opex "Dislocation"
                                    Reduced Consumption of Enterprise Premium Resources
                                    Grid Computing Economics Brought To Data Intensive Computing
                                    Stagnant Data Innovation

                                  Enabling & Potentially Disruptive Platform
                                    Many Historical Similarities
                                      Java, Linux, Tomcat, Web / Internet, …
                                      Mini's to Client / Server, Client / Server to Web, Solaris to Linux, …
                                    Key Question: What Can Be Built On Top of and Enabled by Hadoop?




                                                                                                               3
Hadoop In The Enterprise – Choice Driven




                                  Overuse of Relational Database Containers
                                    Institutional “Muscle Memory” … Not Much Else to Choose From
                                    Increasingly Large Percentage of Static Data Stored In Proprietary Transactional DBs
                                    Over-Normalized Schemas … Do They Still Make Sense With Cheap Compute & Storage?


                                  Enterprise Storage "Prisoners"
                                    Captive To The Economics & Technology of "A Few" Vendors
                                    Developers Need More Choice
                                    Too Much Proprietary, Single-Source Data Infrastructure
                                    Increasing Need For Minimal / No System + Storage Admins




                                                                                                       4
Hadoop In The Enterprise – Other Drivers




                                  Growing Developer Interest In "No SQL" Data Technologies
                                    Open Source, Distributed, Non-relational Databases
                                    Growing Influence Of Web 2.0 Technologies & Thinking On Enterprise
                                    Hadoop, Cassandra, HBase, Hive, CouchDB, HadoopDB, …, others
                                    memcached For Caching

                                  FSI Industry Drivers
                                    Increased Regulatory Oversight + Reporting = More Data Needed Over Longer Period Of Time
                                    Growing Need For Less Expensive Data Repository / Store
                                    Increasing Need To Support "One Off" Analysis On Large Data




                                                                                                         5
Active POC Pipeline




                                 Growing Stream of Real Projects To Gauge Hadoop "Goodness of Fit"
                                 Broad Spectrum of Use Cases
                                 Driven By Need To Impact / Dislocate OPEX + CAPEX
                                 Evaluated On Metric Based Performance, Functional, And Economic Measures




                                                                                                     6
Hadoop Positioning

                                  [Figure: positioning chart of data platforms]
                                  Horizontal axis: data volume, from GB’s (left) to TB’s –> PB’s (right)
                                  Vertical axis: latency, from Lower-Latency (bottom) to Higher-Latency (top)
                                  Region labels: Semi-Structured Analysis; Index Based Access – Updates / XActns; Index Based Access – Analysis
                                  Plotted systems: Map/Reduce + HDFS, DW1 through DW7, SQLDB1 through SQLDB4, InMemory1

                                                                                                                                                 7
Comparative Storage Cost Bar Graph Slide


                                  "Normalized" SAN + NAS $ per GB per month versus HDFS $ per GB per month

                                  [Bar graph: six bars labeled SAN, four labeled NAS, and four labeled Hadoop; bar values are not recoverable]

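
The slide does not say how the "$ per GB per month" figures were normalized. One plausible reading, offered only as a sketch, is amortized capital cost plus monthly operating cost, divided by usable capacity after redundant copies. The function name, its parameters, and every number below are hypothetical and are not taken from the presentation; the only external fact used is that HDFS stores three replicas of each block by default.

```python
def usable_cost_per_gb_month(capex_usd, opex_usd_per_month,
                             amortization_months, raw_capacity_gb, copies=1):
    """Return monthly cost in USD per usable GB.

    Hypothetical normalization: capex is spread evenly over the
    amortization period, opex is added per month, and raw capacity is
    divided by the number of stored copies to get usable capacity.
    """
    if amortization_months <= 0 or raw_capacity_gb <= 0 or copies < 1:
        raise ValueError("months and capacity must be positive, copies >= 1")
    monthly_cost = capex_usd / amortization_months + opex_usd_per_month
    usable_gb = raw_capacity_gb / copies
    return monthly_cost / usable_gb


# Placeholder inputs for illustration only (not figures from the slides):
# a cluster bought for $36,000, amortized over 36 months, $500/month to run,
# 30,000 GB raw, three copies of every block (the HDFS default).
example = usable_cost_per_gb_month(
    capex_usd=36_000,
    opex_usd_per_month=500,
    amortization_months=36,
    raw_capacity_gb=30_000,
    copies=3,
)
print(f"${example:.2f} per usable GB per month")
```

Running the same function with a SAN or NAS array's own capex, opex, and redundancy overhead would put the platforms on one scale, which is what a normalized comparison requires.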
                                                                                                             8
Enterprise Data Warehousing Costs


                                  "normalized” bar chart utilizing retail $ per TB

                                  [Bar chart: Data Warehouse S/W -- $K per TB; vertical axis from $0 to $250; horizontal axis: Products; bar values are not recoverable]

                                                                                                9
Hadoop Additions & Must Haves




                                  Improved SQL Front-end Tool Interoperability
                                   Better Interop With Skills & Content That Firms Already Have
                                  Improved Security & ACL enforcement … Kerberos integration?
                                  Grow Developer Programming Model Skill Sets
                                  Improve Relational Container Integration & Interop For Data Archival
                                  Management & Monitoring Tools
                                  Improved Developer & Debugging Tools
                                  Reduce Latency Via Integration With Open Source Data Caching
                                   memcached, others
                                  Invitation To FSI or Enterprise Roundtable




                                                                                                         10
Q&A




                                   Sih Lee, Head of Innovation & Shared Services
                                   Firmwide Engineering & Architecture
                                   W# 212-622-3038
                                   sih.x.lee@jpmchase.com


                                   Peter Krey, Consultant, Innovation & Shared Services
                                   Firmwide Engineering & Architecture
                                   W# 212-622-2926
                                   peter.j.krey@jpmchase.com
                                                                                          11
