INTRODUCTION TO THE HADOOP ECOSYSTEM
BAKING A LAYER CAKE AND BEYOND…
“Qu’ils mangent de la brioche.”
BEFORE WE BEGIN
Questions for the audience…
How many of you have:
Been working with Hadoop for more than 3 months?
Been working with Hadoop for more than 6 months?
Been working with Hadoop for more than 1 year?
How many of you have heard about this thing called ‘Hadoop’ / ‘Big Data’ and thought it would be fun to check it out?
About the Speaker
BSCIS - The College of Engineering, The Ohio State University
‘Big Data’ Consultant with > 25 years in IT
Working solely in the ‘Big Data’ space since 2009
Founded Chicago area Hadoop User Group (CHUG) in April 2010
1600+ Members
Over 200 different companies across all industries in the Chicagoland area.
Routinely speaks at conferences around the US on Hadoop.
Guest lecturer at Illinois Institute of Technology.
Co-authored papers published on InfoQ.
MapR Admin, Cloudera Admin & Developer Certified.
email: MSegel (at) segel.com
Skype: Michael_Segel
What is Hadoop?
‘A framework of software tools to allow one to take a large problem and process individual pieces in parallel.’
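The definition above can be illustrated with a small sketch. This is not Hadoop code: it assumes a word-count problem, uses local threads as stand-ins for cluster nodes, and the function names (`split_into_chunks`, `count_words`, `parallel_word_count`) are invented for the illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(lines, chunk_count):
    """Split a list of lines into roughly equal pieces."""
    size = max(1, -(-len(lines) // chunk_count))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def count_words(chunk):
    """Process one piece independently of all the others."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def parallel_word_count(lines, workers=4):
    """Count words by handing each chunk to a worker, then merging results."""
    chunks = split_into_chunks(lines, workers)
    # Threads stand in for cluster nodes; this is an assumption of the sketch.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(count_words, chunks))
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total
```

The point of the sketch is the shape of the work: the pieces never need to talk to each other, so adding workers (or machines) is the main way to go faster.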
Our Hadoop Layer Cake: Circa 2010
[Diagram labels: Storage, Job Control, Data Access, Programming Languages]
Our Hadoop Layer Cake: Circa 2013 Hadoop 2.0
[Diagram labels: Data Access, Storage, Job Control, Resource Control, Real Time, Messages, Data Frameworks]
Confused? This is just the tip of the iceberg.
The only constant is change…
Hadoop is a disruptive technology, forcing the enterprise to rethink how it handles data.
The core Apache Framework is just the starting point.
Disruption allows new vendors to compete with established vendors.
If you can build a better mousetrap, you will attract customers.
Hadoop plays nice with others…
“Qu’ils mangent de la brioche.” (‘Let them eat cake’)
Myth: PROPRIETARY SOFTWARE IS BAD.
Reality: VENDOR LOCK-IN IS BAD.
Myth: HADOOP IS ONLY GOOD FOR BATCH PROCESSING.
Reality: HADOOP CAN ALSO BE USED FOR ‘REAL TIME’ PROBLEMS.
REAL TIME HADOOP
SINGLE DATA CENTER SOLUTION
[CENSORED]
Nightly batch jobs create the next day’s advertising lists.
Client phone connects to the web service.
Web service talks to the Ad Engine.
Phone connects to the Ad Engine to get an ad.
Ad Engine connects to HBase to get the list of potential ads to display, sending the correct ad to the phone.
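The flow above splits into a batch side and a serving side, which can be sketched as follows. This is a minimal illustration under stated assumptions: a plain dict stands in for the HBase table, and the function names, the interest-based matching, and the "first ad not yet shown" selection rule are all hypothetical, not details of the project described on the slide.

```python
def nightly_batch_job(ad_table, user_interests, ads_by_interest):
    """Batch side: precompute the next day's candidate ads for every user.

    ad_table is a dict standing in for an HBase table (row key = user id).
    """
    for user_id, interests in user_interests.items():
        candidates = []
        for interest in interests:
            candidates.extend(ads_by_interest.get(interest, []))
        ad_table[user_id] = candidates


def ad_engine_get_ad(ad_table, user_id, already_shown=()):
    """Serving side: one keyed lookup, then pick the first ad not yet shown."""
    for ad in ad_table.get(user_id, []):
        if ad not in already_shown:
            return ad
    return None  # no list for this user, or every candidate was already shown
```

The design choice worth noting is that the expensive work happens overnight; the request path only does a lookup by key, which is what makes a ‘real time’ response possible.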
Myth: HADOOP IS A STANDALONE SYSTEM AND WILL REPLACE TRADITIONAL VENDORS’ PRODUCTS.
Reality: HADOOP IS PART OF THE ENTERPRISE. IT CAN BE STANDALONE, OR IT CAN WORK WITH EXISTING INFRASTRUCTURE.
HADOOP AND THE ENTERPRISE
WE CAN ALL GET ALONG…
Hadoop communicates well with the rest of the Enterprise…
Central cluster feeds distributed web services with local database backing…
[split into two slides]
HADOOP AND THE ENTERPRISE
WE CAN ALL GET ALONG…
Hadoop communicates well with the rest of the Enterprise…
Traditional data stores play nice with Hadoop. Some see HDFS files as external tables.
[split into two slides]
How Traditional Vendors view Hadoop
In the beginning they saw Hadoop as a threat.
They will crush them.
If you can’t beat them, join them…
Oracle partners with Cloudera.
EMC partnered with MapR, then released its own distribution. (Green Stack)
Teradata partners with Hortonworks.
Microsoft partnered with Hortonworks.
Intel tried to create its own distro. Last week, it dumped its distro and made a large investment in Cloudera.
IBM has its own distro, yet certifies its tools to run on Cloudera.
Cisco partners with MapR.
Amazon (AWS) has its own distro and partners with MapR.
Myth: HADOOP CLUSTERS SHOULD BE BUILT ON COMMODITY HARDWARE.
Reality: YOU CAN DESIGN YOUR CLUSTER AROUND CONSTRAINTS…
ALTERNATIVE CLUSTER LAYOUT
STORAGE / COMPUTE CLUSTER
A Higher Density of Disk and Compute Cluster
Premium over Commodity Hardware
I/O Latency
Could be part of a virtualization solution.
Myth: HADOOP IS OPEN SOURCE AND THEREFORE FREE.
Reality: T.A.N.S.T.A.A.F.L. (‘TANS - TAH - FELL’): THERE AIN’T NO SUCH THING AS A FREE LUNCH.
There ain’t no such thing as a free lunch…
Customers are paying for support.
Tools are primitive and require work; there is no real point-and-click solution in place, but it is getting there.
Hadoop fills the gap where you want a custom solution.
Merging semi-structured and structured data is going to be data dependent, requiring customization.
Beyond ETL and SQL, custom apps require developer expertise. (You must invest in skills.)
Depending on the use case, Time to Value (TtV) will differ.
Bottom line: there is a cost reduction over traditional solutions, but it’s not free.
Take away…
Hadoop is a tool set that is constantly evolving.
Beware of marketing myths…
Do your own homework and talk to the vendors.
Make them earn your business.
T.A.N.S.T.A.A.F.L. applies: you need to make an investment in terms of skills.
Hadoop isn’t a separate solution and should be part of your overall Enterprise strategy.
Hadoop isn’t a silver bullet. By itself, it doesn’t solve your business problems.
YOU CAN HAVE YOUR CAKE AND EAT IT TOO!
QUESTIONS?
Thank You For Your Time
What is a layer cake?
layer cake, noun [C] US
: two or more soft cakes put on top of each other with jam, cream, icing, etc. (= a sweet mixture made from sugar) between the cakes and covering the top and sides
: a term for a diagram showing how various parts of a group of components tie together in terms of a functional stack.
What is Hadoop?
Storage Layer
The Storage Layer is a Distributed File System that accomplishes the following:
Uniform Access from any machine in the cluster.
Fast Access (
Resiliency (Self Healing)
Redundancy (Replication)
This is known as HDFS - the Hadoop Distributed File System.
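Redundancy and self-healing, as listed above, can be illustrated with a toy model. This is a sketch only: it is not HDFS's actual block placement policy, the replication factor of 3 is an assumed value, and the names `place_block` and `heal` are invented for the example.

```python
REPLICATION_FACTOR = 3  # assumed value for this sketch


def place_block(block_id, live_nodes, block_map, replication=REPLICATION_FACTOR):
    """Keep copies of a block on distinct live nodes, least-loaded nodes first."""
    holders = block_map.setdefault(block_id, set())
    holders &= live_nodes  # forget copies that sat on failed nodes
    # Count how many blocks each live node already holds.
    load = {n: sum(n in h for h in block_map.values()) for n in live_nodes}
    candidates = sorted(live_nodes - holders, key=lambda n: (load[n], n))
    for node in candidates:
        if len(holders) >= replication:
            break
        holders.add(node)
    return holders


def heal(live_nodes, block_map, replication=REPLICATION_FACTOR):
    """Self-healing: bring every under-replicated block back up to strength."""
    for block_id in sorted(block_map):
        place_block(block_id, live_nodes, block_map, replication)
```

In this model, losing one node out of four leaves every block with at least two copies, and a single `heal` pass restores the third copy on a surviving node.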
What is Hadoop?
Job Control Layer
The Job Control Layer is the layer that accomplishes the following:
Manages and schedules jobs to be run. (Default [FIFO], Capacity Scheduler,
Manages the overall job, and distributes the subprocesses across the cluster.
Manages the subprocesses being run on each node in the cluster.
This is accomplished by a Job Tracker (cluster level) and Task Trackers (node level).
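The division of labor described above can be sketched with a toy FIFO scheduler. This is a simplified model, not the Job Tracker's real logic: it assumes every task takes the same amount of time (so work proceeds in "waves"), and the function name and task labels are hypothetical.

```python
from collections import deque


def fifo_schedule(jobs, slots_per_node):
    """Hand out tasks to node slots in strict submission order.

    jobs: list of (job_name, task_count) in the order they were submitted.
    slots_per_node: dict of node name -> number of tasks it can run at once.
    Returns a list of waves; each wave maps node -> list of task labels.
    """
    if sum(slots_per_node.values()) <= 0:
        raise ValueError("the cluster needs at least one task slot")
    # Cluster-level view: one queue of tasks, earlier jobs first.
    pending = deque(
        f"{name}-task{i}" for name, count in jobs for i in range(count)
    )
    waves = []
    while pending:
        wave = {}
        # Node-level view: each node runs as many tasks as it has slots.
        for node, slots in slots_per_node.items():
            for _ in range(slots):
                if not pending:
                    break
                wave.setdefault(node, []).append(pending.popleft())
        waves.append(wave)
    return waves
```

With FIFO ordering, a later job gets no slots until every task of the earlier job has been handed out, which is the behavior other schedulers were introduced to relax.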
What is Hadoop?
Data Access Layer
The Data Access Layer is the layer that accomplishes the following:
Allows for higher-level access which can be translated to a Map/Reduce job
Pig (Yahoo!)
Hive (Facebook)
Allows for ad hoc access to data outside of the Map/Reduce framework (HBase)
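To make "translated to a Map/Reduce job" concrete, consider a query such as `SELECT dept, COUNT(*) FROM employees GROUP BY dept` (the table and column names are hypothetical). The sketch below shows, conceptually, the three phases such a query breaks into. It is plain Python for illustration, not what Hive or Pig actually generate.

```python
from collections import defaultdict


def map_phase(rows):
    """Mapper: emit (key, 1) for each row, keyed by the grouping column."""
    for row in rows:
        yield row["dept"], 1


def shuffle(pairs):
    """Shuffle: bring together all values emitted for the same key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reducer: collapse each key's values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}
```

Chaining the phases, `reduce_phase(shuffle(map_phase(rows)))`, yields one count per department, which is the result the one-line query asks for.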
What is Hadoop?
Job Flow Control Layer
The Data Access Layer is the layer that accomplishes the following:
Allows for higher-level access which can be translated to a Map/Reduce job
Pig (Yahoo!)
Hive (Facebook)
Allows for ad hoc access to data outside of the Map/Reduce framework (HBase)
Allows for processes to be chained together to create a workflow (Oozie)*
*Nowhere else to put it…
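Chaining processes into a workflow can be sketched as running actions in dependency order and stopping at the first failure. This is an illustration of the idea only, not Oozie's workflow format or API; the function name, the action names, and the True/False success convention are assumptions of the sketch. It relies on `graphlib`, available in the Python standard library from version 3.9.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def run_workflow(actions, dependencies):
    """Run each action after everything it depends on; stop at first failure.

    actions: dict of action name -> zero-argument callable returning True/False.
    dependencies: dict of action name -> set of names that must finish first.
    Returns the names of the actions that completed successfully, in run order.
    """
    graph = {name: dependencies.get(name, set()) for name in actions}
    completed = []
    for name in TopologicalSorter(graph).static_order():
        if not actions[name]():
            break  # a failed step halts everything downstream of it
        completed.append(name)
    return completed
```

A three-step chain such as ingest, then clean, then report would be expressed as `{"clean": {"ingest"}, "report": {"clean"}}` in `dependencies`.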
List of Apache Incubator Projects associated with Hadoop:
Storm
Accumulo
Knox
Sentry
Falcon
DataFu
Drill
Tez
Twill
Phoenix
Hadoop Dev Tools
Tajo

Dubai Big Data in Finance, Intro to Hadoop 2-Apr-14 - Michael Segel
