Hadoop, a distributed framework for
Big Data
Presented By:
Bhushan Kulkarni
T.E(I.T)
Contents
1. Introduction and Hadoop’s history
2. Architecture in detail
3. Hadoop in industry
What is Hadoop?
• Apache top-level project: an open-source implementation of frameworks for reliable, scalable, distributed computing and data storage.
• It is a flexible and highly available architecture for large-scale computation and data processing on a network of commodity hardware.
What is Hadoop
• Hadoop is a software framework for distributed processing of large datasets across large clusters of computers
• Large datasets → terabytes or petabytes of data
• Large clusters → hundreds or thousands of nodes
• Hadoop is an open-source implementation of Google's MapReduce
• Hadoop is based on a simple programming model called MapReduce
• Hadoop is based on a simple data model: any data will fit
Brief History of Hadoop
• Google introduced the MapReduce programming model.
• Doug Cutting and team took the solution provided by Google and started an open-source project called Hadoop in 2005. Doug named it after his son's toy elephant.
Hadoop’s Developers
Doug Cutting
2005: Doug Cutting and Michael J. Cafarella developed Hadoop to support distribution for the Nutch search engine project. The project was funded by Yahoo.
2006: Yahoo gave the project to the Apache Software Foundation.
Large-Scale Data Analytics
• MapReduce computing paradigm (e.g., Hadoop) vs. traditional database systems
• Many enterprises are turning to Hadoop
- Especially applications generating big data
- Web applications, social networks, scientific applications
Why is Hadoop able to compete?
Hadoop:
• Scalability (petabytes of data, thousands of machines)
• Flexibility in accepting all data formats (no schema)
• Commodity, inexpensive hardware
• Efficient and simple fault-tolerance mechanism
Database:
• Performance (extensive indexing, tuning, and data organization techniques)
• Structured data
Key Components
• The Hadoop framework consists of two main layers
• Distributed file system (HDFS)
• Execution engine (MapReduce)
Hadoop: How it Works
Hadoop Architecture
Master node (single node)
Many slave nodes
• Distributed file system (HDFS)
• Execution engine (MapReduce)
Hadoop Distributed File System (HDFS)
Centralized namenode
- Maintains metadata info about files
Many datanodes (1000s)
- Store the actual data
- Files are divided into blocks
- Each block is replicated N times (default = 3)
File F is divided into blocks (64 MB each)
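As a rough worked example of the block layout above, the sketch below computes how many blocks a file occupies and how much raw storage its replicas consume. The function name `hdfs_footprint` and the 200 MB file size are illustrative assumptions, not part of HDFS; the 64 MB block size and replication factor of 3 are the values quoted on the slide.

```python
import math

BLOCK_SIZE_MB = 64   # block size quoted on the slide
REPLICATION = 3      # default replication factor quoted on the slide


def hdfs_footprint(file_size_mb, block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Return (number of blocks, total block replicas, raw MB stored) for one file.

    Hypothetical helper for illustration only; it is not an HDFS API.
    """
    num_blocks = math.ceil(file_size_mb / block_size_mb)
    total_replicas = num_blocks * replication
    # The last block holds only the remaining bytes, so raw storage is the
    # file size times the replication factor, not blocks times block size.
    raw_mb = file_size_mb * replication
    return num_blocks, total_replicas, raw_mb


blocks, replicas, raw_mb = hdfs_footprint(200)
print(blocks, replicas, raw_mb)  # 4 12 600
```

A 200 MB file therefore needs four blocks (three full and one partial), and the cluster stores twelve block replicas in total.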
Main Properties of HDFS
• Large: An HDFS instance may consist of thousands of server machines, each storing part of the file system’s data
• Replication: Each data block is replicated many times (default is 3)
• Failure: Failure is the norm rather than the exception
• Fault Tolerance: Detection of faults and quick, automatic recovery from them is a core architectural goal of HDFS
• The namenode is constantly checking the datanodes
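One way to picture the namenode checking on datanodes is a heartbeat timeout: a datanode that has not reported recently is treated as dead, and any block left with fewer live copies than the replication factor needs to be copied again. The sketch below is a simplified model under that assumption. The function name, the 30-second timeout, and the node and block names are all hypothetical and do not reflect the actual HDFS code or its configuration defaults.

```python
def under_replicated_blocks(block_locations, last_heartbeat, now,
                            timeout_s=30.0, replication=3):
    """Return {block_id: number of missing replicas} for blocks that lost copies.

    block_locations: block id -> list of datanodes holding a replica
    last_heartbeat:  datanode -> time (seconds) of its most recent heartbeat
    timeout_s:       assumed timeout, chosen only for this illustration
    """
    # A datanode counts as live if it reported within the timeout window.
    live_nodes = {node for node, seen in last_heartbeat.items()
                  if now - seen <= timeout_s}
    missing = {}
    for block_id, nodes in block_locations.items():
        live_copies = sum(1 for node in nodes if node in live_nodes)
        if live_copies < replication:
            missing[block_id] = replication - live_copies
    return missing


block_locations = {"b1": ["n1", "n2", "n3"], "b2": ["n2", "n3", "n4"]}
last_heartbeat = {"n1": 100.0, "n2": 100.0, "n3": 100.0, "n4": 40.0}
print(under_replicated_blocks(block_locations, last_heartbeat, now=105.0))
# {'b2': 1}
```

Here node `n4` has been silent for 65 seconds, so block `b2` has only two live copies and one more replica must be created.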
What is MapReduce?
• MapReduce is a programming model and an associated implementation for processing and generating large data sets
• Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines
• MAP: a map function processes a key/value pair to generate a set of intermediate key/value pairs
• REDUCE: a reduce function merges all intermediate values associated with the same intermediate key
Properties of MapReduce Engine
• Job Tracker is the master node (runs with the namenode)
- Receives the user’s job
- Decides on how many tasks will run (number of mappers)
- Decides on where to run each mapper (concept of locality)
• Example: this file has 5 blocks → run 5 map tasks
- Where should the task reading block “1” run?
- Try to run it on Node 1 or Node 3, the nodes that hold a copy of that block (Node 2 does not)
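The locality decision can be sketched as a simple preference rule: run the map task on a node that already stores the block if one has a free slot, and fall back to any other free node otherwise. This is a minimal illustration, not the Job Tracker's actual scheduler; `pick_node`, the slot counts, and the data layout are assumptions made for the example.

```python
def pick_node(block_id, block_locations, free_slots):
    """Choose a node for the map task that reads block_id.

    Returns (node, is_local). Hypothetical helper for illustration only.
    block_locations: block id -> nodes holding a replica of that block
    free_slots:      node -> number of free map slots
    """
    # First choice: a node that holds the block, so no data crosses the network.
    for node in block_locations.get(block_id, []):
        if free_slots.get(node, 0) > 0:
            return node, True
    # Fallback: any node with a free slot; the block must be read remotely.
    for node, slots in free_slots.items():
        if slots > 0:
            return node, False
    return None, False


# Block 1 is stored on Node 1 and Node 3, as in the slide's example.
block_locations = {1: ["Node 1", "Node 3"]}
free_slots = {"Node 1": 0, "Node 2": 1, "Node 3": 1}
print(pick_node(1, block_locations, free_slots))  # ('Node 3', True)
```

Node 1 is busy in this example, so the task goes to Node 3, which also holds the block, rather than to the idle Node 2.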
Properties of MapReduce Engine (Cont’d)
• Task Tracker is the slave node (runs on each datanode)
- Receives the task from the Job Tracker
- Runs the task until completion (either a map or a reduce task)
- Always in communication with the Job Tracker, reporting progress
The Programming Model Of MapReduce
The Map function, written by the user, takes an input pair and produces a set of intermediate key/value pairs. The
MapReduce library groups together all intermediate values associated with the
same intermediate key.
The Reduce function, also written by the user, accepts an intermediate key I and a set of
values for that key. It merges together these values to form a possibly smaller set of values.
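The model described above can be imitated in a few lines of single-process Python. This is only a sketch of the data flow (map, group by key, reduce), not Hadoop's API: `run_mapreduce` and the word-count functions are hypothetical names introduced for illustration, and nothing here is distributed.

```python
from collections import defaultdict


def run_mapreduce(records, map_fn, reduce_fn):
    """Apply map_fn to every record, group values by key, then reduce each group."""
    intermediate = defaultdict(list)
    # Map phase: each input pair may emit any number of intermediate pairs.
    for key, value in records:
        for out_key, out_value in map_fn(key, value):
            # Grouping step: collect all values that share an intermediate key.
            intermediate[out_key].append(out_value)
    # Reduce phase: merge the grouped values, visiting keys in sorted order.
    return {key: reduce_fn(key, intermediate[key]) for key in sorted(intermediate)}


def word_count_map(_doc_id, text):
    """User-written map: emit (word, 1) for every word in the document."""
    for word in text.split():
        yield word.lower(), 1


def word_count_reduce(_word, counts):
    """User-written reduce: sum the counts collected for one word."""
    return sum(counts)


docs = [("d1", "the quick fox"), ("d2", "The lazy dog")]
print(run_mapreduce(docs, word_count_map, word_count_reduce))
# {'dog': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 2}
```

The user supplies only the two small functions; the grouping between them is the part the MapReduce library handles.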
MapReduce data flow with a single reduce task
MapReduce data flow with multiple reduce tasks
MapReduce data flow with no reduce tasks
Example 1: Color Count
Job: Count the number of each color in a data set
• Input blocks on HDFS are read by four map tasks
• Map (parse-hash): each map task parses its block and produces (k, v) pairs of the form (color, 1)
• Shuffle & sorting based on k
• Reduce (three tasks): each consumes (k, [v]), e.g. (color, [1,1,1,1,1,1..]), and produces (k’, v’), e.g. (color, 100)
• Output: Part0001, Part0002, Part0003. That’s the output file; it has 3 parts on probably 3 different machines
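The color-count job can be mimicked in plain Python to show why the output ends up in three parts: every key is routed to exactly one of three reducers, and each reducer writes its own part. This is a single-process sketch, not Hadoop code. The `partition` function using `zlib.crc32`, the sample blocks, and the names are assumptions for illustration; Hadoop's own partitioner is not reproduced here.

```python
import zlib
from collections import defaultdict

NUM_REDUCERS = 3  # three reduce tasks, as in the slide's diagram


def partition(key, num_reducers=NUM_REDUCERS):
    """Assign a key to a reducer with a deterministic hash (illustrative choice)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def color_count(blocks, num_reducers=NUM_REDUCERS):
    """Return one {color: count} dict per reducer, i.e. one per output part."""
    shuffled = [defaultdict(list) for _ in range(num_reducers)]
    # Map: each block emits (color, 1); the shuffle sends each key to one reducer.
    for block in blocks:
        for color in block:
            shuffled[partition(color, num_reducers)][color].append(1)
    # Reduce: each reducer sums the list of ones for every color it received.
    return [{color: sum(ones) for color, ones in sorted(part.items())}
            for part in shuffled]


blocks = [["red", "blue"], ["red", "green"], ["blue"], ["red"]]
for index, part in enumerate(color_count(blocks), start=1):
    print(f"Part{index:04d}", part)
```

Which color lands in which part depends on the hash, but each color appears in exactly one part, and the parts together hold the full counts.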
Example 2: Color Filter
Job: Select only the blue and the green colors
• Input blocks on HDFS are read by four map tasks
• Map: each map task selects only the blue or green colors and produces (k, v) pairs of the form (color, 1)
• No need for a reduce phase: each map task writes its output directly to HDFS
• Output: Part0001, Part0002, Part0003, Part0004. That’s the output file; it has 4 parts on probably 4 different machines
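A map-only job like this filter is easy to sketch because there is no shuffle or reduce step: each map task filters its own block and its result becomes one output part. The code below is a single-process illustration with made-up sample blocks; `filter_map` and `color_filter` are hypothetical names, not Hadoop APIs.

```python
WANTED = {"blue", "green"}  # colors the job selects


def filter_map(block, wanted=WANTED):
    """Map task: keep only the records whose color is wanted."""
    return [color for color in block if color in wanted]


def color_filter(blocks):
    """Run one map task per block; with no reduce phase, each result is one part."""
    return [filter_map(block) for block in blocks]


blocks = [["red", "blue"], ["green", "green", "red"], ["yellow"], ["blue", "green"]]
parts = color_filter(blocks)
print(parts)
# [['blue'], ['green', 'green'], [], ['blue', 'green']]
```

Four input blocks give four output parts, and a part may be empty when its block contains no matching colors.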
Why use Hadoop?
• Need to process multi-petabyte datasets
• Data may not have a strict schema
• Expensive to build reliability into each application
• Need common infrastructure
• Very large distributed file system
• Assumes commodity hardware
• Optimized for batch processing
Who Uses MapReduce/Hadoop
• Google: Inventors of the MapReduce computing paradigm
• Yahoo: Developing Hadoop, the open-source implementation of MapReduce
• IBM, Microsoft, Oracle
• Facebook, Amazon, AOL, Netflix
• Many others + universities and research labs
THANK YOU!!
