Cloud Computing:
MapReduce - Tutorial
Prof. Soumya K Ghosh
Department of Computer Science and Engineering
IIT KHARAGPUR
Introduction
• MapReduce: programming model developed at Google
• Objective:
– Implement large-scale search
– Text processing on massively scalable web data stored using BigTable and the GFS distributed
file system
• Designed for processing and generating large volumes of data via massively parallel
computations, utilizing tens of thousands of processors at a time
• Fault tolerant: ensures progress of the computation even if processors or networks fail
• Example:
– Hadoop: open-source implementation of MapReduce (developed largely at Yahoo!; now an Apache project)
– Available on pre-packaged AMIs on Amazon EC2 cloud platform
9/11/2017 2
MapReduce Model
• Parallel programming abstraction
• Used by many different parallel applications which carry out large-scale
computation involving thousands of processors
• Leverages a common underlying fault-tolerant implementation
• Two phases of MapReduce:
– Map operation
– Reduce operation
• A configurable number of M ‘mapper’ processors and R ‘reducer’ processors are
assigned to work on the problem
• The computation is coordinated by a single master process
MapReduce Model Contd…
• Map phase:
– Each mapper reads approximately 1/M of the input from the global file
system, using locations given by the master
– Map operation consists of transforming one set of key-value pairs to
another:
    (k1, v1) → [(k2, v2)]
– Each mapper writes its results to one file per reducer
– Files are sorted by key and stored on the local file system
– The master keeps track of the location of these files
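The map-phase steps above can be sketched in a few lines of Python. This is a minimal single-process illustration under stated assumptions, not the Google or Hadoop implementation: `run_mapper`, `partition`, and `word_map` are hypothetical names, lists stand in for local files, and using CRC32 modulo R to choose the reducer is an assumed partitioning rule picked only because it is deterministic.

```python
import zlib
from typing import Callable, Iterable, List, Tuple

KV = Tuple[str, int]

def partition(key: str, num_reducers: int) -> int:
    """Pick the reducer for a key (assumed rule: CRC32 of the key modulo R)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def run_mapper(records: Iterable[Tuple[str, str]],
               map_fn: Callable[[str, str], Iterable[KV]],
               num_reducers: int) -> List[List[KV]]:
    """Apply map_fn to every (k1, v1) record and return one sorted
    'file' (here just a list) per reducer."""
    files: List[List[KV]] = [[] for _ in range(num_reducers)]
    for k1, v1 in records:
        for k2, v2 in map_fn(k1, v1):
            files[partition(k2, num_reducers)].append((k2, v2))
    # Each per-reducer file is sorted by key before it is "stored locally".
    return [sorted(f) for f in files]

def word_map(doc_id: str, text: str) -> Iterable[KV]:
    """Hypothetical map function: emit (word, 1) for each word."""
    for word in text.split():
        yield (word.lower(), 1)

# One mapper's share of the input (made-up sample records).
split = [("d1", "the cat sat"), ("d2", "the dog sat")]
mapper_files = run_mapper(split, word_map, num_reducers=2)
```

In a real deployment the master would record where each entry of `mapper_files` lives so that reducers can fetch it later.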
MapReduce Model Contd…
• Reduce phase:
– The master informs the reducers where the partial results are stored in the local
files of the respective mappers
– Reducers make remote procedure call requests to the mappers to fetch the files
– Each reducer groups the results of the map step by key and applies a
function f to the list of values that correspond to each key:
    (k2, [v2]) → (k2, f([v2]))
– Final results are written back to GFS
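The reduce-phase steps can be sketched the same way. The snippet below is an illustrative single-process model, not a real framework API: `run_reducer` is a hypothetical name, plain lists stand in for the files fetched from each mapper, and the sample pairs are made up. It assumes each fetched file is already sorted by key, as described for the map phase.

```python
import heapq
from itertools import groupby
from operator import itemgetter
from typing import Callable, Dict, List, Tuple

KV = Tuple[str, int]

def run_reducer(fetched_files: List[List[KV]],
                f: Callable[[List[int]], int]) -> Dict[str, int]:
    """Merge the sorted files fetched from each mapper, group the
    values by key, and apply f to each key's list of values."""
    merged = heapq.merge(*fetched_files)  # inputs are already sorted by key
    results: Dict[str, int] = {}
    for key, pairs in groupby(merged, key=itemgetter(0)):
        results[key] = f([value for _, value in pairs])
    return results

# Files this reducer fetched from three mappers (made-up sample data).
from_mapper_1 = [("cat", 1), ("the", 1)]
from_mapper_2 = [("cat", 1), ("dog", 1)]
from_mapper_3 = [("the", 1), ("the", 1)]

output = run_reducer([from_mapper_1, from_mapper_2, from_mapper_3], sum)
print(output)
```

Here `f` is `sum`, so the reducer produces a count per key; any function over a list of values could be substituted.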
MapReduce: Example
• 3 mappers; 2 reducers
• Map function:
• Reduce function:
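One common way to illustrate a 3-mapper, 2-reducer configuration is word counting. The sketch below assumes that task, which is not necessarily the one intended here; the input documents, the round-robin split, and the CRC32-based routing of keys to reducers are all assumptions chosen for illustration.

```python
import zlib
from collections import defaultdict

M, R = 3, 2  # 3 mappers, 2 reducers, as in the example

documents = ["a b a", "b c", "a c c", "b b", "a", "c a"]  # made-up input

def map_fn(text):
    """Emit (word, 1) for every word in one document."""
    return [(w, 1) for w in text.split()]

def reduce_fn(word, counts):
    """Sum the partial counts for one word."""
    return word, sum(counts)

# Each mapper takes roughly 1/M of the input (round-robin split assumed).
splits = [documents[i::M] for i in range(M)]

# Shuffle: route every pair to reducer crc32(word) % R (assumed rule).
reducer_inputs = [defaultdict(list) for _ in range(R)]
for split in splits:
    for doc in split:
        for word, one in map_fn(doc):
            reducer_inputs[zlib.crc32(word.encode()) % R][word].append(one)

# Each reducer processes its own keys in sorted order.
word_counts = dict(
    reduce_fn(word, counts)
    for groups in reducer_inputs
    for word, counts in sorted(groups.items())
)
print(word_counts)
```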
Problem-1
In a MapReduce framework, suppose the HDFS block size is 64 MB.
We have 3 files of size 64 KB, 65 MB and 127 MB. How many blocks will
be created by the Hadoop framework?
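The arithmetic can be checked with a short sketch. It assumes 1 MB = 1024 KB, that each file is stored in its own blocks (blocks are not shared between files), and that replication is ignored; `blocks_needed` is a hypothetical helper name.

```python
import math

BLOCK_SIZE_KB = 64 * 1024  # 64 MB block size, expressed in KB

def blocks_needed(file_size_kb: int) -> int:
    """Number of blocks for one file: ceil(size / block size).
    The last block of a file may be only partly filled."""
    return math.ceil(file_size_kb / BLOCK_SIZE_KB)

file_sizes_kb = {"64 KB": 64, "65 MB": 65 * 1024, "127 MB": 127 * 1024}
per_file = {name: blocks_needed(size) for name, size in file_sizes_kb.items()}
total_blocks = sum(per_file.values())
print(per_file, total_blocks)
```

Under these assumptions the files need 1, 2 and 2 blocks respectively, for 5 blocks in total.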
Problem-2
Write pseudocode (for the map and reduce functions) for calculating
the average of a set of integers in MapReduce.
Suppose A = (10, 20, 30, 40, 50) is a set of integers. Show the map and
reduce outputs.
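One possible solution, written as runnable Python rather than pseudocode. It is a sketch under the assumption that every integer is emitted under a single shared key (the key name "avg" is arbitrary), so that one reducer sees all the values; the grouping loop stands in for the framework's shuffle step.

```python
from collections import defaultdict

A = (10, 20, 30, 40, 50)

def map_fn(value):
    """Emit every integer under one shared key so a single reducer sees them all."""
    return ("avg", value)

def reduce_fn(key, values):
    """Average the list of values collected for the key."""
    return (key, sum(values) / len(values))

map_output = [map_fn(x) for x in A]

grouped = defaultdict(list)  # shuffle step: group values by key
for key, value in map_output:
    grouped[key].append(value)

reduce_output = [reduce_fn(key, values) for key, values in grouped.items()]
print(map_output)     # five ("avg", x) pairs
print(reduce_output)  # one ("avg", average) pair
```

If partial results were combined on each mapper before the reduce step, the partials would need to carry (sum, count) pairs, because an average of averages is not in general the overall average.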
Problem-3
Compute the total and average salary of organization XYZ, grouped by
gender (male or female), using MapReduce. The input is as
follows:
Name, Gender, Salary
John, M, 10,000
Martha, F, 15,000
----
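A possible solution sketch in Python. The first two input rows come from the problem statement; the remaining rows are made up so that each group has more than one member. It assumes each record has exactly three fields and that the salary may contain thousands separators.

```python
from collections import defaultdict

# The first two rows come from the problem statement; the others are made up.
lines = [
    "John, M, 10,000",
    "Martha, F, 15,000",
    "Ravi, M, 20,000",
    "Anita, F, 25,000",
]

def map_fn(line):
    """Parse one record and emit (gender, salary)."""
    # Split on the first two commas only, so "10,000" stays in one field.
    name, gender, salary = [field.strip() for field in line.split(",", 2)]
    return (gender, int(salary.replace(",", "")))

def reduce_fn(gender, salaries):
    """Emit (gender, total salary, average salary) for one group."""
    total = sum(salaries)
    return (gender, total, total / len(salaries))

grouped = defaultdict(list)  # shuffle step: group salaries by gender
for line in lines:
    gender, salary = map_fn(line)
    grouped[gender].append(salary)

results = [reduce_fn(g, s) for g, s in sorted(grouped.items())]
print(results)
```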
Problem-4
Write the Map and Reduce functions (pseudocode) for the following Word
Length Categorization problem under the MapReduce model.
Word Length Categorization: Given a text paragraph (containing only words),
categorize each word into one of the following categories. Output the frequency of
occurrence of words in each category.
Categories:
tiny: 1-2 letters; small: 3-5 letters; medium: 6-9 letters; big: 10 or more letters
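A possible solution sketch in Python. The sample paragraph is made up, and the grouping loop stands in for the framework's shuffle step; the category boundaries follow the definitions above.

```python
from collections import defaultdict

def category(word):
    """Map a word to its length category."""
    n = len(word)
    if n <= 2:
        return "tiny"
    if n <= 5:
        return "small"
    if n <= 9:
        return "medium"
    return "big"

def map_fn(word):
    """Emit (category, 1) for one word."""
    return (category(word), 1)

def reduce_fn(cat, ones):
    """Sum the occurrences recorded for one category."""
    return (cat, sum(ones))

# Made-up sample paragraph containing only words.
paragraph = ("a distributed computation is split into many small "
             "independent tasks across machines")

grouped = defaultdict(list)  # shuffle step: group the 1s by category
for word in paragraph.split():
    cat, one = map_fn(word)
    grouped[cat].append(one)

frequencies = dict(reduce_fn(cat, ones) for cat, ones in grouped.items())
print(frequencies)
```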