MAP REDUCE
PROGRAMMING
Dr G Sudha Sadasivam
Map - reduce
• Sort/merge-based distributed processing
• Best for batch-oriented processing
• Sort/merge is the primitive
– Operates at transfer rate (process + data clusters)
• Simple programming metaphor:
– input | map | shuffle | reduce > output
– cat * | grep | sort | uniq -c > file
• Pluggable user code runs in generic reusable framework
– log processing
– web search indexing
– SQL-like queries in Pig
• Distribution & reliability
– Handled by framework - transparency
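The shell metaphor above can be sketched in a few lines of Python. This is only an illustration of the pipeline shape, not Hadoop code: the function names `grep` and `sort_uniq_count` and the sample log lines are made up for the example.

```python
from collections import Counter

def grep(lines, needle):
    """Map-like step: keep only the lines that contain `needle`."""
    return (line for line in lines if needle in line)

def sort_uniq_count(lines):
    """Shuffle + reduce step: count each distinct line and return the
    counts in sorted key order (the same result as `sort | uniq -c`)."""
    counts = Counter(lines)
    return sorted(counts.items())

# Hypothetical input standing in for `cat *`
log = ["GET /a", "POST /b", "GET /a", "GET /c"]
result = sort_uniq_count(grep(log, "GET"))
print(result)
```

The user only supplies the filtering and counting logic; in MapReduce the framework takes the place of the pipes between the stages.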
MR model
• Map()
– Process a key/value pair to generate intermediate
key/value pairs
• Reduce()
– Merge all intermediate values associated with the same
key
• Users implement interface of two primary methods:
1. Map: (key1, val1) → (key2, val2)
2. Reduce: (key2, [val2]) → [val3]
• Map corresponds to the GROUP BY clause (it selects the key) of an
SQL aggregate query
• Reduce corresponds to the aggregate function (e.g., average) that is
computed over all the rows with the same group-by
attribute (key).
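The SQL analogy can be made concrete with a small in-memory sketch. It assumes a hypothetical table of employee rows with `dept` and `salary` fields and mimics `SELECT dept, AVG(salary) ... GROUP BY dept`; the helper `run` is an illustrative stand-in for the framework, not a Hadoop API.

```python
from collections import defaultdict

def map_fn(key, row):
    # (key1, val1) -> (key2, val2): emit the group-by attribute as the key
    yield row["dept"], row["salary"]

def reduce_fn(key, values):
    # (key2, [val2]) -> [val3]: aggregate over all rows sharing the key
    yield sum(values) / len(values)

def run(records, map_fn, reduce_fn):
    """Toy single-process driver: group map output by key, then reduce."""
    groups = defaultdict(list)
    for k1, v1 in records:
        for k2, v2 in map_fn(k1, v1):
            groups[k2].append(v2)
    return {k2: list(reduce_fn(k2, vals)) for k2, vals in sorted(groups.items())}

# Hypothetical rows; the record key (row number) is ignored by map_fn
rows = [
    (0, {"dept": "eng", "salary": 100}),
    (1, {"dept": "eng", "salary": 200}),
    (2, {"dept": "ops", "salary": 90}),
]
averages = run(rows, map_fn, reduce_fn)
print(averages)
```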
• Application writer specifies
– A pair of functions called Map and Reduce and a set of input
files and submits the job
• Workflow
– Input phase generates a number of FileSplits from input files
(one per Map task)
– The Map phase executes a user function to transform input
key/value pairs into a new set of key/value pairs
– The framework sorts and shuffles the key/value pairs to output nodes
– The Reduce phase combines all key/value pairs with the same key
into new key/value pairs
– The output phase writes the resulting pairs to files
• All phases are distributed with many tasks doing the work
– Framework handles scheduling of tasks on cluster
– Framework handles recovery when a node fails
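The workflow above can be traced in a single process. The sketch below is a simplification under stated assumptions: splits are plain Python lists, a CRC32-based `partition` function stands in for whatever partitioner the framework uses, and the city/temperature records are invented for illustration.

```python
import zlib
from collections import defaultdict

def partition(key, num_reducers):
    # Deterministic hash so the same key always goes to the same reduce task
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def run_job(splits, map_fn, reduce_fn, num_reducers=2):
    # Map phase: one map task per split; output is bucketed per reduce task
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for split in splits:
        for record in split:
            for key, value in map_fn(record):
                buckets[partition(key, num_reducers)][key].append(value)
    # Reduce phase: each reduce task visits its keys in sorted order
    outputs = []
    for bucket in buckets:
        outputs.append([(key, reduce_fn(key, bucket[key])) for key in sorted(bucket)])
    return outputs  # one list per reduce task, like one output file each

def map_fn(record):
    city, temperature = record
    yield city, temperature

def reduce_fn(city, temperatures):
    return max(temperatures)

# Two hypothetical input splits
splits = [[("pune", 31), ("delhi", 40)], [("pune", 35)]]
outputs = run_job(splits, map_fn, reduce_fn)
print(outputs)
```

Because every pair with a given key lands in the same bucket, each key appears in exactly one reduce task's output.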
Data distribution
• Input files are split into M pieces on distributed
file system - 128 MB blocks
• Intermediate files created from map tasks are
written to local disk
• Output files are written to distributed file system
Assigning tasks
• Many copies of user program are started
• The framework tries to exploit data locality by running map
tasks on machines that already hold the data
• One instance becomes the Master
• Master finds idle machines and assigns them
tasks
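A rough sketch of the two ideas on this slide: how many pieces a file yields at 128 MB per block, and a master that prefers idle workers holding a split's data. Both functions are illustrative assumptions; a real scheduler also handles replicas, racks, and failures, and real split sizing can differ from a plain ceiling division.

```python
import math

BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, as stated on the slide

def num_splits(file_size_bytes, block_size=BLOCK_SIZE):
    """Simplified: one split per (possibly partial) block."""
    return math.ceil(file_size_bytes / block_size)

def assign_map_tasks(split_hosts, idle_workers):
    """Greedy master: give each split to an idle worker that stores it,
    falling back to any idle worker when no local one is free."""
    idle = list(idle_workers)
    assignment = {}
    for split, hosts in split_hosts.items():
        if not idle:
            break  # remaining splits wait for a worker to become idle
        local = [w for w in idle if w in hosts]
        worker = local[0] if local else idle[0]
        idle.remove(worker)
        assignment[split] = worker
    return assignment

# Hypothetical cluster state: which nodes store each split
split_hosts = {"split0": {"n1", "n2"}, "split1": {"n2", "n3"}, "split2": {"n4"}}
assignment = assign_map_tasks(split_hosts, ["n3", "n2", "n5"])
print(num_splits(1024 * 1024 * 1024), assignment)
```

Here `split2` has no idle local worker, so it runs remotely on `n5` and its data must be read over the network.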
MAP REDUCE PROGRAMMING_using hadoop_a.ppt
Execution
• Map workers read in contents of corresponding input
partition
• Perform user-defined map computation to create
intermediate <key,value> pairs
• Buffered output pairs are periodically written to local disk
Reduce
• Reduce workers iterate over ordered intermediate data
– For each unique key encountered, its values are passed to the
user's reduce function
– e.g. <key, [value1, value2,..., valueN]>
• Output of user's reduce function is written to output file
on global file system
• When all tasks have completed, master wakes up user
program
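The reduce worker's loop over ordered data can be sketched with `itertools.groupby`, which groups consecutive items that share a key. The name `reduce_worker` and the sample pairs are assumptions made for the example.

```python
from itertools import groupby
from operator import itemgetter

def reduce_worker(intermediate, reduce_fn):
    """Sort intermediate pairs by key, then call reduce_fn once per key."""
    ordered = sorted(intermediate, key=itemgetter(0))
    output = []
    for key, group in groupby(ordered, key=itemgetter(0)):
        values = [value for _, value in group]  # <key, [value1, ..., valueN]>
        output.append((key, reduce_fn(key, values)))
    return output

# Hypothetical intermediate pairs fetched from several map tasks
pairs = [("b", 2), ("a", 1), ("b", 5), ("a", 4)]
out = reduce_worker(pairs, lambda key, values: sum(values))
print(out)
```

Sorting first is what lets the worker see all values for a key together without holding every key in memory at once.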
• Map
• Reduce
• Combiner – combines the output of a single map
task. It works like the reducer, but it stores its
intermediate output in a local file rather than in the final output file
• Debugging
– Tasks can be tested locally using special MapReduce
libraries
– The framework offers human-readable status info on an HTTP server
WORD COUNT EXAMPLE
map(String input_key, String input_value):
    // input_key: document name
    // input_value: document contents
    for each word w in input_value:
        EmitIntermediate(w, "1");

reduce(String output_key, Iterator intermediate_values):
    // output_key: a word
    // intermediate_values: a list of counts
    int result = 0;
    for each v in intermediate_values:
        result += ParseInt(v);
    Emit(AsString(result));
• Map()
– Input <filename, file text>
– Parses file and emits <word, count> pairs
• e.g. <"hello", 1>
• Reduce()
– Sums all values for the same key and emits
<word, TotalCount>
• e.g. <"hello", [3, 5, 2, 7]> => <"hello", 17>
• File
Hello World Bye World
Hello Hadoop Goodbye Hadoop
• Map
For the given sample input the first map emits:
< Hello, 1>
< World, 1>
< Bye, 1>
< World, 1>
• The second map emits:
< Hello, 1>
< Hadoop, 1>
< Goodbye, 1>
< Hadoop, 1>
The output of the first combine:
< Bye, 1>
< Hello, 1>
< World, 2>
The output of the second combine:
< Goodbye, 1>
< Hadoop, 2>
< Hello, 1>
Thus the output of the job (reduce) is:
< Bye, 1>
< Goodbye, 1>
< Hadoop, 2>
< Hello, 2>
< World, 2>
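The trace above can be reproduced with a small single-process sketch. It assumes each input line is handled by its own map task and that the combiner simply pre-sums counts per word; the function names are illustrative and this is not the Hadoop API.

```python
from collections import Counter

def map_words(text):
    """Map: emit <word, 1> for every word in the input text."""
    return [(word, 1) for word in text.split()]

def combine(pairs):
    """Combiner: runs on one map task's output and sums counts locally."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return sorted(totals.items())

def reduce_all(combined_outputs):
    """Reduce: sum the partial counts from every map task, per word."""
    totals = Counter()
    for pairs in combined_outputs:
        for word, count in pairs:
            totals[word] += count
    return sorted(totals.items())

files = ["Hello World Bye World", "Hello Hadoop Goodbye Hadoop"]
combined = [combine(map_words(text)) for text in files]
result = reduce_all(combined)
print(combined)
print(result)
```

The combiner shrinks each map task's output from four pairs to three before the shuffle, which is the saving it is meant to provide.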
Configuration
CONCLUSION
Hadoop MapReduce is a software framework for
easily writing applications that process vast
amounts of data (multi-terabyte data sets) in
parallel on large clusters (thousands of nodes) of
commodity hardware in a reliable, fault-tolerant
manner.
