Developing a MapReduce Application
Dr. C.V. Suresh Babu
(CentreforKnowledgeTransfer) institute
Phases of Developing a MapReduce Application
• Configuration API
• Configuring the Development Environment
• GenericOptionsParser, Tool and ToolRunner
• Writing Unit Tests
• Running locally and in a cluster on Test Data
• The MapReduce Web UI
• Hadoop Logs
• Tuning a Job to improve performance
Stage 1: Developing a MapReduce Application
• Writing a program in MapReduce follows a certain pattern.
• You start by writing your map and reduce functions, ideally with unit
tests to make sure they do what you expect.
• Then you write a driver program to run a job, which can run from your
IDE using a small subset of the data to check that it is working.
• If it fails, you can use your IDE’s debugger to find the source of the
problem.
• With this information, you can expand your unit tests to cover this case
and improve your mapper or reducer as appropriate to handle such input
correctly.
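The first step above, writing the map and reduce functions together with unit tests, can be sketched in plain Python without any cluster. This is only an illustration of the workflow, not Hadoop's API: the names `map_words`, `reduce_counts`, and `run_tests` are hypothetical, and splitting on whitespace after lowercasing is an assumed tokenization rule.

```python
import unittest
from typing import Iterable, Iterator, Tuple


def map_words(line: str) -> Iterator[Tuple[str, int]]:
    """Map function: emit (word, 1) for every whitespace-separated word."""
    # Assumption: words are lowercased so "To" and "to" count as the same key.
    for word in line.lower().split():
        yield word, 1


def reduce_counts(word: str, counts: Iterable[int]) -> Tuple[str, int]:
    """Reduce function: sum all the counts that arrived for one word."""
    return word, sum(counts)


class MapReduceFunctionTests(unittest.TestCase):
    def test_mapper_emits_one_pair_per_word(self):
        self.assertEqual(
            list(map_words("To be or not")),
            [("to", 1), ("be", 1), ("or", 1), ("not", 1)],
        )

    def test_mapper_emits_nothing_for_blank_line(self):
        self.assertEqual(list(map_words("   ")), [])

    def test_reducer_sums_values(self):
        self.assertEqual(reduce_counts("be", [1, 1, 1]), ("be", 3))


def run_tests() -> bool:
    """Run the test case above and report whether every test passed."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(MapReduceFunctionTests)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()


if __name__ == "__main__":
    run_tests()
```

When a run on real data turns up an input the functions mishandle, the same pattern applies: add a test that reproduces the case, then change the function until the test passes.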
Stage 2: Developing a MapReduce Application
• When the program runs as expected against the small dataset, you
are ready to unleash it on a cluster.
• Running against the full dataset is likely to expose some more
issues, which you can fix as before, by expanding your tests and
mapper or reducer to handle the new cases.
• Debugging failing programs in the cluster is a challenge, so we
look at some common techniques to make it easier.
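One common way to make the mapper tolerate the unexpected records that a full dataset exposes is to catch the parse failure, count it, and move on rather than let the task fail. The sketch below shows that idea in plain Python. The tab-separated `word<TAB>count` record format, the `counters` object, and the function names are all hypothetical, chosen only for illustration; a real job would use the framework's own counter facility.

```python
from collections import Counter
from typing import Iterator, Tuple

# Hypothetical stand-in for job-level counters; not a Hadoop object.
counters: Counter = Counter()


def parse_record(line: str) -> Tuple[str, int]:
    """Parse an assumed 'word<TAB>count' record; raises ValueError on bad input."""
    word, raw_count = line.rstrip("\n").split("\t")  # wrong field count raises ValueError
    return word, int(raw_count)                      # non-numeric count raises ValueError


def safe_map(line: str) -> Iterator[Tuple[str, int]]:
    """Map function that skips malformed records and counts them instead of failing."""
    try:
        word, count = parse_record(line)
    except ValueError:
        counters["malformed_records"] += 1
        return
    counters["good_records"] += 1
    yield word, count


if __name__ == "__main__":
    lines = ["apple\t2", "broken line", "pear\tx"]
    pairs = [pair for line in lines for pair in safe_map(line)]
    print(pairs, dict(counters))
```

Comparing the malformed count with the good count after a run gives a quick signal of whether the bad records are rare noise or a sign that the parser's assumptions are wrong.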
Stage 3: Developing a MapReduce Application
• After the program is working, you may wish to do some tuning,
first by running through some standard checks for making
MapReduce programs faster and then by doing task profiling.
• Profiling distributed programs is not easy, but Hadoop has hooks to
aid the process.
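Before profiling tasks on a cluster, it can help to profile the map function locally on a sample of the input. The sketch below uses Python's standard `cProfile` and `pstats` modules as a local stand-in. It does not use Hadoop's profiling hooks, and `map_words` and `profile_mapper` are hypothetical names.

```python
import cProfile
import io
import pstats
from typing import Iterable, Iterator, Tuple


def map_words(line: str) -> Iterator[Tuple[str, int]]:
    """Map function under test: emit (word, 1) for each whitespace-separated word."""
    for word in line.lower().split():
        yield word, 1


def profile_mapper(lines: Iterable[str], top: int = 5) -> str:
    """Run the mapper over sample lines and return a profile report as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    for line in lines:
        for _ in map_words(line):  # consume the generator so the work is done
            pass
    profiler.disable()

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)  # show the most expensive calls first
    return buffer.getvalue()


if __name__ == "__main__":
    print(profile_mapper(["the quick brown fox"] * 10_000))
```

A local profile only covers the function's own CPU cost. It says nothing about I/O, shuffle, or scheduling overhead, which is why profiling on the cluster is still needed afterwards.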
Example: Word Count
Task: Count the word occurrences (frequencies) in a text file (or set of files).
<word, count> as <key, value> pair
Mapper: Emits <word, 1> for each word (no counting at this stage).
Shuffle in between: Pairs with the same key are grouped together and passed to a
single machine.
Reducer: Sums up the values (1s) that share the same key.
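The three steps can be simulated end to end in a single process. The sketch below is a plain-Python illustration of the data flow, not Hadoop code: `mapper`, `shuffle`, `reducer`, and `word_count` are hypothetical names, the shuffle is an in-memory dictionary rather than a transfer between machines, and splitting on whitespace after lowercasing is an assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def mapper(line: str) -> Iterator[Tuple[str, int]]:
    """Emit <word, 1> for each word; no counting happens here."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group values by key, as the framework does between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reducer(word: str, values: Iterable[int]) -> Tuple[str, int]:
    """Sum the 1s that share the same key."""
    return word, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence over the input lines."""
    mapped = (pair for line in lines for pair in mapper(line))
    grouped = shuffle(mapped)
    # Keys are visited in sorted order, mirroring the sorted input a reducer sees.
    return dict(reducer(word, values) for word, values in sorted(grouped.items()))


if __name__ == "__main__":
    sample = ["the quick brown fox", "the lazy dog", "the fox"]
    for word, count in word_count(sample).items():
        print(word, count)
```

For the three sample lines, the mapper emits `<the, 1>` three times; the shuffle collects them into `the: [1, 1, 1]`, and the reducer turns that into `<the, 3>`.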
[Figure: Example: Word Count]
[Figure: Example: Job Tracker]
