Confidential, Copyright © Quanticate
Introduction to Map-Reduce
Muralidharan Deenathayalan
Technical Lead
Muralidharan.deenathayalan@quanticate.com
The Apache logo is a trademark of The Apache Software Foundation.
All other marks mentioned may be trademarks or registered trademarks of their respective owners.
Agenda
What is Map-Reduce?
Map-Reduce architecture
Advantages of Map-Reduce
Frameworks available for writing Map-Reduce programs
WordCount – Map-Reduce Program explained
How to compile Map-Reduce program using Eclipse?
How to deploy Map-Reduce program?
How to run Map-Reduce program?
Q & A
Who am I?
7+ years of experience in Microsoft technologies such as ASP.NET, C#, SQL Server and SharePoint
2+ years of experience in open source technologies such as Java, Alfresco and Apache Cassandra
Author of Apache Cassandra Cookbook (in progress)
C# Corner MVP
Frequent blogger
What is Map-Reduce?
• Commonly referred to as a "Map-R" program
• MapReduce = Map() + Reduce()
• MapReduce is a programming model for processing large datasets in parallel across the nodes of a cluster (divide and conquer).
What is Map-Reduce?
• Map:
– Receives an input key/value pair
– Outputs intermediate key/value pairs
• Reduce:
– Receives an intermediate key together with all the values grouped under it
– Outputs key/value pairs
[Figure: Input Data → Map (×3) → Reduce (×2)]
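The Map and Reduce roles described above can be sketched in a single process. This is only an illustration in plain Java, not Hadoop code: the class name MapReduceFlowSketch and its methods are made up for the example, and the grouping step stands in for the shuffle that a real framework performs between the two phases.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-process illustration of map -> group by key -> reduce.
// No Hadoop classes are used; all names here are assumptions for the sketch.
final class MapReduceFlowSketch {

    // Map: one input record (a line of text) -> intermediate (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(token, 1));
            }
        }
        return pairs;
    }

    // Reduce: one key plus every value grouped under it -> one output value.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Applies map to every record, groups the intermediate pairs by key,
    // then applies reduce once per key. TreeMap keeps the keys sorted.
    static Map<String, Integer> run(List<String> inputs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : inputs) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        Map<String, Integer> output = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            output.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return output;
    }

    public static void main(String[] args) {
        // prints {be=2, do=1, not=1, or=1, to=3}
        System.out.println(run(Arrays.asList("to be or not to be", "to do")));
    }
}
```

In a cluster, each call to map and each call to reduce could run on a different node, because neither depends on the others' state.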
Map-Reduce Architecture overview
[Figure: a user submits work to the job tracker on the master node; slave nodes 1 to N each run a task tracker and its workers]
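As a loose analogy for the master/worker split, a thread pool can stand in for the slave nodes while the submitting code plays the job tracker. This is a sketch under that assumption only: the class JobTrackerSketch and the method countTokens are hypothetical, and nothing here reflects how Hadoop schedules tasks internally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical analogy: pool threads play the workers on the slave nodes,
// and countTokens plays the job tracker that hands out one task per split.
final class JobTrackerSketch {

    static int countTokens(List<String> splits, int workerCount) throws Exception {
        ExecutorService workers = Executors.newFixedThreadPool(workerCount);
        try {
            List<Future<Integer>> pending = new ArrayList<>();
            for (final String split : splits) {
                // One task per input split: count its whitespace-separated tokens.
                Callable<Integer> task = () ->
                        split.trim().isEmpty() ? 0 : split.trim().split("\\s+").length;
                pending.add(workers.submit(task));
            }
            int total = 0;
            for (Future<Integer> result : pending) {
                total += result.get(); // waits until that task has finished
            }
            return total;
        } finally {
            workers.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // prints 6
        System.out.println(countTokens(Arrays.asList("a b c", "d e", "f"), 3));
    }
}
```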
Advantages of Map-Reduce
• Distributed pattern-based searching
• Distributed sorting
• Web access logs
• Machine learning
Frameworks available for writing Map-Reduce
• Java: Cascading, Crunch
• Clojure: Cascalog
• Scala: Scrunch, Scalding, Scoobi
• R: RHadoop
• Microsoft: .NET (C# / VB.NET)
• Special (high-level): Apache Hive, Apache Pig
• Ruby: Wukong, Cascading JRuby
• Python: mrjob, Dumbo, Hadoopy, Pydoop, Luigi
Courtesy & ©: http://blog.matthewrathbone.com/2013/01/05/a-quick-guide-to-hadoop-map-reduce-frameworks.html
WordCount – Map-Reduce Program
// Nested inside the WordCount class; uses the classic org.apache.hadoop.mapred API.
public static class Map extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    // Emits (word, 1) for every whitespace-separated token in the input line.
    public void map(LongWritable key, Text value,
                    OutputCollector<Text, IntWritable> output,
                    Reporter reporter) throws IOException {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            output.collect(word, one);
        }
    }
}
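To see what the mapper emits without a Hadoop installation, the tokenizing loop can be run on its own. The sketch below is an assumption-laden stand-in: MapperTraceSketch and emit are hypothetical names, and a plain list of strings replaces the OutputCollector.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical stand-in for the Map class: same tokenizing loop, but the
// emitted pairs are collected as strings instead of going to Hadoop.
final class MapperTraceSketch {

    static List<String> emit(String line) {
        List<String> emitted = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            emitted.add("(" + tokenizer.nextToken() + ", 1)");
        }
        return emitted;
    }

    public static void main(String[] args) {
        // prints [(Hello, 1), (World, 1), (Hello, 1)]
        System.out.println(emit("Hello World Hello"));
    }
}
```

StringTokenizer with its default delimiters splits on whitespace only, so "Hello," and "hello" would count as different words from "Hello" unless the input is normalized first.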
WordCount – Map-Reduce Program
public static class Reduce extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable> {
    // Sums the counts collected for one word and emits (word, total).
    public void reduce(Text key, Iterator<IntWritable> values,
                       OutputCollector<Text, IntWritable> output,
                       Reporter reporter) throws IOException {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next().get();
        }
        output.collect(key, new IntWritable(sum));
    }
}
WordCount – Map-Reduce Program
public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(WordCount.class);
    conf.setJobName("wordcount");

    // Types of the final (key, value) output.
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);

    // Reduce is also registered as the combiner, so counts are pre-summed on the map side.
    conf.setMapperClass(Map.class);
    conf.setCombinerClass(Reduce.class);
    conf.setReducerClass(Reduce.class);

    conf.setInputFormat(TextInputFormat.class);
    conf.setOutputFormat(TextOutputFormat.class);

    // args[0] is the input path, args[1] the output path.
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    JobClient.runJob(conf);
}
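The driver registers Reduce as both combiner and reducer. That is safe for word count because addition gives the same total whether the values are summed in one pass or pre-summed per map task first. A minimal plain-Java sketch of that property follows; CombinerSketch and its methods are hypothetical names, and the nested lists are made-up map outputs for a single word.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration of why a summing reducer can double as a combiner.
final class CombinerSketch {

    // Same logic as the slide's reduce(): add up every value for one key.
    static int sum(Iterator<Integer> values) {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next();
        }
        return sum;
    }

    // No combiner: the reducer receives every value from every map task.
    static int reduceDirectly(List<List<Integer>> mapOutputs) {
        List<Integer> all = new ArrayList<>();
        for (List<Integer> output : mapOutputs) {
            all.addAll(output);
        }
        return sum(all.iterator());
    }

    // With a combiner: each map task's values are summed locally first,
    // and the reducer only receives one partial sum per map task.
    static int reduceAfterCombining(List<List<Integer>> mapOutputs) {
        List<Integer> partialSums = new ArrayList<>();
        for (List<Integer> output : mapOutputs) {
            partialSums.add(sum(output.iterator()));
        }
        return sum(partialSums.iterator());
    }

    public static void main(String[] args) {
        // Made-up values for one word, as emitted by two separate map tasks.
        List<List<Integer>> mapOutputs = Arrays.asList(
                Arrays.asList(1, 1, 1),
                Arrays.asList(1, 1));
        // prints 5 5
        System.out.println(reduceDirectly(mapOutputs) + " " + reduceAfterCombining(mapOutputs));
    }
}
```

A reducer that computes something like an average could not be reused this way without changes, since an average of partial averages is not generally the overall average.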
How to compile Map-Reduce program using Eclipse?
• Reference the Hadoop JAR file from your local disk in the project
• Alternatively, use Maven to pull in the Hadoop dependency; it is simple to use
• In Eclipse, choose Project → Build Project
• The build has succeeded when the Eclipse console shows no errors
How to deploy Map-Reduce program?
How to run Map-Reduce program?
Summary
• What is Map-Reduce?
• Architecture of Map-Reduce
• Advantages of Map-Reduce
• Frameworks available for Map-Reduce
• WordCount – Map-Reduce Program explained
• Compiling WordCount Map-Reduce program using Eclipse
• Deploying Map-Reduce program
• Executing a Map-Reduce program
Q & A
References
http://en.wikipedia.org/wiki/MapReduce
http://hortonworks.com
http://hadoop.apache.org
Coding-Freaks.Net: www.codingfreaks.net
Quanticate OPDev Twitter: https://twitter.com/quanticateopdev
Twitter: www.Twitter.com/muralidharand