Cloud Computing
MapReduce in Heterogeneous
Environments
Eva Kalyvianaki
ek264@cam.ac.uk
Contents
 Looking at MapReduce performance in heterogeneous
clusters
 Material is from the paper:
“Improving MapReduce Performance in Heterogeneous Environments”,
by Matei Zaharia, Andy Konwinski, Anthony D. Joseph, Randy Katz and
Ion Stoica, published at the USENIX OSDI conference, 2008
 and their presentation at OSDI
2
Motivation: MapReduce is becoming popular
 Open-source implementation, Hadoop, used by Yahoo!,
Facebook, Last.fm, …
 Scale: 20 PB/day at Google, O(10,000) nodes at Yahoo, 3000
jobs/day at Facebook
3
Stragglers in MapReduce
 A straggler is a node that performs poorly or not at all.
 The original MapReduce mitigation approach was:
 to run a speculative copy (called a backup task)
 whichever of the original or the backup finishes first is used
 Without speculative execution, a job would be as slow as its
slowest sub-task
 Google notes that speculative execution can improve job
response times by 44%
 Is this approach good enough for modern clusters?
4
Modern Clusters: Heterogeneity is the norm
 Cloud computing providers like Amazon’s Elastic Compute
Cloud (EC2) provide cheap on-demand computing:
 Price: 2 cents / VM / hour
 Scale: thousands of VMs
 Caveat: less control of performance
 Main challenge for Hadoop on EC2 is performance
heterogeneity, which breaks task scheduler assumptions
 This lecture/paper presents a new LATE scheduler that can cut
response time in half
5
MapReduce Revisited
6
MapReduce Implementation, Hadoop
7
Scheduling in MapReduce
 When a node has an empty task slot, Hadoop chooses a task from
three categories, in the following priority order (a minimal sketch follows below):
1. Failed tasks are given the highest priority
2. Unscheduled tasks; for maps, tasks with data local to the node are
chosen first
3. Speculative tasks
8
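A minimal Python sketch of this priority order; the task lists and field names are illustrative assumptions, not Hadoop's actual API:

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    is_map: bool = True
    data_locations: set = field(default_factory=set)

def choose_task(node, failed, unscheduled, speculative_candidate=None):
    """Pick the next task for a node's free slot, following Hadoop's priority order."""
    if failed:                                   # 1. failed tasks first
        return failed[0]
    if unscheduled:                              # 2. unscheduled tasks; prefer data-local maps
        local = [t for t in unscheduled if t.is_map and node in t.data_locations]
        return (local or unscheduled)[0]
    return speculative_candidate                 # 3. otherwise a speculative copy (or nothing)

# Example: "nodeB" holds local data for the second pending map
pending = [Task(data_locations={"nodeA"}), Task(data_locations={"nodeB"})]
print(choose_task("nodeB", failed=[], unscheduled=pending) is pending[1])  # True
```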
Deciding on Speculative Tasks
 Which task to execute speculatively?
 Hadoop monitors task progress using a progress score: a
number between 0 and 1 (sketched in code below)
 For mappers: the score is the fraction of input data read
 For reducers: the execution is divided into three equal phases,
each worth 1/3 of the score:
 Copy phase: the fraction of map outputs that have been copied
 Sort phase: map outputs are sorted by key; the fraction of data merged
 Reduce phase: the fraction of data passed through the reduce function
 Example: a task halfway through the copy phase has
progress score = 1/2*1/3 = 1/6.
 Example: a task halfway through the reduce phase has
progress score = 1/3 + 1/3 + 1/2 * 1/3 = 5/6
9
Deciding on Speculative Tasks (con’t)
 Hadoop looks at the average progress of each category (maps and
reduces) and defines a threshold (sketched below):
 When a task’s progress is less than the average for its
category minus 0.2, and the task has run at least one
minute, it is marked as a straggler:
threshold = avgProgress – 0.2
 All tasks with progress score < threshold are stragglers
 Ties are broken by data locality
 This approach works reasonably well in homogeneous clusters
10
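A minimal sketch of this native straggler test, assuming each task's progress score and running time are tracked; the names and sample values are illustrative:

```python
def is_straggler(progress: float, runtime_s: float, avg_progress: float) -> bool:
    """Hadoop's native test: ran at least 1 minute and progress below avgProgress - 0.2."""
    threshold = avg_progress - 0.2
    return runtime_s >= 60 and progress < threshold

print(is_straggler(progress=0.45, runtime_s=120, avg_progress=0.70))  # True
print(is_straggler(progress=0.65, runtime_s=120, avg_progress=0.70))  # False
```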
Scheduler’s Assumptions
1. Nodes can perform work at roughly the same rate
2. Tasks progress at a constant rate throughout their execution
3. There is no cost to starting a speculative task
4. A task’s progress is roughly equal to the fraction of its total
work
5. Tasks tend to finish in waves, so a task with a low progress
score is likely a slow task
6. Different tasks of the same category (maps or reduces) take
roughly the same amount of work
11
Revising Scheduler’s Assumptions
1. Nodes can perform work at roughly the same rate
2. Tasks progress at a constant rate throughout their execution
 (1) In heterogeneous clusters some nodes are slower (older)
than others
 (2) Virtualized clusters “suffer” from co-location interference
12
Heterogeneity in Virtualized Environments
 VM technology isolates CPU and memory, but disk and
network are shared
 Full bandwidth when no contention
 Equal shares when there is contention
 2.5x performance difference
[Figure: I/O performance per VM (MB/s) versus the number of VMs on a physical host, from 1 to 7 VMs]
13
Revising Scheduler’s Assumptions
3. There is no cost to starting a speculative task
4. A task’s progress is roughly equal to the fraction of its total
work
5. Tasks tend to finish in waves, so a task with a low progress
score is likely a slow task
 (3) Too many speculative tasks can take away resources
from other running tasks
 (4) The copy phase of reducers is often the slowest part,
because it involves all-to-all communication, yet it counts
for only 1/3 of the total reduce score
 (5) Tasks from different generations may be executed
concurrently, so newer tasks are compared with older, slower
tasks and avgProgress changes a lot
14
Idea: Progress Rates
 Instead of using progress score values, compute progress
rates, and back up tasks that are “far enough” below the
mean
 Problem: can still select the wrong tasks
15
Progress Rate Example
[Figure: task timelines for three nodes over 2 minutes: Node 1 runs at 1 task/min, Node 2 is 3x slower, Node 3 is 1.9x slower]
16
Progress Rate Example
[Figure: the same three nodes, now with a 5-task job. After 2 minutes, Node 2's current task has about 1 min left while Node 3's just-started task has about 1.8 min left. Node 2 is slowest, but we should back up Node 3's task! (worked numbers below)]
17
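The numbers in this example can be checked with a small illustrative calculation, assuming each node starts a new task as soon as the previous one finishes (per-task durations: Node 1 = 1.0 min, Node 2 = 3.0 min, Node 3 = 1.9 min):

```python
def time_left_on_current_task(task_duration_min: float, now_min: float) -> float:
    # The node runs tasks back-to-back, so this is how far it is into its current task.
    elapsed_on_current = now_min % task_duration_min
    return task_duration_min - elapsed_on_current

now = 2.0
print(round(time_left_on_current_task(3.0, now), 1))  # Node 2: 1.0 min left
print(round(time_left_on_current_task(1.9, now), 1))  # Node 3: 1.8 min left -> back this one up
```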
Our Scheduler: LATE
 Insight: back up the task with the largest estimated finish
time
 “Longest Approximate Time to End” → LATE
 Look forward instead of looking backward
 Sanity thresholds:
 Cap number of backup tasks
 Launch backups on fast nodes
 Only back up tasks that are sufficiently slow
18
LATE Details
 Estimating finish times:
progress rate = progress score / execution time
estimated time left = (1 – progress score) / progress rate
19
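A minimal sketch of these two estimates (illustrative only, not the paper's implementation):

```python
def progress_rate(progress_score: float, execution_time: float) -> float:
    return progress_score / execution_time

def estimated_time_left(progress_score: float, execution_time: float) -> float:
    return (1 - progress_score) / progress_rate(progress_score, execution_time)

# A task that reached 66% progress after 2 minutes of execution:
print(estimated_time_left(0.66, 2.0))   # ~1.03 minutes left
```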
LATE Scheduler
 If a task slot becomes available and there are fewer than
SpeculativeCap speculative tasks running, then (sketched below):
1. Ignore the request if the node’s total progress is below
SlowNodeThreshold (=25th percentile)
2. Rank currently running, non-speculatively executed tasks by
estimated time left
3. Launch a copy of the highest-ranked task with progress rate below
SlowTaskThreshold (=25th percentile)
 Threshold values:
 10% cap on backups, 25th percentiles for slow node/task
 Validated by sensitivity analysis
20
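A minimal sketch of this selection logic; the data structure, parameter names, and precomputed percentile cutoffs passed in as plain thresholds are illustrative assumptions, not the actual Hadoop patch:

```python
from dataclasses import dataclass

@dataclass
class RunningTask:
    progress_rate: float          # progress per second
    estimated_time_left: float    # seconds
    speculative: bool = False

def late_choose_backup(node_progress, running, num_speculative,
                       speculative_cap=10, slow_node_thr=0.25, slow_task_thr=0.25):
    """Pick which running task, if any, to launch a backup copy of."""
    if num_speculative >= speculative_cap:
        return None                               # cap on concurrent backups reached
    if node_progress < slow_node_thr:             # requesting node is itself slow: skip it
        return None
    slow = [t for t in running if not t.speculative and t.progress_rate < slow_task_thr]
    # Rank by estimated time left and back up the longest one.
    return max(slow, key=lambda t: t.estimated_time_left, default=None)

tasks = [RunningTask(0.20, 60), RunningTask(0.10, 110), RunningTask(0.40, 30)]
print(late_choose_backup(node_progress=0.8, running=tasks, num_speculative=0))
# -> the task with 110 s estimated time left
```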
LATE Example
[Figure: the same scenario after 2 minutes. Node 2's task: progress = 66%, estimated time left = (1 - 0.66) / (1/3) = 1 min. Node 3's task: progress = 5.3%, estimated time left = (1 - 0.05) / (1/1.9) = 1.8 min. LATE correctly picks Node 3's task to back up.]
21
Evaluation
 Environments:
 EC2 (3 job types, 200-250 nodes)
 Small local testbed
 Self-contention through VM placement
 Stragglers through background processes
22
EC2 Sort without Stragglers (Sec 5.2.1)
 106 machines, 7-8 VMs per machine → 243 VMs in total
 128 MB data per host, 30 GB in total
 486 map tasks and 437 reduce tasks
 average 27% speedup over native, 31% over no backups
[Figure: normalized response time (worst, best, average) for No Backups, Hadoop Native, and the LATE Scheduler]
23
EC2 Sort with Stragglers (Sec 5.2.2)
 8 VMs out of 100 are manually slowed down
 by running CPU- and disk-intensive background jobs
 average 58% speedup over native, 220% over no backups
 93% max speedup over native
[Figure: normalized response time (worst, best, average) for No Backups, Hadoop Native, and the LATE Scheduler]
24
Conclusion
 Heterogeneity is a challenge for parallel apps, and is
growing more important
 Lessons:
 Back up tasks which hurt response time most
 2x improvement using simple algorithm
25
Summary
 MapReduce is a very powerful and expressive model
 Performance depends a lot on implementation details
 Material is from the paper:
“Improving MapReduce Performance in Heterogeneous Environments”,
by Matei Zaharia, Andy Konwinski, Anthony D. Joseph, Randy Katz and
Ion Stoica, published at the USENIX OSDI conference, 2008
 and their presentation at OSDI
26