Scheduling In Distributed Systems
          Candidacy exam


                              Andrii Vozniuk
                              EPFL
                              July 4, 2012
Big Data
       Data explosion
       Processing gets more complicated




          Generates: 25 TB/day       Generates: 40 TB/day
          Stores:    10 PB/year      Stores:    20 PB/year

            Resources of many computers should be used
    2
Typical Data Processing Pipeline


    Inputs: log data, sensor data
    Clean data: ETL-like batch processing
    Analyze data: using resources of many organizations ("Particle found!")
    Query data: efficient query execution (user model)


           No one-size-fits-all system currently exists
 3
Outline
    Gamma - parallel database
    MapReduce - data-intensive system
    Condor - compute-intensive system
    Conclusions
    Future Research




4
Scheduling In Distributed Systems
       Scheduling
           Policy: setting an ordering of tasks
           Assigning resources to tasks

       How to match resources and tasks?

              Scheduling is challenging in distributed systems
    5
Matching Tasks With Resources
       Perspectives
           Data model
           Execution model


             System/Perspective   Data model      Execution model
             Gamma                Relational      Multioperator
             MapReduce            Unconstrained   MapReduce
             Condor               Unconstrained   Unconstrained

            How is scheduling influenced by data and execution models?
    6
Gamma                                                Ɣ
       Pioneering parallel database
       Data model: constrained
           Relational data model
           Relations are horizontally partitioned
       Execution model: constrained
           Multioperator queries
           Operators employ hash-based algorithms




    7
Gamma: Scheduler                                                         Ɣ
       Query: SELECT r FROM R WHERE r < 'k'
       Host machine submits the query to the Gamma database
       Query Manager: consults the Catalog, optimizes the query, compiles the plan
       Scheduler Process: schedules operators
       Operator Processes: execution on relevant nodes
           Node 1 holds a-m, Node 2 holds n-z


          Scheduling is done at the operator level
 8
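A minimal Python sketch of how a scheduler could pick the relevant nodes for the selection above, assuming R is range-partitioned on r as in the slide (a-m on Node 1, n-z on Node 2). The PARTITIONS map and the relevant_nodes function are hypothetical names for illustration, not Gamma's actual catalog interface.

```python
# Hypothetical catalog entry: node -> (lowest key, highest key) stored there.
PARTITIONS = {
    "Node 1": ("a", "m"),
    "Node 2": ("n", "z"),
}

def relevant_nodes(upper_bound, partitions=PARTITIONS):
    """Return the nodes that may hold tuples satisfying r < upper_bound."""
    # A partition can contribute only if its lowest key is below the bound.
    return [node for node, (low, _high) in partitions.items() if low < upper_bound]

nodes = relevant_nodes("k")
print(nodes)  # only these nodes need an operator process for this query
```

For r < 'k' only Node 1 is selected, so the selection operator does not have to be started on Node 2.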
Gamma: Batch Scheduling                                           Ɣ
       Exploit sharing by scheduling in a batch
       Example of selection sharing


                σ1      σ2            σ1       σ2
                                                    Shared scan

                A       A                  A



       Reads of A can be shared applying predicates in turn
       Shared relation A is scanned only once


              Batch scheduling trades latency for throughput
    9
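The shared scan can be sketched in a few lines of Python. This is an illustration of the idea only, with made-up data and predicate names (sigma1, sigma2); it is not Gamma's operator code.

```python
def shared_scan(relation, predicates):
    """Scan `relation` once, applying every query's predicate to each tuple in turn."""
    results = {name: [] for name in predicates}
    for row in relation:                              # single pass over A
        for name, predicate in predicates.items():    # predicates applied in turn
            if predicate(row):
                results[name].append(row)
    return results

# Illustrative relation and two selections from a batch of two queries.
A = [3, 8, 12, 17, 25]
out = shared_scan(A, {
    "sigma1": lambda r: r < 10,
    "sigma2": lambda r: r % 2 == 1,
})
print(out)
```

Each tuple of A is read once regardless of how many queries are in the batch; the cost is that a query has to wait until its batch is scheduled.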
Gamma: Batch Scheduling Joins                                           Ɣ
    Several hash-joins in a batch of queries
    Hash table for the same relation can be shared
    Example assumes 100% selectivity of σ
                                                      Shared hash-table for A


             ⋈            ⋈                   ⋈        ⋈

         σ       σ    σ       σ           σ       σ     σ

         A       B    A       C           B       A     C


    Sharing reduces I/O and memory usage

             Sharing among joins reduces total execution time
    10
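A small Python sketch of the sharing idea for the two joins on the slide (A ⋈ B and A ⋈ C), assuming both join on the same attribute of A. Relation contents and the helper names build_hash_table and probe are illustrative assumptions.

```python
from collections import defaultdict

def build_hash_table(relation, key):
    """Build phase: hash every tuple of the relation on the join attribute."""
    table = defaultdict(list)
    for row in relation:
        table[row[key]].append(row)
    return table

def probe(table, relation, key):
    """Probe phase: look up each tuple of the other relation in the hash table."""
    return [(match, row) for row in relation for match in table.get(row[key], [])]

A = [{"id": 1, "a": "x"}, {"id": 2, "a": "y"}]
B = [{"id": 1, "b": "p"}, {"id": 3, "b": "q"}]
C = [{"id": 2, "c": "r"}]

table_a = build_hash_table(A, "id")   # A is read and hashed only once
a_join_b = probe(table_a, B, "id")    # query 1 probes the shared table
a_join_c = probe(table_a, C, "id")    # query 2 reuses the same table
print(a_join_b, a_join_c)
```

Without sharing, each query would scan A and hold its own copy of the hash table, which is where the I/O and memory savings come from.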
Limitations Of Gamma                                           Ɣ
    Gamma offers
        Efficient query execution
        Sharing in a batch of queries
    Gamma operates on structured data
    Gamma is not suitable for
        Unstructured data processing
        ETL type of workload
        Running on large scale




             A different system for ETL processing is needed
    11
MapReduce
    System for data-intensive applications
    Execution model: constrained
        Job is a set of map and reduce tasks
        Tasks are independent
    Data model: unconstrained
        Arbitrary data format
        Files are partitioned into chunks
        Each chunk is replicated several times




    12
MapReduce: Scheduling
    Example: MapReduce job with 4 Map tasks and 2 Reduce tasks
        Node with Chunk1: Map 1 writes Temp1; a Reduce task writes Result1
        Node with Chunk2: Map 2 writes Temp2
        Node with Chunk3: Map 3 writes Temp3
        Node with Chunk4: Map 4 writes Temp4; a Reduce task writes Result2
    Tasks are scheduled close to data
    Execution is scalable and fault-tolerant
    Execution is elastic
           Fine-grained scheduling improves fault tolerance and elasticity
    13
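Scheduling tasks close to data can be sketched as a greedy assignment. This Python sketch assumes one free task slot per node and a known replica map; the function name and node names are hypothetical, and real MapReduce implementations assign tasks as workers report free slots rather than in one batch.

```python
def assign_map_tasks(chunk_replicas, free_nodes):
    """Place each map task on a free node holding a replica of its chunk if possible."""
    assignment = {}
    free = list(free_nodes)
    for chunk, replicas in chunk_replicas.items():
        local = [node for node in free if node in replicas]
        if local:
            node = local[0]        # data-local: no network transfer needed
        elif free:
            node = free[0]         # fall back to a remote read
        else:
            break                  # no free slots left; remaining tasks wait
        assignment[chunk] = node
        free.remove(node)
    return assignment

replicas = {
    "Chunk1": {"n1", "n2"},
    "Chunk2": {"n2", "n3"},
    "Chunk3": {"n1", "n3"},
    "Chunk4": {"n4"},
}
plan = assign_map_tasks(replicas, ["n1", "n2", "n3", "n4"])
print(plan)
```

Because every chunk is replicated several times, the scheduler usually has more than one data-local choice per task, which is also what makes re-running a failed task elsewhere cheap.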
MapReduce: Speculative Execution
    Nodes may become slow
    Speculative execution minimizes job’s response time
    Launch a backup task if progress is 20% less than average
    Figure: the straggler on a temporarily slow node gets a backup on a normal node

         Speculative execution works well in a homogeneous environment
    14
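The launch rule can be written as a one-line check. This Python sketch assumes the rule means "progress score more than 0.2 below the average score of the running tasks"; the task names and scores are made up.

```python
def find_stragglers(progress, threshold=0.2):
    """Return tasks whose progress score is more than `threshold` below the average."""
    average = sum(progress.values()) / len(progress)
    return [task for task, score in progress.items() if score < average - threshold]

# Progress scores in [0, 1] for three running tasks (illustrative values).
progress = {"t1": 0.90, "t2": 0.85, "t3": 0.40}
stragglers = find_stragglers(progress)
print(stragglers)  # candidates for a backup task
```

The rule compares every task against one cluster-wide average, which is reasonable only when all nodes are expected to make progress at about the same rate.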
Emerging Heterogeneous Infrastructures
    Replacement of failed components
    Extending existing cluster with new machines
    Virtualized data centers of cloud providers
        CPU and RAM are isolated
        Contention for disk and network
    Figure: I/O performance per VM (MB/s, axis 0-60) vs. number of VMs on a physical host (1-7)

In many real-life cases the infrastructure is heterogeneous
    15
MapReduce: Heterogeneous Cluster
    Fast node




Slow node



    Performance degrades on heterogeneous cluster
        Slow nodes are wasted
        Backup tasks on slow nodes
        All straggling tasks are treated equally
        Thrashing due to excessive speculative execution

     Speculative execution should be improved for heterogeneous clusters
    16
MapReduce: LATE Scheduler
    Idea: back up the task with the largest estimated finish
     time (Longest Approximate Time to End)
        progress rate = progress score / execution time
        estimated time left = (1 - progress score) / progress rate
    Thresholds
        Limit the number of backup tasks
        Launch backup tasks on fast nodes
        Back up only sufficiently slow tasks
         LATE looks forward to prioritize tasks to speculate
    17
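A hedged Python sketch of the selection step, combining the two formulas with two of the thresholds. The parameter names speculative_cap and slow_task_rate are illustrative, and using an absolute rate cutoff for "sufficiently slow" is a simplifying assumption; the "fast nodes" check is left to the caller, which would invoke this only when a fast node asks for work.

```python
def late_pick(tasks, running_backups, speculative_cap, slow_task_rate):
    """Pick the task to back up: slow enough, with the longest estimated time left.

    `tasks` maps task name -> (progress score in [0, 1], execution time so far).
    """
    if running_backups >= speculative_cap:          # limit the number of backup tasks
        return None
    candidates = []
    for name, (progress_score, execution_time) in tasks.items():
        if progress_score <= 0 or execution_time <= 0:
            continue                                # no rate estimate yet
        rate = progress_score / execution_time      # progress rate
        if rate < slow_task_rate:                   # back up only sufficiently slow tasks
            time_left = (1 - progress_score) / rate # estimated time left
            candidates.append((time_left, name))
    return max(candidates)[1] if candidates else None

# Illustrative snapshot: (progress score, minutes running).
tasks = {"a": (0.9, 3.0), "b": (0.2, 2.0), "c": (0.3, 1.0)}
choice = late_pick(tasks, running_backups=0, speculative_cap=1, slow_task_rate=0.35)
print(choice)
```

Task b is chosen because it has the longest estimated time left, not because it has the lowest progress score; that is the difference from the average-based rule.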
MapReduce: LATE Example
   Back up the task with Longest Approximate Time to End
   Snapshot at 2 min:
       Node 1: 1 task/min
       Node 2: 3x slower, progress = 66%
           Estimated time left: (1 - 0.66) / (1/3) = 1 min
       Node 3: 1.9x slower, progress = 5.3%
           Estimated time left: (1 - 0.05) / (1/1.9) = 1.8 min

LATE correctly identifies the task that hurts the response time the most
18
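The arithmetic of this example can be checked directly. The Python snippet below only reproduces the two estimates from the slide, taking progress rates of 1/3 and 1/1.9 tasks per minute for the slower nodes.

```python
def estimated_time_left(progress_score, progress_rate):
    """Estimated time left = (1 - progress score) / progress rate."""
    return (1 - progress_score) / progress_rate

node2 = estimated_time_left(0.66, 1 / 3)      # 3x slower node, 66% done
node3 = estimated_time_left(0.053, 1 / 1.9)   # 1.9x slower node, 5.3% done
print(f"node 2: {node2:.1f} min, node 3: {node3:.1f} min")
# prints "node 2: 1.0 min, node 3: 1.8 min"
```

The task on node 3 runs on the faster of the two slow nodes but started later, so it is the one expected to finish last and the one LATE backs up.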
Limitations Of MapReduce
    MapReduce offers
        High scalability
        Good fault tolerance
        Handling of unstructured data
    MapReduce is not suitable for
        Running on multi organization infrastructure
        Harvesting idle resources in organization




     A different system for multi-organization infrastructure is needed
    19
Condor
    Compute-intensive system harvesting idle resources
    Data model: arbitrary
    Execution model: arbitrary
                           How to increase utilization
                           and respect the owners?




                                          job

                                          job
                                                              job
                                          job
       Increase resource utilization by scheduling jobs on idle machines
    20
Condor Scheduler: Centralized?
                         Scheduler




                                     job

                                     job
                                                       job
                                     job
     Efficient but not reliable, possible bottleneck
21
Condor Scheduler: Distributed?
                                            Scheduler


     Scheduler




                                            Scheduler

                       Scheduler



                                   job

                                   job
                                                 job
                                   job
                 Reliable but inefficient
22
Condor Scheduler: Hybrid!

    Figure: schedulers send information about tasks and information about
    nodes to the Matchmaker (steps numbered 1-4)
            Hybrid approach has the best of both worlds
 23
ClassAds: Describing Jobs and Resources
          Job Description
              [MyType = "Job"
               TargetType = "Machine"
               Department = "CompSci"
               Requirements = (other.OpSys == "LINUX" && other.Disk > 10000000)
               Rank = Memory]

          Machine Description
              [MyType = "Machine"
               TargetType = "Job"
               Machine = "nostos.cs.wisc.edu"
               OpSys = "LINUX"
               Disk = 3076077
               Requirements = (LoadAvg <= 0.3) && (KeyboardIdle > (15*60))
               Rank = other.Department == self.Department]
    Requirements should be satisfied
    Candidate with the highest rank is returned
         Matchmaker is suitable for heterogeneous shared clusters
    24
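The two matching rules can be sketched in Python by modelling each ad as a dict whose requirements and rank entries are functions of (self, other). This is an illustration of the matching logic only, not the ClassAd language or Condor's matchmaker; the machine names and attribute values are made up.

```python
def matches(a, b):
    """Two ads match when each one's requirements hold against the other."""
    return a["requirements"](a, b) and b["requirements"](b, a)

def best_match(job, machines):
    """Among matching machines, return the one the job ranks highest."""
    candidates = [m for m in machines if matches(job, m)]
    return max(candidates, key=lambda m: job["rank"](job, m), default=None)

def owner_is_away(self, other):
    # Machine-side policy from the slide: low load and an idle keyboard.
    return self["LoadAvg"] <= 0.3 and self["KeyboardIdle"] > 15 * 60

job = {
    "Department": "CompSci",
    "requirements": lambda self, other: other["OpSys"] == "LINUX"
                                        and other["Disk"] > 10_000_000,
    "rank": lambda self, other: other["Memory"],
}

def machine(name, memory, load):
    # Hypothetical machine ads; only Memory and LoadAvg differ between them.
    return {"Machine": name, "OpSys": "LINUX", "Disk": 20_000_000,
            "Memory": memory, "LoadAvg": load, "KeyboardIdle": 1200,
            "requirements": owner_is_away}

machines = [machine("m1", 512, 0.1), machine("m2", 2048, 0.2), machine("m3", 4096, 0.9)]
winner = best_match(job, machines)
print(winner["Machine"])
```

m3 has the most memory but is busy, so its own requirements reject the job and m2 is returned. A fuller version would also let the machine rank competing jobs, as the Rank expression in the machine description does.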
Conclusions
    Scheduling done at different levels
        Gamma: operator level scheduling enables sharing
        MR and Condor: arbitrary code => sharing is hard
        Condor: matchmaking gives control on job placement

    Hybrid approaches are promising for big data processing
    Scheduling in heterogeneous deployments is challenging




    25
Thank you for your attention!

        Feedback & Questions?
        Andrii.Vozniuk@epfl.ch




26
References
    Matchmaking: Distributed Resource Management for
     High Throughput Computing by Rajesh Raman, Miron
     Livny and Marvin Solomon.
    Batch Scheduling in Parallel Database Systems by Manish
     Mehta, Valery Soloviev and David J. DeWitt.
    Improving MapReduce Performance in Heterogeneous
     Environments by Matei Zaharia, Andy Konwinski, Anthony
     D. Joseph, Randy Katz and Ion Stoica.
    Slides 14 and 18 exploit presentation ideas from the LATE
     slides for OSDI 2008 by Matei Zaharia.


    27
