Service Oriented Computing
Reading Assignment #2
Cloud Mirror
Mesos Cluster
Google Omega
Aris Cahyadi Risdianto
- 20132095 -
CloudMirror: Background Problem and Challenges
Cloud-hosted Application Problems
 Not as simple as Hadoop or Pregel
 Interactive = needs predictable throughput & latency
 A 100 ms latency increase = ~1 % sales loss
(Amazon)
 Interactive workloads consume ≥ the CPU of batch workloads
 Oversubscribing bandwidth to guarantee
application performance = very expensive
 No bandwidth-to-vCPU ratio guarantees actual
bandwidth usage
Key Challenges
• An “easy” network abstraction model to
specify bandwidth requirements
• A workload placement algorithm for
efficient resource allocation
• Scalable runtime to enforce bandwidth
guarantee and efficient usage
CloudMirror: Proposed Solutions
*) A new network abstraction based on the application's communication structure
TAG*
(Tenant Application Graph)
CloudMirror Workload
Placement Algorithm
TAG Deployment
• Bandwidth allocation at the DC uplink matches the TAG model's
requirements
• Bandwidth savings from VM collocation within a subtree
• A VM placement algorithm bridges the gap between the high-
level TAG and the low-level infrastructure
• Guarantees anti-affinity for HA tiers and opportunistic anti-
affinity for non-HA tiers
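The bandwidth saving from collocation can be illustrated with a toy cut computation: a subtree's uplink only needs to reserve bandwidth for edges that cross it, so collocating both endpoints of a heavy edge removes that edge from the reservation. The VM names, racks, and numbers below are hypothetical, not from the paper:

```python
def uplink_reservation(placement, edges, subtree):
    """Bandwidth to reserve on a subtree's uplink: the sum of demand
    on edges with exactly one endpoint inside the subtree."""
    inside = {vm for vm, host in placement.items() if host == subtree}
    total = 0
    for (a, b), bw in edges.items():
        if (a in inside) != (b in inside):  # edge crosses the uplink
            total += bw
    return total

# Hypothetical pairwise demands in Mbps.
edges = {("web1", "db1"): 100, ("web1", "web2"): 50}

# Split placement: the web1-db1 edge crosses rack r1's uplink.
split = {"web1": "r1", "web2": "r1", "db1": "r2"}
print(uplink_reservation(split, edges, "r1"))     # 100

# Collocated: everything in r1, nothing crosses, reservation is 0.
together = {"web1": "r1", "web2": "r1", "db1": "r1"}
print(uplink_reservation(together, edges, "r1"))  # 0
```

The real placement algorithm searches over such cuts while also honoring anti-affinity; this sketch only shows why collocation shrinks the uplink reservation.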
TAG model
• each graph vertex represents an
application component/tier
• Intuitive, descriptive, efficient
and flexible
• can be produced by OpenStack Heat
and AWS CloudFormation
extensions
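A TAG can be sketched as a small directed graph whose vertices are application tiers (with VM counts) and whose edges carry per-VM bandwidth guarantees. The tier names and numbers in this minimal sketch are made up for illustration:

```python
# Minimal sketch of a Tenant Application Graph (TAG): vertices are
# application tiers, edges carry directed bandwidth guarantees.
# Tier names and bandwidth figures are hypothetical.
tag = {
    # tier name -> number of VMs in the tier
    "vertices": {"LB": 2, "Web": 10, "DB": 4},
    # (src, dst) -> per-VM send/receive guarantee in Mbps
    "edges": {
        ("LB", "Web"): {"send": 100, "recv": 100},
        ("Web", "DB"): {"send": 50, "recv": 200},
        # a self-edge models traffic within a tier
        ("DB", "DB"): {"send": 300, "recv": 300},
    },
}

def tier_egress(tag, tier):
    """Total guaranteed egress bandwidth for one tier (Mbps)."""
    vms = tag["vertices"][tier]
    return sum(bw["send"] * vms
               for (src, _), bw in tag["edges"].items() if src == tier)

print(tier_egress(tag, "Web"))  # 10 VMs * 50 Mbps = 500
```

A Heat or CloudFormation extension would emit essentially this structure alongside the usual VM template.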
CloudMirror: Simulation and Evaluation Result
Evaluation
1) Efficiency
a) Reserving less network bandwidth
b) Accepting more tenant requests
2) Placement's ability to guarantee and improve
availability
3) Feasibility of deployment on a real testbed
Result Highlights
• Resource balancing pays off once bandwidth capacity
constrains the network topology
• The tenant rejection rate is below 2.2 %, usually
caused by large VM/bandwidth requirements
• Guaranteeing high availability with a higher WCS
requirement increases the rejection rate
• Scalability: ~200 ms for 100 VMs/tenant, a few
seconds for 1,000 VMs/tenant
Mesos: Background Problem + Challenges and
Mesos Target Environment
Cluster Computing Frameworks Today
 New frameworks keep emerging, but none fits all workloads
 Multiplexing improves utilization and allows
sharing, but replicating data across clusters is costly
 Static partitioning / per-framework VM allocation
achieves neither high utilization nor efficient sharing
>> no fine-grained sharing across frameworks
Key Challenges
• Complexity: a scheduler API would have to
capture every framework's requirements and
optimize online over millions of tasks
• New frameworks and new scheduling
policies: current frameworks are still
under active development
• Expensive refactoring: moving each
framework's scheduling logic into a
global scheduler
Target Environment:
a cluster running Hadoop jobs/tasks as well as
MPI jobs at the same time
(e.g., the Facebook or Yahoo data warehouses)
Mesos: Proposed Solutions
Key Features
1) Resource Allocation
• Two allocation modules: max-min fairness across
multiple resources and strict priorities (similar to
Hadoop & Dryad)
• Task revocation mechanism: kills low-impact tasks
when revocation is triggered
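The fair-sharing module can be illustrated with a single-resource max-min sketch. Mesos's actual module handles multiple resource types; this toy version, with made-up user demands, shows only the core idea:

```python
def max_min_fair(capacity, demands):
    """Single-resource max-min fair shares.

    Repeatedly gives every unsatisfied user an equal split of the
    remaining capacity; users who need less keep only their demand,
    and the leftover is redistributed to the rest.
    """
    alloc = {u: 0.0 for u in demands}
    remaining = capacity
    active = set(demands)
    while active and remaining > 1e-9:
        share = remaining / len(active)
        for u in sorted(active):
            give = min(share, demands[u] - alloc[u])
            alloc[u] += give
            remaining -= give
        active = {u for u in active if demands[u] - alloc[u] > 1e-9}
    return alloc

# Demands of 2, 4, and 10 CPUs against a 12-CPU cluster: small
# demands are fully satisfied, the largest user gets the rest.
print(max_min_fair(12, {"A": 2, "B": 4, "C": 10}))
```

Here A and B get their full 2 and 4 CPUs, and C receives the remaining 6.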
2) Resource isolation between framework executors
• Leverages several existing OS isolation mechanisms
• Currently uses Linux Containers and Solaris Projects
3) Scalable and robust resource offers, with 3 mechanisms:
• Filters for frameworks that always reject certain resources
• A response timer bounding how long a framework may hold an offer
• If a framework does not respond, the resources are re-offered
to other frameworks
• The master process manages the Mesos slave
daemon on each cluster node
• Frameworks run on top of the cluster and
execute their tasks on the slaves
• A framework has two components: a
scheduler (registers with the master to obtain
resources) and an executor (runs the tasks)
Mesos API: functions for schedulers & executors
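The scheduler/executor split can be sketched as two toy callback classes, loosely modeled on the Mesos API. The class and method names below are illustrative, not the exact bindings:

```python
class Scheduler:
    """Framework-side scheduler: registers with the master and
    reacts to resource offers (loosely modeled on Mesos)."""

    def registered(self, framework_id):
        print(f"registered as {framework_id}")

    def resource_offers(self, offers):
        """Accept offers big enough for one task; skip the rest
        (a real driver would decline them back to the master)."""
        launched = []
        for offer in offers:
            if offer["cpus"] >= 1 and offer["mem"] >= 512:
                launched.append({"offer": offer["id"], "task": "t1"})
        return launched  # the driver would turn these into launchTasks

class Executor:
    """Slave-side executor: actually runs the task."""

    def launch_task(self, task):
        print(f"running {task}")

# Toy drive-through: two offers, only one is large enough.
s = Scheduler()
tasks = s.resource_offers([
    {"id": "o1", "cpus": 0.5, "mem": 256},
    {"id": "o2", "cpus": 4, "mem": 8192},
])
print(tasks)  # [{'offer': 'o2', 'task': 't1'}]
```

The key design point survives even in this sketch: the master offers resources, and the framework decides which to use, keeping scheduling policy out of the kernel.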
Mesos: Simulation and Evaluation Result
Evaluation
1) Macrobenchmark workloads (Facebook Hadoop
mix, large Hadoop mix, Spark, Torque/MPI)
2) Overhead
3) Data Locality through Delay Scheduling
4) Iterative jobs using Spark
5) Mesos Scalability
6) Failure Recovery
7) Performance Isolation
Implementation
• ~10,000 lines of C++ code
• Runs on Linux, Solaris, and OS X
• Supports frameworks in Java, C++,
and Python
• ZooKeeper for leader election
• Linux containers for CPU and memory isolation
• Tested frameworks: Hadoop, Torque,
MPICH2, and Spark
(Figures: resource utilization, Mesos scalability, and macrobenchmark speedup results)
Omega: Background Problem + Requirement and
Solutions Approach
Cluster Scheduler Problems
 Many different goals (high resource utilization, rapid
decisions, business constraints, etc.), yet the scheduler
must be robust and always available
 Clusters and workloads keep growing fast
 Monolithic and two-level scheduling are not
satisfactory (new policies are hard to add, and
scheduling itself becomes difficult)
 Complexity from hardware and workload
heterogeneity
Design Issues for Cluster Schedulers
• Partitioning the scheduling work
• Choice of resources offered from the cluster
• Interference (optimistic vs. pessimistic)
• Allocation granularity (policy flexibility)
• Cluster-wide behavior
Omega: Proposed Solutions
Key Features
1) Grant every scheduler full access to the entire cluster (they
compete in a free-for-all manner)
2) Optimistic concurrency control mediates clashes when
updating the cluster state
3) No central resource allocator (all decisions made in the schedulers)
4) Each scheduler keeps a private copy of the resource
allocations (the "cell state")
5) Cell state is synchronized via transactions; a failed
transaction is simply retried
6) Schedulers run in parallel without waiting for other jobs (no inter-
scheduler blocking)
7) Each scheduler can use different policies, with relative
job importance expressed as "precedence"
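The shared-state mechanism can be sketched as an optimistic transaction loop against a versioned cell state. The class and field names below are illustrative simplifications, and a single CPU count stands in for the full cluster state:

```python
class CellState:
    """Shared cluster state with a version counter; a commit succeeds
    only if no other scheduler committed since the copy was taken."""

    def __init__(self, free_cpus):
        self.free_cpus = free_cpus
        self.version = 0

    def snapshot(self):
        """A scheduler's private copy of the cell state."""
        return {"free_cpus": self.free_cpus, "version": self.version}

    def try_commit(self, snap_version, cpus_wanted):
        # Optimistic concurrency control: detect a conflicting update
        # by comparing versions, much like a compare-and-swap.
        if snap_version != self.version or cpus_wanted > self.free_cpus:
            return False  # conflict or insufficient resources: retry
        self.free_cpus -= cpus_wanted
        self.version += 1
        return True

def schedule(cell, cpus_wanted, max_retries=5):
    """One scheduler's loop: snapshot, decide, try to commit, retry."""
    for _ in range(max_retries):
        snap = cell.snapshot()
        if cell.try_commit(snap["version"], cpus_wanted):
            return True
    return False

cell = CellState(free_cpus=10)
print(schedule(cell, 4), cell.free_cpus)  # True 6
```

Because there is no lock, schedulers never block each other; the cost is that a losing scheduler redoes its work after a conflict.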
• Monolithic: used in HPC; a single
instance applies the same algorithm to all jobs
• Two-level: used by Mesos and Hadoop-
on-Demand; many different schedulers
coordinated by a central resource manager
• Shared state: used by Omega; avoids the
two-level approach's limited parallelism
Omega: a new parallel scheduler built
around "shared state" and lock-free
optimistic concurrency control
Omega: Simulation and Evaluation Result
Evaluation via Trace-Driven Simulation
1) Scheduling performance: how service-scheduler busyness varies with jobs and tasks
2) Scaling the workload: scheduling time as tasks scale when conflicts occur
3) Load-balancing the batch scheduler: longer decision times for large batch jobs
4) Dealing with conflicts, with two choices: coarse-grained conflict detection and all-or-nothing scheduling
5) MapReduce scheduler impact on utilization and job completion times
(Figure: lightweight simulator results)
Simulators
1) Lightweight simulator: compares scheduler architectures under identical conditions and workloads
2) High-fidelity simulator: replays historical Google workload traces
