Resource Replication & Automated Scaling Listener

Dr Hitesh Mohapatra
School of Computer Engineering
Associate Professor
KIIT (Deemed to be) University

Contents
Resource Replication:
• Concept of replication in cloud
• Types of replications (synchronous vs asynchronous)
• Benefits: High availability, disaster recovery
• Example: Replication in distributed systems
Automated Scaling Listener:
• What is automated scaling and why is it important?
• How listeners help detect when scaling is required
• Auto-scaling groups in AWS, GCP auto-scaler
• Configuration of scaling policies
2/7/2025 Dr Hitesh Mohapatra 2

Resource Replication {Follow the hyperlink}
Resource Replication
• Concept of replication in cloud
• Types of replications (synchronous vs asynchronous)
• Benefits: High availability, disaster recovery
• Example: Replication in distributed systems

Objective
• What is automated scaling and why is it important?
• How listeners help detect when scaling is required
• Auto-scaling groups in AWS, GCP auto-scaler
• Configuration of scaling policies

• Service Agent Definition: A service agent known as the automated scaling listener mechanism.
• Function: Tracks and monitors communications between cloud service users and cloud services.
• Purpose: Supports dynamic scaling.
• Installation: Installed in the cloud, typically close to the firewall.
• Monitoring: Continuously tracks data on the status of the workload.
• Assessment Criteria: Based on the number of requests made by cloud users, and on the demands
placed on the backend by particular types of requests.
• Example: A request that carries only a small amount of incoming data can still take a
significant amount of processing time, so the request count alone does not describe the workload.
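
The two assessment criteria above can be sketched as follows. This is a minimal illustration, not any vendor's listener: the request types and the per-type cost values in `REQUEST_COST` are hypothetical and chosen only to show that a few expensive requests can outweigh many cheap ones.

```python
from collections import Counter

# Hypothetical relative backend cost per request type (illustrative values only).
REQUEST_COST = {"read": 1.0, "write": 3.0, "report": 20.0}

def backend_load(requests):
    """Return (request_count, weighted_load) for a list of request type names."""
    counts = Counter(requests)
    weighted = sum(REQUEST_COST[kind] * n for kind, n in counts.items())
    return len(requests), weighted

light = ["read"] * 10
heavy = ["report"] * 2 + ["read"] * 2
print(backend_load(light))  # (10, 10.0)
print(backend_load(heavy))  # (4, 42.0)
```

Here four requests place more demand on the backend than ten, which is why a listener that counted requests only would underestimate the second workload.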

Working

Why is it required?
• Auto Scaling: Automatically adjusts IT resources based on parameters previously set
by the cloud consumer.
• Auto Notification: Automatically notifies the cloud consumer when workloads go above
or below predetermined thresholds. This gives the cloud consumer the option to change
how its current IT resources are allocated.

Architecture

Steps
1. Service Agent Roles:
• The service agents that act as automated scaling listeners go by different names
depending on the cloud provider.
2. Initial Access Attempt:
• Three cloud service users simultaneously try to access one cloud service.
3. Creation of Instances:
• The automated scaling listener scales out by creating three redundant instances of the
service.
4. Additional Access Attempt:
• A fourth cloud service user tries to access the service.
5. Limit Exceeded Alert:
• Because the cloud service is configured to allow at most three instances, the automated
scaling listener rejects the fourth attempt and notifies the cloud consumer that the
intended workload limit has been exceeded.
6. Administrator Action:
• The cloud consumer's cloud resource administrator logs into the remote administration
environment to adjust the provisioning configuration and raise the redundant instance limit.
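
The six steps above can be walked through with a toy model. This is a sketch under simplifying assumptions (one instance per user, an in-memory notification list); the `ScalingListener` class and its method names are hypothetical and do not correspond to any provider's agent.

```python
class ScalingListener:
    """Toy model of the steps above; not any vendor's actual agent."""

    def __init__(self, max_instances):
        self.max_instances = max_instances  # limit set by the cloud consumer
        self.instances = 0
        self.notifications = []

    def handle_request(self, user):
        """Create one instance per user until the limit is hit, then reject."""
        if self.instances < self.max_instances:
            self.instances += 1
            return "served"
        self.notifications.append(f"workload limit exceeded by {user}")
        return "rejected"

listener = ScalingListener(max_instances=3)
results = [listener.handle_request(u) for u in ["A", "B", "C", "D"]]
print(results)                 # ['served', 'served', 'served', 'rejected']
print(listener.notifications)  # ['workload limit exceeded by D']

# Step 6: the administrator raises the limit, and the next attempt succeeds.
listener.max_instances = 4
print(listener.handle_request("D"))  # served
```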

How listeners help detect when scaling is required?
1. Monitoring Workloads:
• Continuously track data on the status of workloads.
• Monitor the number of incoming requests and the load on the backend services.
2. Analyzing Data:
• Assess the patterns and trends in the data to identify peaks and troughs in usage.
• Evaluate the types of requests and their impact on resource consumption.
3. Thresholds and Alerts:
• Set predefined thresholds for various metrics like CPU usage, memory usage, and
network traffic.
• Trigger alerts when these thresholds are crossed, indicating a need for scaling.
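
The threshold check in item 3 might look like the sketch below. The metric names and threshold values in `THRESHOLDS` are assumptions for illustration; real values depend on the workload.

```python
# Hypothetical thresholds, in percent; real values depend on the workload.
THRESHOLDS = {"cpu": 70.0, "memory": 80.0, "network": 75.0}

def crossed_thresholds(sample):
    """Return the sorted names of metrics whose value exceeds its threshold."""
    return sorted(m for m, limit in THRESHOLDS.items() if sample.get(m, 0.0) > limit)

sample = {"cpu": 85.0, "memory": 40.0, "network": 76.5}
alerts = crossed_thresholds(sample)
print(alerts)  # ['cpu', 'network']
```

A non-empty result is the signal that scaling may be needed; what happens next is decided by the scaling policy.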

Cont.
4. Resource Allocation:
• Determine the appropriate number of resources needed to handle the current
and projected workload.
• Automatically allocate or deallocate resources based on real-time demand.
5. Preventing Overload:
• Prevent overloading of services by ensuring that additional instances are created
when demand spikes.
• Reject new requests once the configured instance limit is reached, and scale down
resources when the demand decreases, maintaining optimal performance.
6. Feedback Loops:
• Implement feedback loops to continuously improve the scaling process.
• Adjust the thresholds and resource allocation strategies based on past
performance and usage patterns.
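
For item 4, one common way to turn a measured metric into an instance count is a proportional rule: multiply the current count by the ratio of the observed metric to its target, round up, and clamp to a range. The sketch below assumes that rule; the function name and the `min_n`/`max_n` bounds are hypothetical.

```python
import math

def desired_instances(current, metric, target, min_n=1, max_n=10):
    """Proportional rule: scale the instance count by metric/target, then clamp."""
    wanted = math.ceil(current * metric / target)
    return max(min_n, min(max_n, wanted))

print(desired_instances(current=4, metric=90.0, target=60.0))  # 6
print(desired_instances(current=4, metric=20.0, target=60.0))  # 2
print(desired_instances(current=8, metric=95.0, target=50.0))  # 10 (clamped)
```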

Cont.
By effectively monitoring and analyzing the workload data, automated
scaling listeners help maintain the efficiency, reliability, and
performance of cloud services. They ensure that resources are
dynamically scaled to meet the demands of the users, preventing both
underutilization and overload.

Difference between Auto Scaling vs Load Balancing
| Feature | Auto Scaling | Load Balancing |
|---|---|---|
| Primary Function | Automatically adjusts the number of instances based on demand | Distributes incoming traffic across multiple instances |
| Purpose | To ensure optimal resource utilization and handle varying loads | To ensure high availability and reliability by balancing load |
| Operation | Adds or removes instances as needed | Distributes traffic based on predefined rules or algorithms |
| Focus | Resource scaling | Traffic distribution |
| Usage Scenario | Scaling in and out instances based on application needs | Balancing traffic load across running instances |
| Reduction of Backend Duties | Manages instance scaling, reducing manual intervention | Balances load, manages traffic, and monitors server health |
| Combination | Often used together for optimal performance and scalability | Often used together with auto-scaling for efficient traffic management |
| Example Tools | AWS Auto Scaling, Azure Autoscale | Elastic Load Balancing (ELB), Azure Load Balancer |
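
The split in responsibilities can be sketched in a few lines: one function changes how many instances exist, the other decides which instance receives each request. Round-robin is used here as one example of a distribution rule; the instance names and both function names are hypothetical.

```python
from itertools import cycle

def scale(wanted):
    """Auto scaling: decide how many instances exist (hypothetical names)."""
    return [f"vm-{i}" for i in range(1, wanted + 1)]

def distribute(requests, instances):
    """Load balancing: assign requests to running instances round-robin."""
    assignment = {name: [] for name in instances}
    for request, name in zip(requests, cycle(instances)):
        assignment[name].append(request)
    return assignment

pool = scale(2)
print(distribute(list(range(6)), pool))  # {'vm-1': [0, 2, 4], 'vm-2': [1, 3, 5]}
pool = scale(3)
print(distribute(list(range(6)), pool))  # {'vm-1': [0, 3], 'vm-2': [1, 4], 'vm-3': [2, 5]}
```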

Auto-scaling groups in AWS
Definition:
• An Auto Scaling group (ASG) is a collection of Amazon EC2 instances managed as a
logical grouping for automatic scaling and management.
Components:
• Launch Template (or the older Launch Configuration): Defines the instance type, AMI ID,
key pair, security groups, and other configurations.
• Scaling Policies: Determine how and when the ASG should scale in or out based on
predefined criteria.

Cont.
Dynamic and Predictive Scaling:
• Dynamic Scaling: Adjusts the number of instances based on real-time demand (e.g.,
CPU utilization).
• Predictive Scaling: Uses historical data to predict and provision resources ahead of
time.
Health Checks:
• Continuously monitors the health of instances within the group.
• Automatically replaces unhealthy instances to ensure high availability.
Benefits:
• Automatically adjusts capacity to maintain steady, predictable performance at the
lowest possible cost.
• Ensures application availability by automatically adding or removing instances as
needed.
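
As an illustration of dynamic scaling on CPU utilization, the sketch below builds the request parameters for a target tracking policy. The parameter names follow the `put_scaling_policy` call of the boto3 Auto Scaling client; the group name, the policy naming scheme, and the 50% target are assumptions, and the actual API call is left as a comment because it needs AWS credentials and an existing group.

```python
def target_tracking_policy(group_name, target_cpu):
    """Build keyword arguments for a target tracking policy on average CPU."""
    return {
        "AutoScalingGroupName": group_name,
        "PolicyName": f"{group_name}-cpu-target",  # hypothetical naming scheme
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ASGAverageCPUUtilization"
            },
            "TargetValue": float(target_cpu),
        },
    }

params = target_tracking_policy("web-asg", 50)
print(params["TargetTrackingConfiguration"]["TargetValue"])  # 50.0

# With AWS credentials configured, these would be passed to boto3, e.g.:
#   import boto3
#   boto3.client("autoscaling").put_scaling_policy(**params)
```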

GCP auto-scaler
Definition:
• The GCP autoscaler automatically adjusts the number of VM instances in a managed
instance group based on the current load.
Components:
• Instance Group Manager: Manages the lifecycle of VM instances within the instance group.
• Scaling Policies: Define the metrics and thresholds for scaling actions.

Cont.
Scaling Metrics:
• Uses metrics such as CPU utilization, HTTP load balancing serving capacity, and
Cloud Monitoring (formerly Stackdriver) custom metrics to decide when to scale.
Auto-Healing:
•Automatically recreates failed instances to maintain the desired state of the
instance group.
Predictive Autoscaler:
•Uses machine learning to analyze historical usage data and predict future
demand.
• Provisions resources proactively to meet anticipated demand.
Benefits:
•Provides cost savings by automatically adjusting resources based on demand.
•Enhances application performance by ensuring sufficient capacity to handle the
load.
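
Auto-healing, as described above, amounts to removing failed instances and recreating replacements until the group is back at its desired size. The sketch below models only that bookkeeping; the `heal` function, the `vm-N` naming, and the health predicate are hypothetical stand-ins for real VM creation and health checks.

```python
def heal(group, is_healthy, target_size):
    """Drop unhealthy instances and recreate until the group is at target size."""
    healthy = [vm for vm in group if is_healthy(vm)]
    next_id = max((int(vm.split("-")[1]) for vm in group), default=0) + 1
    while len(healthy) < target_size:
        healthy.append(f"vm-{next_id}")  # stand-in for creating a real VM
        next_id += 1
    return healthy

group = ["vm-1", "vm-2", "vm-3"]
failed = {"vm-2"}
print(heal(group, lambda vm: vm not in failed, target_size=3))
# ['vm-1', 'vm-3', 'vm-4']
```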

Example: Autoscaling based on Load Balancing Serving Capacity

For example, assume the load balancing serving capacity of a managed instance group is
defined as 100 RPS per instance. If you create an autoscaler with the HTTP(S) load
balancing policy and set it to maintain a target utilization level of 0.8 (80%), the
autoscaler will add or remove instances from the managed instance group to maintain 80%
of the serving capacity, or 80 RPS per instance.
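
The arithmetic in this example can be checked with a short sketch. It assumes the autoscaler simply divides total traffic by the per-instance target (100 RPS × 0.8 = 80 RPS) and rounds up; the function name and the sample traffic figures are illustrative, and a real autoscaler also applies stabilization and minimum/maximum limits.

```python
import math

def instances_needed(total_rps, capacity_rps=100, target_utilization=0.8):
    """Instances required so each one serves about capacity * target RPS."""
    per_instance = capacity_rps * target_utilization  # 80 RPS in the example
    return max(1, math.ceil(total_rps / per_instance))

for rps in (80, 400, 410):
    print(rps, instances_needed(rps))
# 80 1
# 400 5
# 410 6
```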

Difference between Horizontal vs Vertical Auto Scaling
| Aspect | Horizontal Auto Scaling | Vertical Auto Scaling |
|---|---|---|
| Definition | Addition of more servers or computers to the auto-scaling group | Scaling by supplying more power (e.g., more RAM) to an existing system |
| Scalability | Expands the resource pool with more machines | Boosts the power of an already-running system |
| Handling High User Load | Can handle queries from thousands of users | Limited in handling very high user loads |
| Components | Clustering, distributed file systems, load balancing | Increasing resources like CPU and RAM |
| Stateless Servers | Crucial for handling large number of users; sessions can move across servers | Not applicable |
| Downtime | No downtime required; creates new instances separately | Requires downtime for upgrades and reconfigurations |
| Availability | Improves availability and performance due to independence | No improvement in availability; dependent on a single machine |
| Performance | Enhances user experience with browser-side session storage | Improves performance |
| Redundancy | Supports redundancy with multiple instances | No redundant server; dependent on single location |
| Elastic Load Balancing | Scales incoming requests across instances | Not applicable; deals with vertical resource allocation |
| Best Use Cases | Ideal for applications with a large user base | Suitable for applications with fewer scalability demands |
| Challenges | Requires effective clustering and load balancing | Architectural issues due to single machine dependency |
| Overall Impact | Enhances scalability, availability, and user experience | Boosts performance but limited in scalability and availability |

Configuration of scaling policies
Step 1: Define Your Scaling Goals
• Identify Key Metrics: Determine which metrics (e.g., CPU utilization, memory
usage, request count) will trigger scaling actions.
• Set Desired Performance Levels: Establish the target performance levels for
your application, such as response time and availability.
Step 2: Create Scaling Policies
• Threshold-Based Policies: Define specific thresholds for your key metrics. For
example, if CPU utilization exceeds 70%, trigger a scale-out action.
• Scheduled Policies: Schedule scaling actions based on predictable usage
patterns. For example, scale out during peak hours and scale in during off-
peak hours.
• Predictive Policies: Use machine learning models to predict future demand
and proactively scale resources accordingly.
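
A scheduled policy from Step 2 can be reduced to a lookup from time of day to desired capacity. The sketch below assumes a single peak window; the hours and instance counts in `SCHEDULE` and `OFF_PEAK_INSTANCES` are hypothetical.

```python
# Hypothetical schedule: (start_hour, end_hour, desired_instances), 24-hour clock.
SCHEDULE = [(9, 18, 6)]   # peak hours
OFF_PEAK_INSTANCES = 2

def scheduled_capacity(hour):
    """Return the desired instance count for the given hour of day (0-23)."""
    for start, end, instances in SCHEDULE:
        if start <= hour < end:
            return instances
    return OFF_PEAK_INSTANCES

print([scheduled_capacity(h) for h in (3, 9, 17, 18)])  # [2, 6, 6, 2]
```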

Cont.
Step 3: Configure Alarms and Triggers
• Set Alarms: Create alarms that monitor the key metrics and trigger scaling
actions when thresholds are reached.
• Define Triggers: Specify the conditions under which scaling actions should be
triggered, such as exceeding or falling below the defined thresholds.
Step 4: Define Scaling Actions
• Scale-Out Actions: Specify how many instances to add when scaling out. For
example, add two instances if CPU utilization exceeds 70%.
• Scale-In Actions: Specify how many instances to remove when scaling in. For
example, remove one instance if CPU utilization falls below 30%.
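
The scale-out and scale-in examples in Step 4 (add two instances above 70% CPU, remove one below 30%) can be written as a single decision function. The `min_n` and `max_n` bounds are an added assumption so the group never drops to zero or grows without limit.

```python
def scaling_action(cpu_percent, current, min_n=1, max_n=10):
    """Return the new instance count after applying the example rules."""
    if cpu_percent > 70:
        change = 2       # scale out: add two instances
    elif cpu_percent < 30:
        change = -1      # scale in: remove one instance
    else:
        change = 0       # within the band: no action
    return max(min_n, min(max_n, current + change))

print(scaling_action(85, current=4))  # 6
print(scaling_action(20, current=4))  # 3
print(scaling_action(50, current=4))  # 4
print(scaling_action(20, current=1))  # 1 (already at the minimum)
```

Adding more instances than are removed per step is a common choice: under-provisioning during a spike usually costs more than briefly over-provisioning afterwards.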

Cont.
Step 5: Configure Cooldown Periods
• Set Cooldown Periods: Define cooldown periods to prevent rapid, repetitive
scaling actions. This allows the system to stabilize before triggering another
scaling action.
Step 6: Implement and Test Policies
• Deploy Policies: Implement the scaling policies in your cloud service
configuration.
• Test Policies: Test the policies under different load conditions to ensure they
work as expected and make adjustments as needed.
Step 7: Monitor and Optimize
• Continuous Monitoring: Continuously monitor the performance and
effectiveness of your scaling policies.
• Optimize Policies: Regularly review and optimize the scaling policies based on
performance data and changing application requirements.
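
The cooldown period from Step 5 can be sketched as a small gate in front of every scaling action. The `Cooldown` class and the 300-second value are illustrative assumptions; timestamps are plain numbers of seconds so the example runs without waiting.

```python
class Cooldown:
    """Allow a scaling action only if enough time has passed since the last one."""

    def __init__(self, seconds):
        self.seconds = seconds
        self.last_action = None

    def try_scale(self, now):
        """Return True and record the action if the cooldown has elapsed."""
        if self.last_action is not None and now - self.last_action < self.seconds:
            return False
        self.last_action = now
        return True

cooldown = Cooldown(seconds=300)
print([cooldown.try_scale(t) for t in (0, 120, 299, 300, 450)])
# [True, False, False, True, False]
```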

Questions
1. What is the primary function of an automated scaling listener mechanism?
2. How do automated scaling listeners support dynamic scaling in the cloud?
3. Where are automated scaling listeners typically installed in the cloud?
4. What data do automated scaling listeners continuously track to assess workloads?
5. How does an automated scaling listener respond when multiple users try to access the same cloud
service?
6. What action does an automated scaling listener take when the intended workload limit is exceeded?
7. What is the main difference between horizontal auto-scaling and vertical auto-scaling?
8. Why is stateless server architecture crucial for horizontal auto-scaling?
9. How does load balancing improve the availability and performance of cloud services?
10. Can you explain the connection between load balancing and application auto-scaling?
11. What are the key steps involved in configuring scaling policies for cloud services?
12. How do threshold-based scaling policies differ from scheduled scaling policies?
13. What are the primary components of AWS Auto Scaling Groups?
14. How does the GCP Auto Scaler use metrics to decide when to scale?
15. Why is it important to have effective scaling policies in place for cloud applications?
16. How can feedback loops improve the scaling process in cloud services?
