Building a highly scalable and available cloud application

SQLCAT - Shared technical learnings

Agenda
• Fit for Purpose – what makes a good cloud app?
• Shifting Perspective – Designing for Cloud
• Lessons Learned
• Summary / Q&A

Setting the Stage
• The Customer Advisory Team (CAT) works on big, mean projects
• Including a lot of big, mean Azure projects
• Collating guidance and learnings from the last year of engagements
• This discussion is a peek at some of what we’ve learned about Azure applications at serious scale
• Take a deep breath: not all Azure applications are this involved

Fit for Purpose – What makes a good cloud app?

• Dispersed users & data
• Elastic demand
• Scale out

Shifting Perspective – Designing for Cloud
• Scale-out not scale-up
• Everything has a limit – compose for scale
• Design for failure
• Design for continuity
• Optimize for density

Scale-out not scale-up
• Traditional 3-tier application
• Make “everything stateless”
• Where is the state?
• Oh, right.. in the scale-up database

Diagram: Load Balancer, Web Servers, App Servers, Database

• Challenge: architect applications to use a partitioned data store
• Connection management
• Data partitioning & affinity
• Scatter / gather queries
• Resource management

Diagram: Azure Load Balancer, partitioned databases DB1, DB2, DB3
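
To make "data partitioning & affinity" and "scatter / gather queries" concrete, here is a minimal Python sketch. It is not taken from the talk: the partition names reuse the DB1..DB3 labels from the diagram, and `partition_for` and `scatter_gather` are hypothetical helpers. It assumes a stable hash is an acceptable way to pin a key to one partition and that a query can run independently on each partition.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

# Hypothetical partition names, reusing the DB1..DB3 labels from the diagram.
PARTITIONS = ["DB1", "DB2", "DB3"]

def partition_for(key: str) -> str:
    """Affinity: a stable hash always maps the same key to the same partition."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return PARTITIONS[int.from_bytes(digest[:8], "big") % len(PARTITIONS)]

def scatter_gather(query, partitions=PARTITIONS):
    """Scatter: run query(partition) on every partition in parallel.
    Gather: merge the per-partition row lists into one list."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        per_partition = list(pool.map(query, partitions))
    return [row for rows in per_partition for row in rows]

if __name__ == "__main__":
    # In-memory stand-in for three databases.
    fake_rows = {"DB1": ["a"], "DB2": ["b", "c"], "DB3": []}
    print(partition_for("user-42"))
    print(scatter_gather(lambda name: fake_rows[name]))
```

Single-key requests go through `partition_for` and touch one database; only queries that cannot be answered from one partition pay the fan-out cost of `scatter_gather`.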

Everything has a limit – compose for scale
• Ship as much as you want
• Provided it will fit into the standard “scale units”
• Want to ship more? Use more containers.

Design for failure
• Traditional approach: harden the database
• Cloud approach: expect failures, design for them, work around them

Optimize for Density
• Density is cost of goods
• Chunky not chatty
• Framework and library efficiency

Handling Transient and Enduring Failures
• Given enough scale, time and pressure, all components or services will fail
• Your application will experience 1..N failures
• Transient failures: temporary service interruptions
• Dropped connections, failed queries
• Enduring failures: require intervention
• Incorrect configuration, long-running service unavailability

• Use fault-handling frameworks that recognize transient errors
• Appropriate retry and backoff policies
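
A retry and backoff policy can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the framework used in the talk: `TransientError` is a hypothetical marker for errors worth retrying, and the attempt count and delays are placeholder values. The sketch uses exponential backoff with random jitter so that many instances do not retry in lockstep.

```python
import random
import time

class TransientError(Exception):
    """Hypothetical marker for retryable errors (dropped connection, failed query)."""

def retry_with_backoff(operation, max_attempts=5, base_delay=0.1,
                       max_delay=5.0, sleep=time.sleep):
    """Call operation(); on TransientError wait and retry, up to max_attempts.

    The wait before retry n is a random value between 0 and
    min(max_delay, base_delay * 2**(n - 1)). Any other exception is treated
    as enduring and propagates immediately.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # retries exhausted: surface the failure to the caller
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, ceiling))
```

Passing `sleep` as a parameter keeps the policy testable without real waiting; production code would leave the default.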

Figure: Web Request Response Latency. Y-axis: seconds (0 to 450); x-axis: 1 to 29; series: Avg Latency, Response latency.

Scale it out, stitch it together
• Partitioning strategies:
• Horizontal
• Vertical
• Hybrid
• CSV vs. Global data model
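
The three partitioning strategies can be contrasted with a small Python sketch. The store names, column groups, and shard count below are hypothetical and chosen only for illustration; the talk does not prescribe a particular scheme.

```python
# Hypothetical store names and column groups, for illustration only.
VERTICAL_GROUPS = {
    "profile_store": {"user_id", "name", "email"},
    "activity_store": {"user_id", "last_login", "page_views"},
}
HORIZONTAL_SHARDS = 4  # hypothetical shard count

def split_vertically(row: dict) -> dict:
    """Vertical partitioning: each store receives a subset of the columns."""
    return {
        store: {col: row[col] for col in cols if col in row}
        for store, cols in VERTICAL_GROUPS.items()
    }

def shard_for(user_id: int) -> int:
    """Horizontal partitioning: each shard receives a subset of the rows."""
    return user_id % HORIZONTAL_SHARDS

def place_hybrid(row: dict) -> dict:
    """Hybrid: split by columns first, then pick a shard per store by row key."""
    shard = shard_for(row["user_id"])
    return {f"{store}_{shard}": part
            for store, part in split_vertically(row).items()}
```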

Telemetry is Life
• You don’t know what you didn’t capture
• Split the streams: high-volume (low-fidelity) and high-value
• Know you’re “down” before your users are!
• Be able to figure out why afterwards

Handling transient failures
• Logging transient failures
• Logging all external API calls with timing
• Logging full exception (not .ToString())
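
The slide's point about `.ToString()` is made in a .NET context; the same idea can be sketched in Python. The wrapper below is a hypothetical helper, not code from the talk: it logs every external call with its elapsed time, and on failure records the full traceback rather than only the exception message.

```python
import logging
import time

log = logging.getLogger("external_calls")

def timed_call(name, func, *args, **kwargs):
    """Invoke func, logging the call name, outcome and elapsed milliseconds."""
    start = time.perf_counter()
    try:
        result = func(*args, **kwargs)
    except Exception:
        elapsed_ms = (time.perf_counter() - start) * 1000
        # log.exception attaches the full traceback, not just str(exc).
        log.exception("call=%s status=failed elapsed_ms=%.1f", name, elapsed_ms)
        raise
    elapsed_ms = (time.perf_counter() - start) * 1000
    log.info("call=%s status=ok elapsed_ms=%.1f", name, elapsed_ms)
    return result
```

Usage would look like `timed_call("storage.get_blob", fetch_blob, blob_id)`, where `fetch_blob` stands for whatever client function the application already has.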

Telemetry is Life
Per-application server data sources
- IIS logs
- Application logs
- Performance counters
High value data
- Filter
- Aggregate
- Publish
High value data consumer
- Generate alerts
- Display dashboard
- Operational intelligence
High volume data
- Batch
- Partition
- Archive
High volume data consumer
- Data mining / analysis
- Historical trends
- Root cause analysis
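
The two-stream pipeline can be sketched as a small in-process router. Everything here is an assumption made for illustration: the `TelemetryRouter` class, the event fields (`source`, `level`, `latency_ms`), and the rule that errors and requests slower than one second count as high value.

```python
from collections import Counter, deque

class TelemetryRouter:
    """Splits events into a high-volume archive stream and a high-value stream."""

    def __init__(self, batch_size=100, slow_ms=1000):
        self.batch_size = batch_size
        self.slow_ms = slow_ms          # hypothetical "slow request" threshold
        self.pending = []               # high volume: raw events awaiting a batch
        self.archived_batches = []      # stand-in for partitioned archive storage
        self.high_value = deque()       # stand-in for an alerting / dashboard feed
        self.counts = Counter()         # aggregate of high-value events per source

    def record(self, event: dict) -> None:
        # High volume path: keep everything, write in batches.
        self.pending.append(event)
        if len(self.pending) >= self.batch_size:
            self.flush()
        # High value path: filter, aggregate, publish.
        if event.get("level") == "error" or event.get("latency_ms", 0) > self.slow_ms:
            self.counts[event.get("source", "unknown")] += 1
            self.high_value.append(event)

    def flush(self) -> None:
        """Archive the buffered raw events as one batch."""
        if self.pending:
            self.archived_batches.append(self.pending)
            self.pending = []
```

The high-value stream stays small enough to alert on in near real time, while the archived batches keep the detail needed for later root cause analysis.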

Managing Connections
• Instances * DBs * Pool Size
• Each hosted service has 1 IP
• Each DB cluster has 1 IP
• How big is a routing table entry for IPv4?
• Hint: 64k

Diagram: Azure Load Balancer, DB1, DB2, DB3
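
The "Instances * DBs * Pool Size" product is easy to underestimate, so a quick calculation helps. The sketch below assumes one reading of the 64k hint that the slide does not spell out: TCP ports are 16-bit, so a single source IP talking to a single destination IP and port has at most 65,536 ports to draw on. It also assumes the worst case where all databases sit behind one cluster IP. The instance, database, and pool numbers are made up.

```python
PORT_LIMIT = 2 ** 16  # 16-bit TCP port space: 65,536

def total_connections(instances: int, databases: int, pool_size: int) -> int:
    """Worst case: every instance fills its pool against every database."""
    return instances * databases * pool_size

def exceeds_port_limit(instances: int, databases: int, pool_size: int) -> bool:
    """True if the worst case cannot fit between one service IP and one
    cluster IP (assumption: all databases share a single cluster IP)."""
    return total_connections(instances, databases, pool_size) > PORT_LIMIT

if __name__ == "__main__":
    # Hypothetical deployment: 100 instances, 20 databases, pool size 50.
    print(total_connections(100, 20, 50))   # 100000
    print(exceeds_port_limit(100, 20, 50))  # True
```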

Optimize work: batch & align
• Challenge:
• Optimize insert of activity and user data into a scale-out data tier (400+ databases)
• Transient failure – retries
• Enduring failure – failover to alternate store
• Optimize for partition alignment
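
A minimal sketch of "batch & align", assuming records can be grouped by their target partition before writing. The function names and the use of `ConnectionError` to signal an enduring failure are hypothetical; transient retries are assumed to happen inside `write_batch`.

```python
from collections import defaultdict

def group_by_partition(records, partition_of):
    """Align: bucket records so each batch targets exactly one partition."""
    batches = defaultdict(list)
    for record in records:
        batches[partition_of(record)].append(record)
    return dict(batches)

def write_batches(records, partition_of, write_batch, fallback_write):
    """Batch: one write per partition instead of one write per record.

    write_batch(partition, batch) is assumed to retry transient failures
    internally. A ConnectionError escaping it is treated as an enduring
    failure, and the batch is sent to an alternate store instead.
    """
    for partition, batch in group_by_partition(records, partition_of).items():
        try:
            write_batch(partition, batch)
        except ConnectionError:
            fallback_write(partition, batch)
```

With 400+ databases, grouping first means each flush opens at most one connection per partition touched, which also keeps the pool arithmetic from the connection-management slide in check.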

Impact of Interface
• Be careful about paying for features you don’t use
• Look at optimized frameworks / libraries for key aspects
• Balance features vs. performance – cost of goods sold (CoGS) can add up quickly

Summary / Q&A
Mark Simms (masimms@microsoft.com, Twitter: @mabsimms)
• Architecture is key
• Failure is the norm; expect it, design for it
• Scale through partitioning and composition
• Scale exposes the seams of your implementation
• CAT preparing to publish hands-on guidance with reusable patterns
