SQLCAT - Shared technical learnings
Agenda
•   Fit for Purpose – what makes a good cloud app?
•   Shifting Perspective – Designing for Cloud
•   Lessons Learned
•   Summary / Q&A
Setting the Stage
• Customer Advisory Team (CAT) works on big mean projects.
 • Including a lot of big mean Azure projects
• Collating guidance and learnings from the last year of
  engagement
• This discussion is a peek at some of what we’ve learned about
  Azure applications at serious scale
 • Take a deep breath.. Not all Azure applications are this involved
Building a highly scalable and available cloud application
Fit for Purpose – What makes a good cloud app?
• Dispersed users & data
• Elastic demand
• Scale out
Shifting Perspective – Designing for Cloud
•   Scale-out not scale-up
•   Everything has a limit – compose for scale
•   Design for failure
•   Design for continuity
•   Optimize for density
Scale-out not scale-up
• Traditional 3-tier application
• Make “everything stateless”
• Where is the state?
• Oh, right.. in the scale-up database
[Diagram labels: Load Balancer, Web Servers, App Servers, Database]
Scale-out not scale-up
• Challenge: architect applications to use a partitioned data store
 • Connection management
 • Data partitioning & affinity
 • Scatter / gather queries
 • Resource management
[Diagram labels: Azure Load Balancer; DB1, DB2, DB3]
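A scatter / gather query against a partitioned store can be sketched roughly as below. This is a minimal illustration, not the approach used in the projects described: the `PARTITIONS` dictionary is a hypothetical in-memory stand-in for DB1..DB3, and `query_partition` and `scatter_gather` are made-up names.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-ins for three database partitions.
PARTITIONS = {
    "DB1": [{"user": "ann", "score": 7}, {"user": "bob", "score": 3}],
    "DB2": [{"user": "cat", "score": 9}],
    "DB3": [{"user": "dan", "score": 5}, {"user": "eve", "score": 8}],
}


def query_partition(name, min_score):
    """Run the same filter against a single partition (the scatter half)."""
    return [row for row in PARTITIONS[name] if row["score"] >= min_score]


def scatter_gather(min_score):
    """Fan the query out to every partition, then merge and sort the results."""
    with ThreadPoolExecutor(max_workers=len(PARTITIONS)) as pool:
        partials = pool.map(lambda name: query_partition(name, min_score), PARTITIONS)
        rows = [row for partial in partials for row in partial]
    # The gather half: a global ordering has to be applied after the merge,
    # because each partition only knows about its own rows.
    return sorted(rows, key=lambda row: row["score"], reverse=True)
```

Note that the slowest partition sets the latency of the whole query, which is one reason connection and resource management sit on the same list.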
Everything has a limit – compose for scale
• Ship as much as you want
• Provided it will fit into the
  standard “scale units”
• Want to ship more – use more
  containers.
Design for failure
• Traditional approach: harden
  the database
• Cloud approach: expect
  failures, design for them, work
  around them
Optimize for Density
• Density is cost of goods
• Chunky not chatty
• Framework and library
 efficiency
Handling Transient and Enduring Failures
• Given enough scale, time and pressure all components or
  services will fail
  • Your application will experience 1..N failures
• Transient failures: temporary service interruptions
 • Dropped connections, failed queries
• Enduring failures: require intervention
 • Incorrect configuration, long-running service unavailability
Handling Transient and Enduring Failures
• Use fault-handling
  frameworks that recognize
  transient errors
• Appropriate retry and
  backoff policies
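A retry policy with exponential backoff might look something like the sketch below. It assumes the caller can classify errors: `TransientError` is a hypothetical marker exception, and the attempt count and delays are illustrative defaults rather than recommended values.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (dropped connection, timeout)."""


def retry_with_backoff(operation, max_attempts=5, base_delay=0.1,
                       max_delay=5.0, sleep=time.sleep):
    """Call operation(), retrying only on TransientError with capped, jittered backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                # Retries exhausted: surface the error so it can be handled
                # as an enduring failure.
                raise
            # Exponential backoff: 0.1s, 0.2s, 0.4s, ... capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Random jitter keeps many clients from retrying in lockstep.
            sleep(random.uniform(0, delay))
```

Any exception other than `TransientError` propagates immediately, so configuration mistakes are not retried.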
Handling Transient and Enduring Failures
[Chart: Web Request Response Latency. Y-axis: seconds (0 to 450); x-axis: samples 1 to 29; series: Avg Latency, Response latency]
Scale it out, stitch it together
• Partitioning strategies:
  • Horizontal
  • Vertical
  • Hybrid
• CSV vs. Global data model
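As a rough illustration of horizontal partitioning only, rows can be routed to a shard by hashing a partition key. The shard names and the `shard_for` helper below are assumptions made for the example, not something named in the talk.

```python
import hashlib

SHARDS = ["DB1", "DB2", "DB3"]  # hypothetical shard names


def shard_for(key: str) -> str:
    """Map a partition key (for example a user id) to one shard."""
    # A stable digest is used instead of the built-in hash(), which is
    # randomized per process for strings and would break routing.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]
```

Plain modulo routing moves most keys when the number of shards changes, so a lookup table or consistent hashing is often preferred when the tier is expected to grow.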
Telemetry is Life
• You don’t know what you didn’t capture
• Split the streams: high-volume (low-fidelity) and high-value
• Know you’re “down” before your users are!
  • Be able to figure out why afterwards
Handling transient failures
• Logging transient failures
• Logging all external API calls with timing
• Logging full exception (not .ToString())
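One way to log every external call with its timing and the full exception is a small wrapper like the sketch below. It uses Python's standard `logging` module as a stand-in for whatever framework the application uses; the `timed_call` helper and the log field names are hypothetical.

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("external_calls")


@contextmanager
def timed_call(name):
    """Log the duration and outcome of one external call."""
    start = time.perf_counter()
    try:
        yield
    except Exception:
        elapsed_ms = (time.perf_counter() - start) * 1000
        # logger.exception attaches the full traceback to the record,
        # not just the exception message.
        logger.exception("call=%s status=failed elapsed_ms=%.1f", name, elapsed_ms)
        raise
    else:
        elapsed_ms = (time.perf_counter() - start) * 1000
        logger.info("call=%s status=ok elapsed_ms=%.1f", name, elapsed_ms)
```

A call site would wrap each outbound request, for example `with timed_call("profile-db.lookup"): ...`, so failed and successful calls produce comparable records.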
Telemetry is Life
• Per-application server data sources
 • IIS logs
 • Application logs
 • Performance counters
• High value data: filter, aggregate, publish
 • High value data consumer: generate alerts, display dashboard, operational intelligence
• High volume data: batch, partition, archive
 • High volume data consumer: data mining / analysis, historical trends, root cause analysis
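The split between the two streams could be sketched as follows. The event shape (a dict with a `level` field) and the rule for what counts as high value are assumptions made for the example.

```python
from collections import Counter


def split_streams(events, high_value_levels=("ERROR", "CRITICAL")):
    """Route each event to the archive batch and, if important, the high-value stream."""
    high_value = []     # filtered events, published right away for alerts and dashboards
    archive_batch = []  # every event, batched for later mining and root cause analysis
    counts = Counter()  # cheap aggregate that can be published with the high-value data
    for event in events:
        archive_batch.append(event)
        counts[event["level"]] += 1
        if event["level"] in high_value_levels:
            high_value.append(event)
    return high_value, archive_batch, counts
```

The high-value stream stays small enough to process with low latency, while the full batch can be partitioned and archived on a slower path.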
Managing Connections
• Instances * DBs * Pool Size
• Each hosted service has 1 IP
• Each DB cluster has 1 IP
• How big is a routing table entry for IPv4?
 • Hint: 64k
[Diagram labels: Azure Load Balancer; DB1, DB2, DB3]
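The multiplication on this slide is easy to underestimate, so here is a small worked example. The instance count and pool size are made-up numbers (400 databases echoes the data tier mentioned later in the deck), and reading the "64k" hint as the 16-bit TCP port space is an interpretation, not something the slide states.

```python
MAX_TCP_PORTS = 2 ** 16  # TCP port numbers are 16-bit, so 65,536 values


def worst_case_connections(instances, databases, pool_size):
    """Upper bound on open connections if every pool fills completely."""
    return instances * databases * pool_size


# Hypothetical deployment: 100 instances, 400 databases, pool size 10.
total = worst_case_connections(instances=100, databases=400, pool_size=10)
exceeds_port_space = total > MAX_TCP_PORTS
```

With these numbers the worst case is 400,000 connections, several times the number of distinct ports available between a single source IP and a single destination endpoint.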
Optimize work: batch & align
• Challenge:
 • Optimize insert of activity and user data into a scale-out data tier (400+
   databases)
 • Transient failure – retries
 • Enduring failure – failover to alternate store
 • Optimize for partition alignment
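Partition-aligned batching can be sketched as grouping rows by their target partition first, then cutting each group into fixed-size batches, so that one batch never spans two databases. The `batch_by_partition` helper and its parameters are hypothetical.

```python
from collections import defaultdict


def batch_by_partition(rows, partition_of, batch_size):
    """Yield (partition, batch) pairs where every batch targets a single partition."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[partition_of(row)].append(row)
    for partition, items in grouped.items():
        # One round trip per batch instead of one per row ("chunky not chatty").
        for start in range(0, len(items), batch_size):
            yield partition, items[start:start + batch_size]
```

Each yielded batch is also a natural unit for retrying on a transient failure, or for redirecting to an alternate store when the failure persists.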
Impact of Interface
• Be careful about paying for features you don’t use
• Look at optimized frameworks / libraries for key aspects
 • Balance features vs. performance – CoGS can add up quickly
Summary / Q&A
• Architecture is key
• Failure is the norm; expect it, design for it
• Scale through partitioning and composition
• Scale exposes the seams of your implementation
• CAT preparing to publish hands-on guidance with reusable patterns
Mark Simms
masimms@microsoft.com
Twitter: @mabsimms
