Scalable Service Architectures
Lessons learned
Zoltán Németh
Engineering Manager, Core Systems
Agenda
• Our scalability experience
• What is Scalability?
• Requirements in detail
• Tips and tools
• Extras, Closing remarks
Our experience
Streaming stack
Scalable Service Architectures
Defining scalability
Scalability is the ability to handle increased workload by repeatedly applying a cost-effective strategy for extending a system’s capacity. (CMU paper, 2006)
How well a solution to some problem will work when the size of the problem increases. When the size decreases, the solution must fit. (dictionary.com and Theo Schlossnagle, 2006)
Self-contained service
• Explicitly declare and isolate dependencies
• Isolation from the outside system
• Static linking
• Do not rely on system packages
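Where a service still shells out to external tools, the "explicitly declare dependencies" rule can be approximated by listing those tools in one place and checking them at startup. A minimal Python sketch under that assumption; the tool names and the `missing_tools` / `check_dependencies` helpers are illustrative and not from the talk:

```python
import shutil

# Hypothetical declaration: every external binary this service shells out to.
REQUIRED_TOOLS = ["convert", "curl"]


def missing_tools(names, which=shutil.which):
    """Return the declared tools that cannot be found on PATH."""
    return [name for name in names if which(name) is None]


def check_dependencies(names=REQUIRED_TOOLS, which=shutil.which):
    """Fail fast at startup instead of failing on the first request."""
    missing = missing_tools(names, which)
    if missing:
        raise RuntimeError("missing external tools: " + ", ".join(missing))
```

This is only a safety net. Bundling or statically linking the dependency into the app removes the need for the check altogether.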
Disposability
• Maximize robustness with fast startup and graceful shutdown
• Disposable processes
• Graceful shutdown on SIGTERM
• Handling sudden death: robust queue backend
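For a worker process, graceful shutdown on SIGTERM can be sketched as a flag set by the signal handler plus a loop that returns the unfinished job to the queue. The `Worker` class, the `process(job, should_stop)` callback contract and the in-memory `queue.Queue` are assumptions made for illustration; a real deployment would use an external queue backend.

```python
import queue
import signal


class Worker:
    """Worker that hands its current job back to the queue when asked to stop."""

    def __init__(self, jobs):
        self.jobs = jobs          # stand-in for a robust, external queue backend
        self.stopping = False
        self.done = []

    def handle_sigterm(self, signum, frame):
        # Only set a flag here; the run loop does the actual cleanup.
        self.stopping = True

    def run(self, process):
        """process(job, should_stop) must return True if the job was finished."""
        while not self.stopping:
            try:
                job = self.jobs.get_nowait()
            except queue.Empty:
                return
            if process(job, lambda: self.stopping):
                self.done.append(job)
            else:
                self.jobs.put(job)  # return the unfinished job to the queue


def install(worker):
    # Register the handler; call this from the main thread of the real process.
    signal.signal(signal.SIGTERM, worker.handle_sigterm)
```

Sudden death (for example SIGKILL or a power loss) never reaches this handler, which is why the slide also asks for a queue backend that can redeliver jobs on its own.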
Startup and Shutdown
• Automate all the things
• Chef
• Docker
• Gold image based deployment
• Immutable
• Handling tasks before shutdown
Backing Services
• Treat backing services as attached resources
• No distinction between local and third party services
• Easily swap out resources
• Export services via port binding
• Become the backing service for another app
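Treating a backing service as an attached resource usually means the code only ever sees a resource locator taken from the environment. A small Python sketch of that idea; the variable name `DATABASE_URL` and the `backing_service` helper are hypothetical:

```python
import os
from urllib.parse import urlparse


def backing_service(var_name, env=os.environ):
    """Read a resource locator from the environment and split it into parts."""
    url = urlparse(env[var_name])
    return {
        "scheme": url.scheme,
        "host": url.hostname,
        "port": url.port,
        "name": url.path.lstrip("/"),
    }
```

Swapping a local MySQL for a remote one then changes only the value of the variable, for example from `mysql://localhost:3306/app` to `mysql://db.example.com:3306/app`, and not the code.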
Processes, concurrency
• Stateless processes (not even sticky sessions)
• Process types by work type
• We <3 Linux processes
• Shared-nothing → adding concurrency is safe
• Process distribution spanning machines
Statelessness
• Store everything in a datastore
• Aggregate data
• Chandra
• Scalable datastores
• Redis
• Cassandra
• Aerospike
• Handling user sessions
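The "aggregate data" point can be sketched as an in-process counter that is written out in bulk once it holds enough events or has been open for too long, which bounds how much is lost on a crash. The `Aggregator` class, its thresholds and the `write_bulk` callback are assumptions for illustration, not the speaker's implementation:

```python
import time
from collections import Counter


class Aggregator:
    """Collect counters in memory and write them to the datastore in bulk."""

    def __init__(self, write_bulk, max_events=1000, max_age_s=5.0,
                 clock=time.monotonic):
        self.write_bulk = write_bulk    # hypothetical datastore write callback
        self.max_events = max_events    # bound on events lost in a crash
        self.max_age_s = max_age_s      # bound on seconds of data lost
        self.clock = clock
        self.counts = Counter()
        self.events = 0
        self.first_event_at = None

    def record(self, key, amount=1):
        if self.first_event_at is None:
            self.first_event_at = self.clock()
        self.counts[key] += amount
        self.events += 1
        too_many = self.events >= self.max_events
        too_old = self.clock() - self.first_event_at >= self.max_age_s
        if too_many or too_old:
            self.flush()

    def flush(self):
        if self.events:
            self.write_bulk(dict(self.counts))
        self.counts.clear()
        self.events = 0
        self.first_event_at = None
```

Calling `flush()` from the shutdown path as well keeps a graceful stop from dropping the last partial batch.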
Monitoring
• Application state and metrics
• Dashboards
• Alerting
• Health
• Remove failing nodes
• Capacity
• Act on trends
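"Remove failing nodes" can be illustrated with heartbeats and a time-to-live: a node that stops reporting drops out of the healthy set. The sketch below is an in-memory stand-in written for this purpose; it does not use or imitate the API of any real coordination service, and the `HealthRegistry` name and TTL value are assumptions.

```python
import time


class HealthRegistry:
    """In-memory stand-in for a cluster-state store with expiring entries."""

    def __init__(self, ttl_s=10.0, clock=time.monotonic):
        self.ttl_s = ttl_s
        self.clock = clock
        self.last_seen = {}

    def heartbeat(self, node):
        # Each app instance reports in periodically.
        self.last_seen[node] = self.clock()

    def healthy_nodes(self):
        now = self.clock()
        # Drop every node whose last heartbeat is older than the TTL.
        self.last_seen = {
            node: seen for node, seen in self.last_seen.items()
            if now - seen <= self.ttl_s
        }
        return sorted(self.last_seen)
```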
Monitoring
• Metrics collecting
• Graphite, New Relic
• Self-aware checks
• Cluster state
• Zookeeper, Consul
• Scaling decision types
• Capacity amount
• Graph derivative
• App requests
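A scaling decision that combines "capacity amount" with "graph derivative" might look like the following: estimate the slope of recent load samples, project it a few intervals ahead, and compare the projection with capacity. The function name, thresholds and lookahead are illustrative assumptions.

```python
def scaling_decision(samples, capacity, lookahead=3, high=0.8, low=0.3):
    """Decide on scaling from recent load samples taken at equal intervals."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to estimate a trend")
    # Average change per interval: a crude derivative of the graph.
    slope = (samples[-1] - samples[0]) / (len(samples) - 1)
    # Where the load would be a few intervals from now if the trend holds.
    projected = samples[-1] + slope * lookahead
    utilisation = projected / capacity
    if utilisation > high:
        return "scale_up"
    if utilisation < low:
        return "scale_down"
    return "hold"
```

With samples `[100, 120, 140, 160]` and a capacity of 250, current utilisation is only 64%, but the projected 220 is 88% of capacity, so the trend triggers a scale-up before the capacity threshold alone would.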
Scalable Service Architectures
Load Balance and Resource Allocation
• Load Balance: distribute tasks
• Utilize machines efficiently
• VM compatible apps
• Flexibility
• Adapting to available resources
Load Balance
• DNS or API
• App level balance
• Uniform entry point or proxy
• Balance decisions
• Load
• Zookeeper state
• Resource policies
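Two of the resource policies mentioned in the speaker notes, even distribution and keeping large chunks free for large tasks, can be contrasted in a few lines. This is a sketch only; the `pick_node` function, the policy names and the node names are hypothetical.

```python
def pick_node(free_slots, needed, policy="even"):
    """Pick a node for a task that needs `needed` slots, or None if none fits.

    free_slots maps node name -> free capacity (for example, transcoder slots).
    """
    candidates = {n: free for n, free in free_slots.items() if free >= needed}
    if not candidates:
        return None
    if policy == "even":
        # Even distribution: send work to the node with the most free capacity.
        return max(candidates, key=candidates.get)
    if policy == "pack":
        # Best fit: use the tightest node so large chunks stay free elsewhere.
        return min(candidates, key=candidates.get)
    raise ValueError("unknown policy: " + policy)
```

For `{"t1": 2, "t2": 8, "t3": 4}` and a task needing 2 slots, the "even" policy picks `t2`, while "pack" picks `t1` and leaves the 8 free slots on `t2` intact for a possible large task.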
Service Separation
• Failure is inevitable
• Protect from failing components
• Cascading failure
• Fail fast
• Decoupling
• Asynchronous operations
• Message queues
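Decoupling through a queue, combined with failing fast, can be sketched with a bounded buffer: the request path only enqueues and returns at once, and it rejects immediately when the buffer is full instead of waiting on a slow consumer. The `AsyncFrontend` class and the in-memory `queue.Queue` are stand-ins chosen for illustration; a production system would put a real message queue here.

```python
import queue


class AsyncFrontend:
    """Accept work without waiting for the component that processes it."""

    def __init__(self, max_pending=100):
        self.pending = queue.Queue(maxsize=max_pending)

    def submit(self, task):
        # Never block the caller: either the task is queued or it is refused.
        try:
            self.pending.put_nowait(task)
        except queue.Full:
            return "rejected"
        return "accepted"

    def drain(self, handle):
        """Consumer side: process everything currently queued."""
        handled = 0
        while True:
            try:
                task = self.pending.get_nowait()
            except queue.Empty:
                return handled
            handle(task)
            handled += 1
```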
Service Separation
• Rate limiting
• Circuit Breaker pattern
• Stop cascading failure, allow recovery
• Hystrix
• Fail fast, fail silent
• Service decoupling
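The Circuit Breaker pattern can be shown in a compact form: after a number of consecutive failures the breaker opens and calls fail fast, and after a cool-down one trial call is let through to allow recovery. This is a minimal single-threaded sketch, not Hystrix and not its API; the class name, thresholds and injectable clock are assumptions.

```python
import time


class CircuitOpenError(Exception):
    """Raised instead of calling the protected function while the circuit is open."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after_s=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after_s = reset_after_s
        self.clock = clock
        self.failures = 0
        self.opened_at = None   # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after_s:
                raise CircuitOpenError("circuit open, failing fast")
            # Cool-down elapsed: half-open, let this one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if self.opened_at is not None or self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

While the circuit is open the failing component receives no traffic at all, which is what stops the failure from cascading and gives the component room to recover.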
Extras
• Debugging features
• Logs
• Clojure / JS consoles
• Runtime configuration via env
• Scaling API
• Integrating several cloud providers
• Automatic start / stop
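For the "Logs" item, treating logs as a stream means the app writes events to stdout and leaves collection and transport to the environment. A minimal Python sketch using the standard `logging` module; the `make_logger` helper, the logger name and the line format are illustrative choices:

```python
import logging
import sys


def make_logger(name, stream=None):
    """Return a logger that writes one line per event to a stream (stdout by default)."""
    logger = logging.getLogger(name)
    logger.handlers.clear()       # avoid duplicate handlers on repeated calls
    logger.setLevel(logging.INFO)
    logger.propagate = False      # keep output on this stream only
    handler = logging.StreamHandler(sys.stdout if stream is None else stream)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")
    )
    logger.addHandler(handler)
    return logger
```

The app never opens or rotates log files itself; whatever runs the process captures stdout and ships it on.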
Reading
• Scalable Internet Architectures by Theo Schlossnagle
• The 12-factor App: http://12factor.net/
• Carnegie Mellon Paper: http://www.sei.cmu.edu/reports/06tn012.pdf
• Circuit Breaker: http://martinfowler.com/bliki/CircuitBreaker.html
• Release It! by Michael T. Nygard
Questions
syntaxerror@ustream.tv



Editor's Notes

  • #2: A bit of Ustream intro
  • #3: Definition. Requirements coming from 12-factor, plus some added by us. Some more detail and tools on selected requirements.
  • #4: 30 day viewer graph. Clear peaks -> need for scaling
  • #5: Quick description of the streaming stack, roles of components, how they require scaling - Transcontroller/transcoder scaling - UMS scaling
  • #6: Quick description of the streaming stack, roles of components, how they require scaling - Transcontroller/transcoder scaling - UMS scaling
  • #7: Carnegie Mellon University paper by Charles B. Weinstock and John B. Goodenough: On System Scalability. LINFO: The Linux Information Project, http://www.linfo.org/
  • #8: Example: calling ImageMagick or curl from code. They might be installed or they might not. Bundle everything into the app instead.
  • #9: Disposable processes: they can be started or stopped at a moment’s notice. For a web process, graceful shutdown is achieved by ceasing to listen on the service port (thereby refusing any new requests), allowing any current requests to finish, and then exiting. Implicit in this model is that HTTP requests are short (no more than a few seconds), or in the case of long polling, the client should seamlessly attempt to reconnect when the connection is lost. For a worker process, graceful shutdown is achieved by returning the current job to the work queue.
  • #10: Docker: build images from a Dockerfile, deploy from a repository. Tasks before shutdown: moving jobs, log collection, sleep.
  • #11: A backing service is any service the app consumes over the network as part of its normal operation. Examples include datastores (such as MySQL or CouchDB), messaging/queueing systems (such as RabbitMQ or Beanstalkd), SMTP services for outbound email (such as Postfix), and caching systems (such as Memcached). Put only a resource locator in the config, via environment variables. Example: easily swap a local MySQL for a remote service. The app does not rely on runtime injection of a webserver into the execution environment to create a web-facing service. The web app exports HTTP as a service by binding to a port and listening to requests coming in on that port. One app can become the backing service for another app by providing the URL to the backing app as a resource handle in the config for the consuming app.
  • #12: Handle diverse workloads by assigning each type of work to a process type. For example, HTTP requests may be handled by a web process, and long-running background tasks handled by a worker process. An individual VM can only grow so large (vertical scale), so the application must also be able to span multiple processes running on multiple physical machines.
  • #13: Aggregate everything within the app and write it out in bulk. Be careful about write frequency: a crash must not lose too much data. Redis: scales reads, writes are problematic. Cassandra: quick scaling is questionable. Aerospike: scales reads and writes, working together with their engineering team. User sessions: persistent connection, NIO+.
  • #14: Alerting -> OpenDuty. Two important groups: health vs. capacity.
  • #15: Report everything to Graphite and constantly check graph trends automatically. Apps are self-aware: they know their health. App instances report into Zookeeper and thus know about each other. Central logic can request resources based on capacity or graph; an app can request them based on a self-check or Zookeeper. Zookeeper, Consul: why, and what their advantages are.
  • #17: Load balancing distributes workloads across multiple computing resources. Flexibility: an app can increase or decrease its own size, for example with thread pools. Adapting to CPU, RAM, disk, network.
  • #18: App level: transcontroller selects transcoder. App-level balance with a proxy can be a SPOF, so be careful. Resource policies: even distribution, keep large chunks free for possible large tasks (transcoder use case), group requests together on some attribute (pro, etc.).
  • #19: Failure is inevitable because of large numbers, hardware issues, independent network. Decoupling: serving one request should not wait on others.
  • #20: Hystrix by Netflix, 2011/12. Circuit Breaker: Martin Fowler post from 2014. Service decoupling example: inserting layers between DB and UMS -> RGW. Then another layer between RGW and UMS -> Queue.
  • #21: Logs: logs as stream / stdout (factor #11), collect / transport / process. Scaling API: other considerations: price, network line to the cloud provider, instance type (spot vs. normal).