MANAGING GROWTH
SCALING TEAMS, PROCESSES, ARCHITECTURES
Lorenzo Alberton, CTO @ DataSift
MEST, Accra 10 December 2017
LORENZO ALBERTON
Chief Technology Officer, DataSift
http://alberton.info
@lorenzoalberton
SCALABLE ARCHITECTURES http://bit.ly/scaleds
SCALABILITY IS ABOUT…
People
Technology
Processes
TRUE FOUNDATION
PART 1.
PEOPLE
Staffing, Roles,
Management, Teams
CULTURE
➤ Treat people as volunteers (*)
➤ Lead by living the values you
promote
➤ Respect, collaboration
➤ Promote fun in the workplace
➤ Culture of safety at work (**)
(*) Peter Drucker
(**) Google, Project Aristotle
EFFECTIVE TEAMS
PROJECT ARISTOTLE (2012)
Psychological safety: team climate
characterised by interpersonal trust
and mutual respect in which people
are comfortable being themselves.
Feeling free to share the things that
scare us without fear of
recriminations.
Behaviours: conversational turn-taking
and empathy.
https://www.nytimes.com/2016/02/28/magazine/what-google-learned-from-its-quest-to-build-the-perfect-team.html
TEAMS VS. INDIVIDUAL CONTRIBUTORS
➤ Beware of toxic people
➤ Value communication and
team work over super-heroes
(*) Sunday afternoon test
STAFFING
Don’t hire
experts
Technologies come and go
Focus more on people with passion
and less on people with specific skills
TEAM SIZE
➤ Never underestimate the
power of a small team
➤ Small teams force alignment
and focus
➤ Bigger teams need an insane
amount of overhead
➤ Parkinson's Law: “Work
expands to fill the time available
for its completion”
➤ Busywork: work that keeps a person busy
but has little value in itself
TEAM STRUCTURE
No artificial boundaries around languages or skills
Try cross-functional teams
(less friction, better end-to-end collaboration, project ownership)
MIDDLE-MANAGEMENT CURSE
Mistakes:
➤ Prematurely re-organise for scale
(deep hierarchy, over-specialisation)
➤ Process managers (factory
mentality) vs Problem solvers
➤ Micromanagement
➤ Non-engineering culture
➤ 1-on-1s as calendar-filler
➤ Not being “on the ground”
➤ Over-confidence in tooling
➤ OTOH, coordination can be hard
PART 2.
PROCESSES
How to make day to day
operations smooth
WHY ARE PROCESSES CRITICAL?
Ease management
of teams/projects
Standardise actions
in repetitive tasks
Reduce mundane
decisions to focus
on grander ideas
Allow the team to
react quickly to crisis
➤ A process shouldn’t exist for the sake of it
➤ Introduce processes gradually, only keep what works
➤ Don’t put too much confidence in tools alone to fix issues
EXAMPLE PROCESSES
➤ Development methodology
➤ Risk / Benefit analysis
➤ Prioritisation / Planning
➤ Design and code reviews
➤ Evaluating headroom / scale
➤ Load / Stress testing
➤ Test automation
➤ Deployment automation
➤ Release checklists
➤ Risk assessment/management
➤ Blameless postmortems
PROMOTING SYSTEMS TO PROD
➤ Code reviews
➤ Dev, Test, Stage and Live
environments
➤ Manual and automated QA
processes
➤ Performance and stress testing
➤ Release checklists (runbook)
➤ Instrumentation checks
➤ Testing roll-back capability
Protection from significant failures
BARRIER CONDITIONS
DESIGN AND CODE REVIEWS
➤ Promote collaboration
➤ Validate ideas, assess risk, detect
flaws, simplify the solution
➤ Reason about behaviour before
coding
DAILY STAND-UPS
➤ Important for knowledge
sharing, collaboration,
alignment
CONTROLLING CHANGE: RISK ESTIMATION
http://dilbert.com/strips/comic/2008-05-08/
➤ Limit / log the impact of changes
➤ Risk assessment methodologies:
• Gut feeling / finger in the air
• Semaphore method
• Failure Mode and Effect Analysis
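As an illustration of the last method, FMEA is commonly run by scoring each failure mode for severity, occurrence and detectability (typically 1 to 10 each) and multiplying the three into a Risk Priority Number. The sketch below assumes that convention; the failure modes and scores are made up for the example and are not taken from the talk.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    severity: int    # 1 (negligible) .. 10 (catastrophic)
    occurrence: int  # 1 (rare) .. 10 (almost certain)
    detection: int   # 1 (always caught) .. 10 (practically undetectable)

    def rpn(self) -> int:
        """Risk Priority Number: severity x occurrence x detection."""
        return self.severity * self.occurrence * self.detection


def rank(modes):
    """Return failure modes ordered from highest to lowest risk."""
    return sorted(modes, key=lambda m: m.rpn(), reverse=True)


# Hypothetical failure modes for a planned change.
modes = [
    FailureMode("schema migration locks table", severity=8, occurrence=3, detection=4),
    FailureMode("config typo disables cache", severity=5, occurrence=4, detection=2),
    FailureMode("rollback script untested", severity=9, occurrence=2, detection=7),
]

ranked = rank(modes)
for mode in ranked:
    print(f"{mode.rpn():4d}  {mode.name}")
```

The ranking is only as good as the scores, so it works best as a prompt for discussion in a design or release review rather than as a precise measurement.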
RISK MANAGEMENT
➤ Risk is cumulative
➤ Determine limits and
tolerance
➤ Stress, long hours, peer
pressure can multiply risk
WHEN/WHAT TO SCALE: DETERMINING HEADROOM
Capacity
Current Load
Why?
Budget plan
Prioritisation
Hiring plan
Determine starting point, remaining capacity, expected demand
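One way to turn "starting point, remaining capacity, expected demand" into a number is sketched below. It is a simplified calculation under stated assumptions, not necessarily the formula used in the talk: `ideal_usage` is a hypothetical safety factor (the fraction of maximum capacity you are willing to run at), and growth is assumed to be linear.

```python
def headroom(max_capacity, ideal_usage, current_load, expected_growth):
    """Capacity left after current load and expected growth.

    max_capacity    -- measured breaking point (e.g. requests/second)
    ideal_usage     -- fraction of max_capacity considered safe (assumption)
    current_load    -- load observed today, same unit as max_capacity
    expected_growth -- extra load expected over the planning period
    """
    return max_capacity * ideal_usage - current_load - expected_growth


def months_until_exhausted(max_capacity, ideal_usage, current_load, monthly_growth):
    """Months before the safe capacity is used up, assuming linear growth."""
    if monthly_growth <= 0:
        return float("inf")
    usable = max_capacity * ideal_usage
    return max(0.0, (usable - current_load) / monthly_growth)


# Hypothetical figures: 10,000 req/s breaking point, run at 50% at most,
# 3,000 req/s today, growing by 100 req/s per month over 12 months.
print(headroom(10_000, 0.5, 3_000, 12 * 100))
print(months_until_exhausted(10_000, 0.5, 3_000, 100))
```

A negative headroom means the planning period cannot be covered without adding capacity, which is the signal that feeds the budget, prioritisation and hiring plans.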
LOAD TESTING
➤ Identify, document and
eliminate bottlenecks through
a strictly controlled process of
measurement and analysis
➤ Measure the system’s response
and stability
➤ Verify the app can meet the
desired performance
objectives (SLA)
➤ Establish success criteria, test
environment, tests, what
needs to be monitored, what
data needs to be collected
STRESS TESTING
➤ Determine the app’s stability
when subjected to above-normal
loads
➤ Verify the app’s behaviour
when close to the breaking
point
➤ Positive testing: progressively
increase load to overwhelm
the system’s resources
➤ Negative testing: take away
resources (memory, threads,
connections) to test the
application’s recoverability
PART 3.
TECHNOLOGY
Architecting Robust,
Scalable Solutions
DO NOT SCALE UNTIL YOU CAN’T AVOID IT ANYMORE
➤ “Go meet your people. Do things that don’t scale.” (Paul
Graham to Airbnb’s founders)
➤ Solve for specific problems
➤ Don’t generalise until you have rebuilt something for the third time
➤ Don’t over-engineer the solution
➤ Automate repetitive and error-prone tasks
➤ Avoid complicating things
✴ Phone system
MVP APPROACH
➤ Test ideas before spending a
year building something you
haven’t proven in the market
first
➤ Fake it till you make it
➤ Example: Zappos
ARCHITECTURAL / DESIGN PRINCIPLES
➤ N + 1 nodes
➤ Design for rollback
➤ Design to be disabled (feature flags)
➤ Design to be monitored
➤ Design for multiple live systems/sites
➤ Use mature technology
➤ Asynchronous communications
➤ Stateless systems
➤ Buy when non-core
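The "to be disabled (feature flags)" principle can be sketched as a switch that is checked at the call site, so a misbehaving feature can be turned off without a deployment. This is a minimal in-memory illustration; the `FeatureFlags` class and the flag name are hypothetical, and a real system would usually read flags from shared configuration.

```python
class FeatureFlags:
    """In-memory flag store; unknown flags default to disabled."""

    def __init__(self, defaults):
        self._flags = dict(defaults)

    def is_enabled(self, name):
        return self._flags.get(name, False)

    def disable(self, name):
        self._flags[name] = False

    def enable(self, name):
        self._flags[name] = True


flags = FeatureFlags({"new_recommendations": True})


def recommendations(user_id):
    """Serve the new feature when enabled, else fall back to the safe path."""
    if flags.is_enabled("new_recommendations"):
        return f"personalised picks for {user_id}"
    return "bestsellers"


print(recommendations("u1"))
flags.disable("new_recommendations")  # kill switch, no redeploy needed
print(recommendations("u1"))
```

Keeping the fallback path simple and always exercised in tests is what makes the switch trustworthy during an incident.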
FAULT-TOLERANT STRUCTURES
➤ Swim lanes: isolate and limit the
impacts of failure within the
system by segmenting pipelines
➤ Barrier and Guide (shard)
➤ Increase availability
➤ Make incidents easier to detect,
identify and resolve

➤ Favour the transactions making
the company money first
➤ Isolate functions causing repetitive
problems (or busy tenants)
➤ Consider the natural layout or
topology of the site
SCALING IN DIFFERENT DIRECTIONS
AKF Scaling Cube, “The Art of Scalability”, M. L. Abbott, M. T. Fisher
x: cloning of services and data without any bias
(e.g. more serving nodes in a worker pool where any node can do the work)
y: separation of work responsibility by type of data or type of work
(different specialised worker pools)
z: separation of work by customer or requestor
(dedicated highly specialised worker pools)
SCALING IN DIFFERENT DIRECTIONS - 1. SCALING WORK / APPS
x: cloning of entities or data - unbiased distribution of work
(e.g. web site mirror 1 and web site mirror 2 behind a load balancer)
y: separation of work by activity or data
(e.g. search server, shopping cart server)
z: separation of work by person for whom the work is done
(e.g. premium site, standard site)
SCALING IN DIFFERENT DIRECTIONS - 1. SCALING WORK / APPS
x mirroring
+ scale transactions
- scale data
y split by service
+ scale isolation
+ scale function data
- scale customer data
z
split by need /
location / value
+ scale isolation
+ scale customer data
- scale function data
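The three axes can be combined in a single routing decision. The sketch below is illustrative only: the pool names, the path-based service detection and the tier labels are assumptions made up for the example. It picks a service by request path (y), a pool by customer tier (z), and then round-robins across identical clones in that pool (x).

```python
from itertools import cycle

# Hypothetical node pools, keyed by (service, customer tier).
POOLS = {
    ("search", "premium"): ["search-premium-1", "search-premium-2"],
    ("search", "standard"): ["search-standard-1", "search-standard-2"],
    ("cart", "premium"): ["cart-premium-1"],
    ("cart", "standard"): ["cart-standard-1", "cart-standard-2"],
}

# One endless round-robin iterator per pool.
_round_robin = {key: cycle(nodes) for key, nodes in POOLS.items()}


def route(path, tier):
    """Return the node that should serve this request."""
    # y axis: split by service (type of work).
    service = "search" if path.startswith("/search") else "cart"
    # z axis: split by customer (who the work is done for).
    pool = _round_robin[(service, tier)]
    # x axis: any clone in the pool can do the work, so rotate through them.
    return next(pool)
```

Calling `route("/search?q=shoes", "standard")` repeatedly alternates between the two standard search nodes, while premium cart traffic always lands on its dedicated node.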
SCALING IN DIFFERENT DIRECTIONS - 2. SCALING DATA
x
data cloning
(replication /
clustering) + load
balancer
y
split different things
by service / resource /
data affinity
z
split similar things
by modulus / hash-based lookups
copy 1 copy 2 copy 3
ABC DEF GHI
SCALING IN DIFFERENT DIRECTIONS - 2. SCALING DATA
x
data cloning
(replication /
clustering) + load
balancer
+ easy to implement
+ scale transaction volume
+ useful in case of high read to write ratio
- scale data size and growth
y
split different things
by service / resource /
data affinity
+ fault isolation
+ reduce query time
- more difficult
- data migration
z
split similar things
by modulus / hash-based lookups
+ uniformly balanced demand
+ fault isolation
+ scale data and transactions
- more costly
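A minimal sketch of the z-axis "modulus / hash-based lookup", assuming string keys and a fixed number of shards. The function name and key format are hypothetical. A stable digest is used because Python's built-in `hash()` for strings is randomised per process, which would send the same key to different shards after a restart.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard index in [0, shard_count) using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Take the first 8 bytes as an unsigned integer, then apply the modulus.
    return int.from_bytes(digest[:8], "big") % shard_count


# Example: spread hypothetical customer ids over 4 shards.
for customer in ("customer-1", "customer-2", "customer-3"):
    print(customer, "-> shard", shard_for(customer, 4))
```

With a plain modulus, changing `shard_count` remaps most keys, which is one reason this axis is listed as more costly; schemes such as consistent hashing reduce that movement at the price of extra complexity.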
QUEUES
➤ Asynchronous communication
➤ Workload distribution
➤ Failure isolation
MESSAGE QUEUES AS BUFFERS (ASYNC COMM - DECOUPLING)
[Diagram: source / producer → queue → sink / consumer]
Unpredictable load spikes
Load normalisation / smoothing
Batching ⇒ higher throughput
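The buffering and batching idea can be sketched with the standard library queue: the producer dumps a spike into the queue, and the consumer drains it in fixed-size batches at its own pace. The batch size and the `drain_batch` helper are assumptions for illustration.

```python
import queue


def drain_batch(q, max_batch):
    """Pull up to max_batch items without blocking; may return fewer."""
    batch = []
    while len(batch) < max_batch:
        try:
            batch.append(q.get_nowait())
        except queue.Empty:
            break
    return batch


buffer = queue.Queue()

# Producer side: a sudden spike of 25 messages.
for message_id in range(25):
    buffer.put(message_id)

# Consumer side: process at a steady rate of up to 10 messages per batch.
batches = []
while True:
    batch = drain_batch(buffer, 10)
    if not batch:
        break
    batches.append(batch)

print([len(b) for b in batches])  # [10, 10, 5]
```

Handling ten messages per round trip instead of one is where the throughput gain comes from, for example one bulk write instead of ten single writes.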
WORKLOAD DISTRIBUTION - LOAD BALANCING
[Diagram: Producer pushes to a queue; Consumer 1, Consumer 2 and Consumer 3 each pull from it]
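A sketch of pull-based workload distribution, using threads as stand-ins for separate consumer processes (an assumption for the sake of a self-contained example). Each consumer pulls the next job when it is free, so faster consumers naturally take more work and no job is handled twice.

```python
import queue
import threading


def run_workers(jobs, worker_count):
    """Distribute jobs to competing consumers; return {job: consumer name}."""
    q = queue.Queue()
    for job in jobs:
        q.put(job)  # producer pushes everything up front

    handled_by = {}
    lock = threading.Lock()

    def consumer(name):
        while True:
            try:
                job = q.get_nowait()  # pull the next available job
            except queue.Empty:
                return  # nothing left to do
            with lock:
                handled_by[job] = name

    threads = [
        threading.Thread(target=consumer, args=(f"consumer-{i + 1}",))
        for i in range(worker_count)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return handled_by


assignments = run_workers(range(100), worker_count=3)
print(len(assignments), "jobs handled")
```

How the jobs split between consumers varies from run to run, which is the point: the queue balances load without the producer knowing who is busy.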
MULTIPLEXING
[Diagram: Producer 1, Producer 2 and Producer 3 push requests (R4; R1, R2, R3; R5, R6); one Consumer pulls]
fair-queuing: R1, R4, R5, R2, R6, R3
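The fair-queuing order on the slide is what a simple round-robin over per-producer queues yields. The sketch below assumes one queue per producer and one request taken from each non-empty queue per round; the function name is hypothetical.

```python
from collections import deque


def fair_dequeue(queues):
    """Interleave requests by taking one from each non-empty queue per round."""
    pending = [deque(q) for q in queues]
    order = []
    while any(pending):  # an empty deque is falsy
        for q in pending:
            if q:
                order.append(q.popleft())
    return order


order = fair_dequeue([
    ["R1", "R2", "R3"],  # a producer with three requests
    ["R4"],              # a producer with one request
    ["R5", "R6"],        # a producer with two requests
])
print(order)  # ['R1', 'R4', 'R5', 'R2', 'R6', 'R3']
```

The producer with three requests does not starve the others: R4 and R5 are served before R2.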
HIGH AVAILABILITY (PUB-SUB / BROADCAST)
[Diagram: Publisher 1 and Publisher 2 broadcast to Listener 1, Listener 2 and Listener 3; dynamic subscriptions]
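A minimal in-process sketch of broadcast with dynamic subscriptions. The `Broker` class, the topic name and the listener helper are hypothetical; a real deployment would use a message broker rather than direct callbacks. Every current subscriber receives every message, and listeners can join or leave at any time.

```python
from collections import defaultdict


class Broker:
    """Topic-based publish/subscribe with in-process callbacks."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def unsubscribe(self, topic, callback):
        self._subscribers[topic].remove(callback)

    def publish(self, topic, message):
        # Copy the list so callbacks may (un)subscribe while we iterate.
        for callback in list(self._subscribers[topic]):
            callback(message)


def make_listener(inbox):
    """Return a callback that records messages into the given list."""
    def on_message(message):
        inbox.append(message)
    return on_message


broker = Broker()
inbox_1, inbox_2 = [], []
listener_1, listener_2 = make_listener(inbox_1), make_listener(inbox_2)

broker.subscribe("alerts", listener_1)
broker.subscribe("alerts", listener_2)
broker.publish("alerts", "disk almost full")  # both listeners receive it

broker.unsubscribe("alerts", listener_2)      # listener 2 leaves
broker.publish("alerts", "disk full")         # only listener 1 receives it
```

Because each listener gets its own copy of every message, losing one listener does not affect the others, which is where the availability benefit comes from.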
BOUND YOUR QUEUE SIZE - APPLY BACK PRESSURE
[Diagram: producer → bounded queue → consumer]
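A sketch of a bounded queue that pushes back on the producer, using the standard library `queue.Queue` with a `maxsize`. The `offer` helper and the capacity of 3 are assumptions for illustration. When the queue is full the producer is told immediately, so it can slow down, retry later or shed load instead of letting the backlog grow without limit.

```python
import queue


def offer(q, item):
    """Try to enqueue without blocking; return False if the queue is full."""
    try:
        q.put_nowait(item)
        return True
    except queue.Full:
        return False


bounded = queue.Queue(maxsize=3)

# The producer tries to enqueue 5 items; only the first 3 fit.
accepted = [offer(bounded, item) for item in range(5)]
print(accepted)  # [True, True, True, False, False]

# Once the consumer takes an item, there is room again.
bounded.get_nowait()
accepted_after_drain = offer(bounded, 99)
print(accepted_after_drain)  # True
```

A blocking `put` with a timeout is the other common choice: it slows the producer down rather than rejecting the item outright.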
MONITORING
➤ Measure all the things!
➤ Think about what metrics to
track when you design your
app: system/app/user level
➤ Engage with Ops / QA early
on in the design phase
➤ Invest in a good monitoring
solution
➤ Data integrity checks (bucket
analysis, statistical analysis)
➤ Alerting and monitoring
dashboards should be intuitive
INTUITIVE MONITORING DASHBOARDS: LIVE HEAT-MAPS
LOOK! RIB CAGES!
LOOK! MONITORS!
OTHER SCALING TIPS
➤ Use caching aggressively (CDNs,
app & object caches)
➤ Design to scale out horizontally
➤ Simplify scope, design,
implementation: lean == fast
➤ Know latencies
➤ Relax temporal constraints
➤ Discuss and learn from mistakes
➤ Design for fault tolerance,
graceful failure, and resilience
➤ Avoid SPOFs
➤ Avoid or distribute state
➤ Be competent
REFERENCES
http://www.slideshare.net/quipo/the-art-of-scalability-managing-growth
http://www.infoq.com/presentations/Simple-Made-Easy-QCon-London-2012
http://www.slideshare.net/postwait/scalable-internet-architecture
http://bit.ly/IJKwuc
http://agile.dzone.com/news/approaches-organizational
https://bitly.com/vCSd49
M. L. Abbott, M. T. Fisher, “The Art of Scalability”, Addison Wesley
http://theartofscalability.com/
http://alberton.info/talks
@lorenzoalberton
lorenzo@datasift.com
THANK YOU!
/in/lorenzoalberton
