Monitoring Microservices &
Containers: A Challenge
Adrian Cockcroft @adrianco
Technology Fellow - Battery Ventures
May 2015
Monitoring
Update of my monitoring rules from Monitorama 2014
Rule #1: Spend more time working on code
that analyzes the meaning of metrics, than
code that collects, moves, stores and
displays metrics.
Rule #2: Metric to display latency needs to
be less than human attention span (~10s)
Rule #3: Validate that your measurement
system has enough accuracy and precision.
Collect histograms of response time.
Rule #4: Monitoring systems need to be
more available and scalable than the
systems being monitored.
Rule #5: Optimize for distributed,
ephemeral, cloud native, containerized
microservices.
Rule #6: Fit metrics to models to understand
relationships. (New rule)
Gluecon Monitoring Microservices and Containers: A Challenge
Model Infrastructure as a Containment Hierarchy
[Diagram: nested boxes labeled Region, Zone/DC, Machine, Instance, Container, Microservice]
e.g. Machine failure affects all instances and containers inside it
Many tools use a naming scheme to imply this model, but most can’t reason about the relationships
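One way to go from a naming scheme to something a tool can reason about is to parse the name into explicit hierarchy levels. The sketch below is in Go and assumes a six-field dotted layout (architecture.region.zone.service.package.instance) modeled on the dotted names in the spigo log output later in this deck; the `Node` type and both functions are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// Node is one instance placed in a containment hierarchy.
// The six-field layout is an assumption for illustration.
type Node struct {
	Arch, Region, Zone, Service, Package, Instance string
}

// ParseName splits a dotted name into its hierarchy levels.
func ParseName(name string) (Node, error) {
	p := strings.Split(name, ".")
	if len(p) != 6 {
		return Node{}, fmt.Errorf("want 6 fields, got %d in %q", len(p), name)
	}
	return Node{p[0], p[1], p[2], p[3], p[4], p[5]}, nil
}

// AffectedByZone returns the instances contained in the given region and zone,
// i.e. everything that a failure of that zone would take down.
func AffectedByZone(nodes []Node, region, zone string) []string {
	var out []string
	for _, n := range nodes {
		if n.Region == region && n.Zone == zone {
			out = append(out, n.Instance)
		}
	}
	return out
}

func main() {
	names := []string{
		"netflixoss.us-east-1.zoneA.eureka.eureka.eureka0",
		"netflixoss.us-east-1.zoneC.eureka.eureka.eureka2",
		"netflixoss.us-east-1.zoneC.javaweb.karyon.javaweb14",
	}
	var nodes []Node
	for _, s := range names {
		n, err := ParseName(s)
		if err != nil {
			fmt.Println("skip:", err)
			continue
		}
		nodes = append(nodes, n)
	}
	fmt.Println(AffectedByZone(nodes, "us-east-1", "zoneC"))
}
```

Once the levels are explicit fields, "what is inside the failed thing?" becomes a filter rather than a string match on names.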
Model Applications and Networks as a Dataflow Graph
[Diagram labels: Request, Microservice, Zone/DC, Region]
APM Tools often model these as business transactions
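A dataflow model can be held as a plain directed graph of which service calls which. The Go sketch below is an illustration only: the `Graph` type and `Reachable` method are hypothetical, and the edges reuse tier names that appear elsewhere in this deck.

```go
package main

import "fmt"

// Graph maps each service to the services it calls.
type Graph map[string][]string

// Reachable returns every service a request entering at start can touch,
// in breadth-first order.
func (g Graph) Reachable(start string) []string {
	seen := map[string]bool{start: true}
	order := []string{start}
	for i := 0; i < len(order); i++ {
		for _, next := range g[order[i]] {
			if !seen[next] {
				seen[next] = true
				order = append(order, next)
			}
		}
	}
	return order
}

func main() {
	g := Graph{
		"elb":    {"zuul"},
		"zuul":   {"karyon"},
		"karyon": {"staash"},
		"staash": {"cassandra"},
	}
	fmt.Println(g.Reachable("zuul"))
}
```

The same traversal answers "which downstream services could explain a slow request at this entry point?", which is the kind of question a naming scheme alone cannot answer.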
Model Deployment Ownership and Support
[Diagram labels: Developer (x5), Microservice (x7), Manager (x2), VP Engineering, Site Reliability, Monitoring Tools, Availability Metrics]
Availability Metrics: 99.95% customer success rate
Infrastructure, flow and ownership models
are orthogonal and need to be linked to
make sense of the metrics
Monitoring Rules by @adrianco
1. Spend more time on analysis than data collection and display
2. Reduce key business metric latency to less than 10s
3. Validate your measurement system, use histograms
4. Be more available and scalable than the services being monitored
5. Optimize for distributed, ephemeral cloud native applications
6. Fit metrics to models to understand relationships
Microservices
Microservices
@ideavist
A Microservice Definition
Loosely coupled service oriented architecture with bounded contexts
If every service has to be updated at the same time, it’s not loosely coupled.
If you have to know too much about surrounding services, you don’t have a bounded context. See the Domain Driven Design book by Eric Evans.
Complexity
Monolithic apps have unlimited invisible
internal dependencies
Vastly more complex than explicit visible
microservice dependencies
Speed
Speeding Up Deployments
Measuring CPU usage once a minute makes no sense for containers…
Coping with rate of change is a big challenge for monitoring tools.
Datacenter Snowflakes
• Deploy in months
• Live for years
Virtualized and Cloud
• Deploy in minutes
• Live for weeks
Container Deployments
• Deploy in seconds
• Live for minutes/hours
AWS Lambda Events
• Respond in milliseconds
• Live for seconds
Scale
A Possible Hierarchy    How Many?
Continents              3 to 5
Regions                 2-4 per Continent
Zones                   1-5 per Region
Services                100’s per Zone
Versions                Many per Service
Containers              1000’s per Version
Instances               10,000’s
It’s much more challenging than just a large number of machines
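The per-level counts multiply. As a small worked example in Go, using only the three levels that have explicit numeric ranges on the slide (continents, regions per continent, zones per region); the `Range` type and `Product` function are hypothetical helpers, and the lower levels are left out because the slide gives them only as orders of magnitude.

```go
package main

import "fmt"

// Range is an inclusive count range for one hierarchy level.
type Range struct{ Min, Max int }

// Product multiplies per-level ranges to bound the total count
// at the lowest level listed.
func Product(levels ...Range) Range {
	total := Range{1, 1}
	for _, l := range levels {
		total.Min *= l.Min
		total.Max *= l.Max
	}
	return total
}

func main() {
	continents := Range{3, 5}
	regionsPerContinent := Range{2, 4}
	zonesPerRegion := Range{1, 5}

	zones := Product(continents, regionsPerContinent, zonesPerRegion)
	fmt.Printf("zones: %d to %d\n", zones.Min, zones.Max)
}
```

Each zone then carries hundreds of services, so the number of distinct things to monitor grows far faster than the machine count alone suggests.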
Flow
Some tools can show
the request flow
across a few services
But interesting
architectures have a
lot of microservices!
Flow visualization is
a challenge.
See http://www.slideshare.net/LappleApple/gilt-from-monolith-ruby-app-to-micro-service-scala-service-architecture
Failures
Simple NetflixOSS style microservices architecture on three AWS Availability Zones
[Diagram tiers: ELB Load Balancer; Zuul API Proxy; Karyon Business Logic; Staash Data Access Layer; Priam Cassandra Datastore]
Zone partition/failure
What should you do? What should monitors show?
By design, everything works with 2 of 3 zones running. This is not an outage: inform, but don’t touch anything! Halt deployments perhaps?
Challenge: understand and communicate common microservice failure patterns.
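The "2 of 3 zones" rule can be written down so a monitor reports it as a degraded but expected state rather than an outage. A minimal sketch in Go, assuming the only inputs are zone counts; the `Status` values and `Classify` function are hypothetical, not part of any NetflixOSS tool.

```go
package main

import "fmt"

// Status is what a monitor would report for a zone-redundant service.
type Status string

const (
	Healthy  Status = "healthy"
	Degraded Status = "degraded: inform, do not intervene"
	Outage   Status = "outage"
)

// Classify compares healthy zones against the minimum the design needs.
// For the three-zone architecture on the slide, required is 2.
func Classify(healthy, total, required int) Status {
	switch {
	case healthy >= total:
		return Healthy
	case healthy >= required:
		return Degraded
	default:
		return Outage
	}
}

func main() {
	for healthy := 3; healthy >= 0; healthy-- {
		fmt.Printf("%d of 3 zones up: %s\n", healthy, Classify(healthy, 3, 2))
	}
}
```

Encoding the design threshold is what lets the display say "working as designed" when a zone is lost, instead of paging someone to touch a system that is coping.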
Testing
Testing monitoring tools at scale
gets expensive quickly…
Simulation
Simulated Microservices
Model and visualize microservices
Simulate interesting architectures
Generate large scale configurations
Eventually stress test real tools
See github.com/adrianco/spigo
Simulate Protocol Interactions in Go
Visualize with D3
[Diagram: ELB Load Balancer, Zuul API Proxy, Karyon Business Logic, Staash Data Access Layer, Priam Cassandra Datastore, across Three Availability Zones]
netflixoss.go architecture
asgard.Create(cname, asgard.PriamCassandraPkg, regions, priamCassandracount, "eureka", cname)
asgard.Create(tname, asgard.StaashPkg, regions, staashcount, cname)
asgard.Create(jname, asgard.KaryonPkg, regions, javacount, tname)
asgard.Create(nname, asgard.KaryonPkg, regions, nodecount, jname)
asgard.Create(zuname, asgard.ZuulPkg, regions, zuulcount, nname)
asgard.Create(elbname, asgard.ElbPkg, regions, 0, zuname)
asgard.Run(asgard.Create(dns, asgard.DenominatorPkg, 0, 0, elbname), jname) // victimize a javaweb
Tooling
[Callouts on the asgard.Create arguments: new tier name; tier package; region count: 1; node count; list of tier dependencies]
Run and log results to json
$ spigo -a netflixoss -d 10 -j
2015/05/21 00:05:32 netflixoss: scaling to 100%
2015/05/21 00:05:32 netflixoss.edda: starting
2015/05/21 00:05:32 netflixoss.us-east-1.zoneA.eureka.eureka.eureka0: starting
2015/05/21 00:05:32 netflixoss.us-east-1.zoneB.eureka.eureka.eureka1: starting
2015/05/21 00:05:32 netflixoss.us-east-1.zoneC.eureka.eureka.eureka2: starting
2015/05/21 00:05:32 netflixoss.*.*.www.denominator.www0 activity rate 10ms
2015/05/21 00:05:37 chaosmonkey delete: netflixoss.us-east-1.zoneC.javaweb.karyon.javaweb14
2015/05/21 00:05:42 asgard: Shutdown
2015/05/21 00:05:42 netflixoss.us-east-1.zoneB.eureka.eureka.eureka1: closing
2015/05/21 00:05:42 netflixoss.us-east-1.zoneA.eureka.eureka.eureka0: closing
2015/05/21 00:05:42 netflixoss.us-east-1.zoneC.eureka.eureka.eureka2: closing
2015/05/21 00:05:42 spigo: complete
2015/05/21 00:05:42 netflixoss.edda: closing
[Callouts: 10 sec run time; edda.go logs config to json; eureka.go service registry per zone; Chaos monkey victim!]
Simianviz from json logs
http://simianviz.divshot.io/netflixoss/1
[Callouts: ELB splits traffic over zones in single region; microservices; Cassandra Cluster; Six regions]
Big thanks to @kurtiskemple
Why Build Spigo?
Generate test microservice configurations at scale
Stress monitoring tools and simulated game day training
Eventually (i.e. not implemented yet)
Dynamically vary configuration: autoscale, code push
Chaos gorilla for zone, region failures and partitions
Websocket connection between spigo and simianviz display
My challenge to you:
Build your architecture in Spigo.
Stress monitoring tools with it.
Help fix monitoring for microservices!
@mgroeniger
Questions?
Disclosure: some of the companies mentioned may be Battery Ventures Portfolio Companies
See www.battery.com for a list of portfolio investments
● Microservices Challenges
● Speed and Scale
● Flow and Failures
● Testing and Simulation
● Battery Ventures http://www.battery.com
● Adrian’s Tweets @adrianco and Blog http://perfcap.blogspot.com
● Slideshare http://slideshare.com/adriancockcroft
● Github http://github.com/adrianco/spigo
What does @adrianco do?
@adrianco
Technology Due Diligence on Deals
Presentations at Conferences
Presentations at Companies
Technical Advice for Portfolio Companies
Program Committee for Conferences
Networking with Interesting People
Tinkering with Technologies
Maintain Deep Relationship with Cloud Vendors
Battery Ventures Portfolio Companies for Enterprise IT
Categories: Security; Enterprise IT Operations & Management; Big Data; Compute; Networking; Storage
Palo Alto Networks
Visit http://www.battery.com/our-companies/ for a full list of all portfolio companies in which all Battery Funds have invested.

Gluecon Monitoring Microservices and Containers: A Challenge

  • 1. Monitoring Microservices & Containers: A Challenge Adrian Cockcroft @adrianco Technology Fellow - Battery Ventures May 2015
  • 2. Monitoring ! Update of my monitoring rules from Monitorama 2014
  • 3. Rule #1: Spend more time working on code that analyzes the meaning of metrics, than code that collects, moves, stores and displays metrics.
  • 4. Rule #2: Metric to display latency needs to be less than human attention span (~10s)
  • 5. Rule #3: Validate that your measurement system has enough accuracy and precision. Collect histograms of response time.
  • 6. Rule #4: Monitoring systems need to be more available and scalable than the systems being monitored.
  • 7. Rule #5: Optimize for distributed, ephemeral, cloud native, containerized microservices.
  • 8. Rule #6: Fit metrics to models to understand relationships. (New rule)
  • 10. Container Instance e.g. Machine failure affects all instances and containers inside itZone/DC Region Microservice Model Infrastructure as a Containment Hierarchy Machine Many tools use a naming scheme to imply this model, but most can’t reason about the relationships
  • 12. Request Model Applications and Networks as a Dataflow Graph APM Tools often model these as business transactions Microservice Zone/DC Region
  • 13. Developer Developer Model Deployment Ownership and Support Developer Developer
  • 14. Developer Developer Model Deployment Ownership and Support Micro service Micro service Micro service Micro service Micro service Micro service Micro service Developer Developer
  • 15. Developer Developer Model Deployment Ownership and Support Micro service Micro service Micro service Micro service Micro service Micro service Micro service Developer Developer Monitoring Tools
  • 16. DeveloperDeveloper Developer Model Deployment Ownership and Support Micro service Micro service Micro service Micro service Micro service Micro service Micro service Developer Developer Monitoring Tools
  • 17. DeveloperDeveloper Developer Model Deployment Ownership and Support Micro service Micro service Micro service Micro service Micro service Micro service Micro service Developer Developer Site Reliability Monitoring Tools Availability Metrics 99.95% customer success rate
  • 18. DeveloperDeveloper Developer Model Deployment Ownership and Support Micro service Micro service Micro service Micro service Micro service Micro service Micro service Developer Developer Manager Manager Site Reliability Monitoring Tools Availability Metrics 99.95% customer success rate
  • 19. DeveloperDeveloper Developer Model Deployment Ownership and Support Micro service Micro service Micro service Micro service Micro service Micro service Micro service Developer Developer Manager Manager VP Engineering Site Reliability Monitoring Tools Availability Metrics 99.95% customer success rate
  • 20. Infrastructure, flow and ownership models are orthogonal and need to be linked to make sense of the metrics
  • 21. Monitoring Rules by @adrianco 1. Spend more time on analysis than data collection and display 2. Reduce key business metric latency to less than 10s 3. Validate your measurement system, use histograms 4. Be more available and scalable than the services being monitored 5. Optimize for distributed, ephemeral cloud native applications 6. Fit metrics to models to understand relationships
  • 24. A Microservice Definition ! Loosely coupled service oriented architecture with bounded contexts
  • 25. A Microservice Definition ! Loosely coupled service oriented architecture with bounded contexts If every service has to be updated at the same time it’s not loosely coupled
  • 26. A Microservice Definition ! Loosely coupled service oriented architecture with bounded contexts If every service has to be updated at the same time it’s not loosely coupled If you have to know too much about surrounding services you don’t have a bounded context. See the Domain Driven Design book by Eric Evans.
  • 28. Monolithic apps have unlimited invisible internal dependencies ! Vastly more complex than explicit visible microservice dependencies
  • 29. Speed
  • 30. Speeding Up Deployments Datacenter Snowflakes • Deploy in months • Live for years
  • 31. Speeding Up Deployments Datacenter Snowflakes • Deploy in months • Live for years Virtualized and Cloud • Deploy in minutes • Live for weeks
  • 32. Speeding Up Deployments Datacenter Snowflakes • Deploy in months • Live for years Virtualized and Cloud • Deploy in minutes • Live for weeks Container Deployments • Deploy in seconds • Live for minutes/hours
  • 33. Speeding Up Deployments Datacenter Snowflakes • Deploy in months • Live for years Virtualized and Cloud • Deploy in minutes • Live for weeks Container Deployments • Deploy in seconds • Live for minutes/hours AWS Lambda Events • Respond in milliseconds • Live for seconds
  • 34. Speeding Up Deployments Measuring CPU usage once a minute makes no sense for containers… Coping with rate of change is a big challenge for monitoring tools. Datacenter Snowflakes • Deploy in months • Live for years Virtualized and Cloud • Deploy in minutes • Live for weeks Container Deployments • Deploy in seconds • Live for minutes/hours AWS Lambda Events • Respond in milliseconds • Live for seconds
  • 35. Scale
  • 36. A Possible Hierarchy Continents Regions Zones Services Versions Containers Instances How Many? 3 to 5 2-4 per Continent 1-5 per Region 100’s per Zone Many per Service 1000’s per Version 10,000’s It’s much more challenging than just a large number of machines
  • 37. Flow
  • 38. Some tools can show the request flow across a few services
  • 39. But interesting architectures have a lot of microservices! Flow visualization is a challenge. See http://www.slideshare.net/LappleApple/gilt-from-monolith-ruby-app-to-micro-service-scala-service-architecture
  • 41. ELB Load Balancer Zuul API Proxy Karyon Business Logic Staash Data Access Layer Priam Cassandra Datastore Simple NetflixOSS style microservices architecture on three AWS Availability Zones
  • 42. ELB Load Balancer Zuul API Proxy Karyon Business Logic Staash Data Access Layer Priam Cassandra Datastore Simple NetflixOSS style microservices architecture on three AWS Availability Zones
  • 43. Simple NetflixOSS style microservices architecture on three AWS Availability Zones: ELB Load Balancer, Zuul API Proxy, Karyon Business Logic, Staash Data Access Layer, Priam Cassandra Datastore. Zone partition/failure: What should you do? What should monitors show?
  • 44. Simple NetflixOSS style microservices architecture on three AWS Availability Zones: ELB Load Balancer, Zuul API Proxy, Karyon Business Logic, Staash Data Access Layer, Priam Cassandra Datastore. Zone partition/failure: What should you do? What should monitors show? By design, everything works with 2 of 3 zones running. This is not an outage; inform, but don’t touch anything! Halt deployments perhaps?
  • 45. Simple NetflixOSS style microservices architecture on three AWS Availability Zones: ELB Load Balancer, Zuul API Proxy, Karyon Business Logic, Staash Data Access Layer, Priam Cassandra Datastore. Zone partition/failure: What should you do? What should monitors show? By design, everything works with 2 of 3 zones running. This is not an outage; inform, but don’t touch anything! Halt deployments perhaps? Challenge: understand and communicate common microservice failure patterns.
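The "2 of 3 zones is not an outage" rule can be written down as a small classifier that a monitoring tool could apply before alerting. This is a minimal sketch under stated assumptions: the `Status` values, `classify`, and `deploymentsAllowed` are hypothetical names invented for illustration, and halting deployments while degraded is only the slide's tentative suggestion ("perhaps").

```go
package main

import "fmt"

// Status is a hypothetical three-level health classification.
type Status string

const (
	Healthy  Status = "healthy"
	Degraded Status = "degraded: inform, but don't touch anything"
	Outage   Status = "outage"
)

// classify compares the number of zones currently up against the number the
// architecture needs by design (2 of 3 in the slide's example).
func classify(zonesUp, zonesTotal, zonesRequired int) Status {
	switch {
	case zonesUp >= zonesTotal:
		return Healthy
	case zonesUp >= zonesRequired:
		return Degraded
	default:
		return Outage
	}
}

// deploymentsAllowed encodes the slide's tentative idea of halting
// deployments whenever the system is not fully healthy.
func deploymentsAllowed(s Status) bool {
	return s == Healthy
}

func main() {
	for up := 3; up >= 1; up-- {
		s := classify(up, 3, 2)
		fmt.Printf("%d of 3 zones up: %s (deployments allowed: %v)\n",
			up, s, deploymentsAllowed(s))
	}
}
```

The point of the sketch is that the monitor needs to know the design tolerance (`zonesRequired`) to tell an expected, survivable failure from a real outage; a simple per-zone threshold alert cannot make that distinction.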
  • 47. Testing monitoring tools at scale gets expensive quickly…
  • 49. Simulated Microservices: model and visualize microservices; simulate interesting architectures; generate large scale configurations; eventually stress test real tools. See github.com/adrianco/spigo. Simulate Protocol Interactions in Go. Visualize with D3. Diagram: ELB Load Balancer, Zuul API Proxy, Karyon Business Logic, Staash Data Access Layer, Priam Cassandra Datastore, across Three Availability Zones.
  • 50. netflixoss.go architecture
      asgard.Create(cname, asgard.PriamCassandraPkg, regions, priamCassandracount, "eureka", cname)
      asgard.Create(tname, asgard.StaashPkg, regions, staashcount, cname)
      asgard.Create(jname, asgard.KaryonPkg, regions, javacount, tname)
      asgard.Create(nname, asgard.KaryonPkg, regions, nodecount, jname)
      asgard.Create(zuname, asgard.ZuulPkg, regions, zuulcount, nname)
      asgard.Create(elbname, asgard.ElbPkg, regions, 0, zuname)
      asgard.Run(asgard.Create(dns, asgard.DenominatorPkg, 0, 0, elbname), jname) // victimize a javaweb
    Slide callouts: Tooling; New tier name; Tier package; Region count: 1; Node count; List of tier dependencies
  • 51. Run and log results to json
      $ spigo -a netflixoss -d 10 -j
      2015/05/21 00:05:32 netflixoss: scaling to 100%
      2015/05/21 00:05:32 netflixoss.edda: starting
      2015/05/21 00:05:32 netflixoss.us-east-1.zoneA.eureka.eureka.eureka0: starting
      2015/05/21 00:05:32 netflixoss.us-east-1.zoneB.eureka.eureka.eureka1: starting
      2015/05/21 00:05:32 netflixoss.us-east-1.zoneC.eureka.eureka.eureka2: starting
      2015/05/21 00:05:32 netflixoss.*.*.www.denominator.www0 activity rate 10ms
      2015/05/21 00:05:37 chaosmonkey delete: netflixoss.us-east-1.zoneC.javaweb.karyon.javaweb14
      2015/05/21 00:05:42 asgard: Shutdown
      2015/05/21 00:05:42 netflixoss.us-east-1.zoneB.eureka.eureka.eureka1: closing
      2015/05/21 00:05:42 netflixoss.us-east-1.zoneA.eureka.eureka.eureka0: closing
      2015/05/21 00:05:42 netflixoss.us-east-1.zoneC.eureka.eureka.eureka2: closing
      2015/05/21 00:05:42 spigo: complete
      2015/05/21 00:05:42 netflixoss.edda: closing
    Slide callouts: 10 sec run time; edda.go logs config to json; eureka.go service registry per zone; Chaos monkey victim!
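The dotted node names in the log above carry the containment hierarchy in their fields, which a tool can parse instead of treating the name as an opaque string. This is a minimal sketch: the `NodeName` type, `parseNodeName`, and the field labels (architecture, region, zone, tier, package, instance) are assumptions inferred from the six-part names in the log, not a documented spigo format.

```go
package main

import (
	"fmt"
	"strings"
)

// NodeName holds the six dot-separated fields of a node name such as
// netflixoss.us-east-1.zoneC.javaweb.karyon.javaweb14.
// The field meanings are inferred from the log, not from spigo's docs.
type NodeName struct {
	Arch, Region, Zone, Tier, Package, Instance string
}

// parseNodeName splits a six-field node name. Shorter names in the log,
// such as netflixoss.edda, are rejected with an error.
func parseNodeName(s string) (NodeName, error) {
	parts := strings.Split(s, ".")
	if len(parts) != 6 {
		return NodeName{}, fmt.Errorf("expected 6 dot-separated fields, got %d in %q", len(parts), s)
	}
	return NodeName{
		Arch: parts[0], Region: parts[1], Zone: parts[2],
		Tier: parts[3], Package: parts[4], Instance: parts[5],
	}, nil
}

func main() {
	// Names copied from the log output on the slide.
	names := []string{
		"netflixoss.us-east-1.zoneA.eureka.eureka.eureka0",
		"netflixoss.us-east-1.zoneB.eureka.eureka.eureka1",
		"netflixoss.us-east-1.zoneC.eureka.eureka.eureka2",
		"netflixoss.us-east-1.zoneC.javaweb.karyon.javaweb14",
	}
	perZone := map[string]int{}
	for _, raw := range names {
		n, err := parseNodeName(raw)
		if err != nil {
			fmt.Println("skip:", err)
			continue
		}
		perZone[n.Zone]++
	}
	fmt.Println("nodes per zone:", perZone)
}
```

Once names are parsed into fields, grouping by zone, region, or tier becomes a map lookup, which is the kind of relationship reasoning the deck says most naming-scheme-based tools lack.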
  • 52. Simianviz from json logs: http://simianviz.divshot.io/netflixoss/1 Slide labels: ELB splits traffic over zones in single region; microservices; Cassandra Cluster; Six regions. Big thanks to @kurtiskemple
  • 53. Why Build Spigo? Generate test microservice configurations at scale. Stress monitoring tools and simulated game day training. Eventually (i.e. not implemented yet): dynamically vary configuration (autoscale, code push); Chaos gorilla for zone, region failures and partitions; Websocket connection between spigo and simianviz display.
  • 54. My challenge to you: Build your architecture in Spigo. Stress monitoring tools with it. Help fix monitoring for microservices! @mgroeniger
  • 55. Questions? Disclosure: some of the companies mentioned may be Battery Ventures Portfolio Companies. See www.battery.com for a list of portfolio investments. Topics: Microservices Challenges; Speed and Scale; Flow and Failures; Testing and Simulation. Links: Battery Ventures http://www.battery.com; Adrian’s Tweets @adrianco and Blog http://perfcap.blogspot.com; Slideshare http://slideshare.com/adriancockcroft; Github http://github.com/adrianco/spigo
  • 56. What does @adrianco do? Technology Due Diligence on Deals; Presentations at Conferences; Presentations at Companies; Technical Advice for Portfolio Companies; Program Committee for Conferences; Networking with Interesting People; Tinkering with Technologies; Maintain Deep Relationship with Cloud Vendors
  • 57. Battery Ventures Portfolio Companies for Enterprise IT. Security. Visit http://www.battery.com/our-companies/ for a full list of all portfolio companies in which all Battery Funds have invested. Palo Alto Networks. Enterprise IT Operations & Management. Big Data. Compute. Networking. Storage.