From #MonitoringSucks to #MonitoringLove
Open Source Monitoring in 2018-2019
@KrisBuytaert
Devops Meetup, Brno
Kris Buytaert
● I used to be a Dev,
● Then became an Op
● Chief Twitter Officer and Open Source Consultant @inuits.eu
● Everything is an effing DNS Problem
● Building Clouds since before the bookstore
● Organising Conferences
● Evangelizing devops
An opinionated talk about the Open Source Monitoring tooling landscape
In which I hope to learn from YOU
#devops=~C(L)AMS
● Culture
● (Lean)
● Automation
● Monitoring and Measurement
● Sharing
Damon Edwards and John Willis
Gene Kim
Monitoring is usually an afterthought
ENOBUDGET, ENOTIME
A 2008 OLS Paper
● We have bloated Java tools
● Some open Core stuff
● DIY folks want traditional Nagios
● DBA Required
#monitoringsucks
● John Vincent (@lusis), June 2011
● A sub #devops movement
● https://github.com/monitoringsucks/
Why #monitoringsucks
● Manual config (GUI)
● Not in sync with reality
● Hosts only
● Services sometimes
● Application never
● Chaos or out of sync with reality
● Alert Fatigue
#monitoringlove
• Ulf Mansson #devopsdays Rome 2011
• A new era of tooling
• #monitoringlove hacksessions @inuits
• #monitorama
What we want
● Small, well suited components
• Collect
• Transport / Mangle
• Store
• Analyse
• Act / Alert
• Visualize
The love was: Sensu
● Awesome for non-static environments
● Scaling a clustered RabbitMQ ?
● Looking more and more like Prometheus
● This is Europe, U no do cloud
Automation of #monitoring brought back the #love
There is no such thing as “Service” discovery
Monitoring a service vs Monitoring a Service
Automation
definition of done: monitored and in production
A software project is not done until your last end user is dead
Culture,
Automation,
Measurement: measure all the things
Sharing
Collection:
● Collectd,
● Diamond
● From application
● Custom Exporters
● Logs
● Logs
NetData
● Granularity
● Debug vs constant prod ?
Transport / Ship / Mangle:
● Collectd / Diamond / Telegraf
● (r)syslog, Beats, logstash
● Q, Nats, ActiveMQ, RabbitMQ
● Collect from anywhere
● Filter
● Send anywhere
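
The "collect from anywhere, filter, send anywhere" idea can be sketched as three small composable steps. This is only an illustration in plain Python, not how any of the tools above are implemented; the names `collect`, `keep_errors`, and `send` and the event layout are assumptions made up for this example.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Event = Dict[str, str]


def collect(lines: Iterable[str], source: str) -> Iterator[Event]:
    """Turn raw lines from any source into tagged events."""
    for line in lines:
        yield {"source": source, "message": line.rstrip("\n")}


def keep_errors(events: Iterable[Event]) -> Iterator[Event]:
    """Filter step: only pass on events that mention ERROR."""
    for event in events:
        if "ERROR" in event["message"]:
            yield event


def send(events: Iterable[Event], sinks: List[Callable[[Event], None]]) -> int:
    """Deliver every event to every sink and return how many events were shipped."""
    shipped = 0
    for event in events:
        for sink in sinks:
            sink(event)
        shipped += 1
    return shipped


# Hypothetical sinks: two in-memory lists standing in for a queue and a store.
queue: List[Event] = []
archive: List[Event] = []
shipped = send(
    keep_errors(collect(["all good\n", "ERROR disk full\n"], "app.log")),
    [queue.append, archive.append],
)
```

Because each step only consumes and produces an iterable of events, a different source, filter, or sink can be swapped in without touching the others.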
Store:
● TSDB: Time Series DB
● Optimized DB for Time Series
● Graphite / Influx / OpenTSDB / ....
● Elastic
● Long Term vs Short Term Storage
Oldschool graphite
Prometheus
● Started 2012
● SoundCloud
● Metrics Based
● Scrapes Endpoints
• Existing endpoints for limited tools
● Graphite Exporter
● Push Gateway
● Great Alerting
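
What a scraped endpoint returns is plain text, so a custom exporter can be very small. A minimal sketch of rendering one metric in the Prometheus text exposition format follows; the helper name `render_metric` and the sample metric are assumptions for illustration, and a real exporter would more likely use an official client library, which also handles label escaping.

```python
from typing import Dict, List, Tuple


def render_metric(
    name: str,
    help_text: str,
    metric_type: str,
    samples: List[Tuple[Dict[str, str], float]],
) -> str:
    """Render one metric family as Prometheus text exposition lines.

    Assumption: label values contain no quotes, backslashes or newlines,
    so no escaping is done here.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {metric_type}"]
    for labels, value in samples:
        if labels:
            label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
            lines.append(f"{name}{{{label_str}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical counter with two labels.
body = render_metric(
    "http_requests_total",
    "Total HTTP requests.",
    "counter",
    [({"method": "get", "code": "200"}, 1027)],
)
```

Serving `body` over HTTP on a path such as `/metrics` is all a scrape target needs to do; the scraper pulls it on its own schedule.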
Prometheus
● Mostly for Short Term
● Still ship longterm metrics to other TSDB
● Nginx gw’s all over the place
• (ssl fun)
Infinite Diskspace ?
● Logstash output
• Statsd => Graphite
• Keep patterns around,
• Selectively purge data
● Prometheus for Short Term
• Graphite for Long term
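
Keeping a counter per log pattern instead of every raw log line comes down to two very small wire formats. The sketch below builds a StatsD counter packet and a Graphite plaintext line; the metric names and the `send_udp` helper are assumptions for illustration, and the ports are the usual defaults (8125/UDP for StatsD, 2003/TCP for Graphite's plaintext listener).

```python
import socket


def statsd_counter(name: str, value: int = 1) -> bytes:
    """Build a StatsD counter packet: <name>:<value>|c"""
    return f"{name}:{value}|c".encode("ascii")


def graphite_line(path: str, value: float, timestamp: float) -> bytes:
    """Build one Graphite plaintext line: <path> <value> <unix timestamp>"""
    return f"{path} {value} {int(timestamp)}\n".encode("ascii")


def send_udp(payload: bytes, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Fire-and-forget delivery to a StatsD daemon (hypothetical local address)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


# One increment per matched log pattern, e.g. every failed login line.
packet = statsd_counter("app.logins.failed")
line = graphite_line("app.logins.failed.count", 42, 1546300800.9)
```

Once the pattern is counted, the raw lines behind it can be purged on a much shorter schedule than the aggregated series.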
Log Alternatives
● Graylog2
● ELSA (Enterprise Log Search and Archive)
● ELK Stack
● Fluentd
Prometheus ?
● Only For Containers ?
● Also for other setups !
● Is this sufficient ?
Act / Alert:
Checking for Failure
● Icinga
• Automated config generation
● Sensu
• Cloudstyle
● Prometheus
• AlertManager
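
Automated config generation means the monitoring config is rendered from the same inventory that builds the hosts, so it cannot drift from reality. A minimal sketch that renders Icinga 2 style `Host` and `Service` objects from a dictionary follows; the inventory layout, host names, and addresses are made up for this example, and in practice this job is usually done by a configuration management tool rather than a standalone script.

```python
from typing import Dict, List


def render_host(name: str, address: str) -> str:
    """Render one Icinga 2 Host object using the stock hostalive check."""
    return (
        f'object Host "{name}" {{\n'
        f'  address = "{address}"\n'
        f'  check_command = "hostalive"\n'
        f"}}\n"
    )


def render_service(host: str, check: str) -> str:
    """Render one Service object; the check name doubles as the service name."""
    return (
        f'object Service "{check}" {{\n'
        f'  host_name = "{host}"\n'
        f'  check_command = "{check}"\n'
        f"}}\n"
    )


def render_inventory(inventory: Dict[str, Dict[str, object]]) -> str:
    """Render every host and its services in a stable (sorted) order."""
    blocks: List[str] = []
    for name in sorted(inventory):
        entry = inventory[name]
        blocks.append(render_host(name, str(entry["address"])))
        for check in entry.get("checks", []):
            blocks.append(render_service(name, check))
    return "\n".join(blocks)


# Hypothetical inventory; addresses come from the documentation range.
config = render_inventory(
    {"web01": {"address": "192.0.2.10", "checks": ["http", "ssh"]}}
)
```

Sorting the output keeps the generated file stable between runs, which makes diffs in version control meaningful.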
Waking you up at night
● Flapjack
flapjack.io
monitoring notification routing + event processing system
● OpenDuty
github.com/szechuen/OpenDuty
Duty management
Waking you up at night
● Anag
● Custom written stuff
Analyse:
Basic Search
Graphs to Knowledge
• Skyline
• Oculus
• Creating Information out of this data
• Big data
• Machine Learning
• Hastic.io
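
Going from graphs to knowledge starts with letting a machine flag the odd datapoints. As a rough illustration only, here is a three-sigma check on a series; this is a generic textbook rule written for this example, not code from Skyline, Oculus, or Hastic, and the function name and threshold are assumptions.

```python
from statistics import mean, pstdev
from typing import Sequence


def is_anomalous(history: Sequence[float], latest: float, sigmas: float = 3.0) -> bool:
    """Flag `latest` if it lies more than `sigmas` standard deviations from the mean of `history`."""
    if len(history) < 2:
        return False  # not enough data to judge
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        return latest != mu  # a flat series: any change is unusual
    return abs(latest - mu) > sigmas * sigma


# Hypothetical request latencies in milliseconds.
history = [10, 11, 9, 10, 10, 11, 9, 10]
normal = is_anomalous(history, 10.5)
spike = is_anomalous(history, 25)
```

A single rule like this is noisy on seasonal data, which is one reason such tools tend to combine several detectors or learn from labelled patterns.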
Hastic.io
● Open Source Pattern Detection
● Label patterns → Wait for learning to complete → Get detections
● Hastic Server + Grafana App
LogIslands
● Complex event processing & patterns mining at scale
● Kafka, NiFi, Spark, Hadoop
Visualize:
Kibana
Grafana
Challenge
● *ana as code
● Template your ...
● e.g. grafonnet-lib
• A jsonnet lib to generate Grafana dashboards ...
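
Dashboards as code means the dashboard JSON is generated from a template and kept in version control instead of being clicked together. The sketch below does in plain Python what a templating library such as grafonnet-lib does in jsonnet; it is a stand-in, not grafonnet, and the field names are a small assumed subset of Grafana's dashboard JSON with hypothetical job names and query.

```python
import json
from typing import Dict, List


def graph_panel(panel_id: int, title: str, expr: str, x: int, y: int) -> Dict:
    """One graph panel with a single Prometheus-style query."""
    return {
        "id": panel_id,
        "type": "graph",
        "title": title,
        "gridPos": {"h": 8, "w": 12, "x": x, "y": y},
        "targets": [{"expr": expr, "refId": "A"}],
    }


def dashboard(title: str, services: List[str]) -> Dict:
    """One request-rate panel per service, laid out two per row."""
    panels = []
    for index, service in enumerate(services):
        panels.append(
            graph_panel(
                panel_id=index + 1,
                title=f"{service} request rate",
                expr=f'rate(http_requests_total{{job="{service}"}}[5m])',
                x=(index % 2) * 12,
                y=(index // 2) * 8,
            )
        )
    return {"title": title, "panels": panels}


# Hypothetical service list; the JSON string is what would be committed or uploaded.
services_dashboard = dashboard("Services", ["web", "api", "db"])
as_json = json.dumps(services_dashboard, indent=2, sort_keys=True)
```

Adding a service to the list adds its panel everywhere the template is used, which is the point of templating dashboards.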
Aggregating
● Thruk
● Grafana
● Dashing
Lack of change ?
● Limited # new tools
● Feature Complete ?
Is Prometheus the new Docker ?
APM
Application Performance Monitoring
But what about my apps ?
● agent required that ties to code
● Code modifications
Old PacketBeat
Open Source “APM”
● Scouter
● Jaeger
● Kamon
● Zipkin
● Beats ...
● Performance Co Pilot
● Kamon
● Pinpoint
● Micrometer
● StageMonitor
● SkyWalking
● Kieker
=> Huge focus on the Java ecosystem, few options for PHP / Python / Ruby shops.
OpenAPM.io
OpenTracing 101
● The problem: It was not reasonable to ask all OSS services and all OSS packages and all application-specific code to use a single tracing vendor => OpenTracing
● Distributed Tracing Standard
● CNCF
● Dapper inside Google
● “OpenTracing is not a download or a program. Distributed tracing requires that software developers add instrumentation to the code of an application, or to the frameworks used in the application”
Complexity is the Enemy of Reliability
I love where Monitoring is heading
“Wait, was I oncall last week ?”
True words said by one of our oncall engineers
Opservability
Contact
Kris Buytaert kris.buytaert@inuits.eu
Further Reading
@krisbuytaert
http://www.krisbuytaert.be/blog/
http://www.inuits.eu/
Find Inuits in Brasschaat, Ghent, Rotterdam, Prague, Kiev, Brno
