Infrastructure & System
Monitoring
using Prometheus
Marco Pas
Philips Lighting
Software geek, hands on
Developer/Architect/DevOps Engineer
@marcopas
Some stuff about me...
● Mostly doing cloud related stuff
○ Java, Groovy, Scala, Spring Boot, IOT, AWS, Terraform, Infrastructure
● Enjoying the good things
● Chef leuke dingen doen == “trying out cool and new stuff”
● Currently involved in a big IOT project
● Wannabe chef, movie & Netflix addict
Agenda
● Monitoring
○ Introducing you to a Scary Movie
● Prometheus overview (demos)
○ Running Prometheus
○ Gathering host metrics
○ Introducing Grafana
○ Monitoring Docker containers
○ Alerting
○ Instrumenting your own code
○ Service Discovery (Consul) integration
..Quick Inventory..
I am going to introduce
you to some bad movies
Commonality
between
these movies?
Monitoring
Our scary movie “The Happy Developer”
● Let's push out features
● I can demo it, so it works :)
● It works with 1 user, so it will work with multiple
● Don’t worry about performance, we will just scale using multiple machines/processes
● Logging is in place
Did
anyone
notice?
Disaster Strikes
Logging
“recording to diagnose a system”
127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
Monitoring
“observation, checking and recording”
http_requests_total{method="post",code="200"} 1027 1395066363000
Logging != Monitoring
Vital Signs
Why Monitoring?
● Know when things go wrong
○ Detection & Alerting
● Be able to debug and gain insight
● Detect changes over time and
drive technical/business decisions
● Feed into other systems/processes
(e.g. security, automation)
What to monitor?
● IT Network
● Operating System
● Services
● Applications
Capture Monitoring Information
● Functional Monitoring
● Operational Monitoring
→ metric data
Houston, we have a storage problem!
How do we store the massive amount of metric data and still make it easy to query?
Time Series - Database
● Time series data is a sequence of data points collected at regular intervals
over a period of time. (metrics)
○ Examples:
■ Device data
■ Weather data
■ Stock prices
■ Tide measurements
■ Solar flare tracking
● The data requires aggregation and analysis
Time Series Database
● High write performance
● Data compaction
● Fast, easy range queries
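To make "fast, easy range queries" a bit more concrete, here is a minimal in-memory sketch. It is not how Prometheus or any real time series database stores data; the class name `TinyTimeSeriesStore` and its methods are made up for illustration, and it assumes samples arrive in time order per series so that a binary search can find the requested window.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


class TinyTimeSeriesStore:
    """Hypothetical append-only store: sorted timestamps plus values per series."""

    def __init__(self):
        self._timestamps = defaultdict(list)
        self._values = defaultdict(list)

    def append(self, series, timestamp, value):
        # Assumption: samples for one series arrive in time order.
        timestamps = self._timestamps[series]
        if timestamps and timestamp < timestamps[-1]:
            raise ValueError("out-of-order sample")
        timestamps.append(timestamp)
        self._values[series].append(value)

    def range_query(self, series, start, end):
        """Return all (timestamp, value) pairs with start <= timestamp <= end."""
        timestamps = self._timestamps.get(series, [])
        values = self._values.get(series, [])
        lo = bisect_left(timestamps, start)   # first sample at or after start
        hi = bisect_right(timestamps, end)    # one past the last sample at or before end
        return list(zip(timestamps[lo:hi], values[lo:hi]))
```

Appending is a cheap list append (write performance) and a range query costs two binary searches plus the slice it returns. Real databases add compression and persistence on top of this idea.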
Time Series - Data format
metric name and a set of key-value pairs, also known as labels
<metric name>{<label name>=<label value>, ...} value [ timestamp ]
http_requests_total{method="post",code="200"} 1027 1395066363000
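As a rough illustration of the data format, the sketch below splits one sample line into its metric name, labels, value and optional timestamp (milliseconds). The function name `parse_sample` is hypothetical, and the regular expressions are a simplification: label values containing `}` and escape sequences are not handled the way a real parser would handle them.

```python
import re

# <metric name>{<label name>=<label value>, ...} value [ timestamp ]
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'   # metric name
    r'(?:\{(?P<labels>[^}]*)\})?'            # optional {label="value",...}
    r'\s+(?P<value>\S+)'                     # sample value
    r'(?:\s+(?P<timestamp>-?\d+))?\s*$'      # optional timestamp in milliseconds
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_sample(line):
    """Parse one sample line into a dict (simplified, illustrative only)."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a sample line: {line!r}")
    timestamp = match.group("timestamp")
    return {
        "name": match.group("name"),
        "labels": dict(LABEL_RE.findall(match.group("labels") or "")),
        "value": float(match.group("value")),
        "timestamp_ms": int(timestamp) if timestamp else None,
    }
```

Every distinct combination of metric name and label values identifies its own time series, which is why the labels are kept as a dictionary next to the name.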
Source: http://db-engines.com/en/ranking/time+series+dbms
Prometheus Overview
Prometheus
Prometheus is an open-source systems monitoring and alerting toolkit originally
built at SoundCloud. It is now a standalone open source project and maintained
independently of any company.
https://prometheus.io
Implemented using Go
Prometheus Components
● The main Prometheus server which scrapes and stores time series data
● Client libraries for instrumenting application code
● A push gateway for supporting short-lived jobs
● Special-purpose exporters (for HAProxy, StatsD, Graphite, etc.)
● An alertmanager
● Various support tools
● WhiteBox Monitoring instead of probing [aka BlackBox Monitoring]
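The server "scrapes" its targets, meaning it pulls metrics over HTTP from an endpoint that the application or exporter exposes. The sketch below shows both sides of that pull using only the Python standard library. The metric name `demo_jobs_processed_total` and the helper names are assumptions for illustration; a real application would use one of the client libraries instead.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical application state that we want to expose.
METRICS = {"demo_jobs_processed_total": 42}


def render_metrics(metrics):
    """Render a dict as one 'name value' line per metric."""
    return "".join(f"{name} {value}\n" for name, value in sorted(metrics.items()))


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(METRICS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the sketch quiet


def scrape_once():
    """Start the endpoint on a free local port, pull it once, then shut down."""
    server = HTTPServer(("127.0.0.1", 0), MetricsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        url = f"http://127.0.0.1:{server.server_port}/metrics"
        # Bypass any configured HTTP proxy for this local request.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(url, timeout=5) as response:
            return response.read().decode("utf-8")
    finally:
        server.shutdown()
        server.server_close()
```

The application only has to answer GET requests; deciding when and how often to collect stays with the monitoring server. Short-lived jobs cannot wait around to be scraped, which is the gap the push gateway fills.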
Prometheus Overview
List of Job Exporters
● Prometheus managed:
○ JMX
○ Node
○ Graphite
○ Blackbox
○ SNMP
○ HAProxy
○ Consul
○ Memcached
○ AWS Cloudwatch
○ InfluxDB
○ StatsD
○ ...
● Custom ones:
○ Database
○ Hardware related
○ Messaging systems
○ Storage
○ HTTP
○ APIs
○ Logging
○ …
https://prometheus.io/docs/instrumenting/exporters/
Demo Structure
Demo: Run Prometheus (native)
# file: prometheus.yml
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  # some settings intentionally removed!!
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
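To see what this configuration amounts to, the sketch below expands scrape configs (written here as plain Python dictionaries rather than parsed YAML) into the URLs that get scraped and the `job` and `instance` labels attached to every resulting series. The function name is made up; the `http` scheme and `/metrics` path are the Prometheus defaults when a job does not override them.

```python
def expand_scrape_targets(scrape_configs):
    """Turn scrape configs into a flat list of URLs with their target labels."""
    targets = []
    for job in scrape_configs:
        scheme = job.get("scheme", "http")            # default scheme
        path = job.get("metrics_path", "/metrics")    # default metrics path
        for static_config in job.get("static_configs", []):
            for address in static_config.get("targets", []):
                targets.append({
                    "url": f"{scheme}://{address}{path}",
                    "labels": {"job": job["job_name"], "instance": address},
                })
    return targets


# Same content as the prometheus.yml above, as a Python structure.
SCRAPE_CONFIGS = [
    {"job_name": "prometheus",
     "static_configs": [{"targets": ["localhost:9090"]}]},
]
```

With the config from the slide this yields a single target, `http://localhost:9090/metrics`, labelled `job="prometheus"` and `instance="localhost:9090"`.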
Code Demo
“Running Prometheus Native”
Demo: Run Prometheus using Docker
# file: docker-compose.yml
version: '2'
services:
  prometheus:
    image: prom/prometheus:latest   # Using the official Prometheus container
    volumes:
      - $PWD:/etc/prometheus        # Mount local directory used for config + data
    ports:
      - "9090:9090"                 # Port mapping for this container (host:container)
    command:
      # Prometheus configuration (1.x flag syntax; Prometheus 2.x and later expect --config.file)
      - "-config.file=/etc/prometheus/prometheus.yml"
Code Demo
“Running Prometheus Dockerized”
Demo: Add host metrics
# file: docker-compose.yml
version: '2'
services:
  prometheus:                        # Running Prometheus as a Docker container
    image: prom/prometheus:latest    # Using the official Prometheus container
    volumes:
      - $PWD:/etc/prometheus         # Mount local directory used for config + data
    ports:
      - "9090:9090"                  # Port mapping for this container (host:container)
    command:
      # Prometheus configuration (1.x flag syntax; Prometheus 2.x and later expect --config.file)
      - "-config.file=/etc/prometheus/prometheus.yml"
  node-exporter:
    image: prom/node-exporter:latest # Using the node exporter as an additional container
    ports:
      - '9100:9100'                  # Port mapping for this container (host:container)
# file: prometheus.yml
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  # some settings intentionally removed!!
# Scrape configurations: Prometheus itself plus the node exporter.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'node-exporter'
    static_configs:
      - targets: ['node-exporter:9100']
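Many host metrics are exposed as ever-increasing counters (CPU seconds, bytes sent, and so on), so dashboards usually show a per-second rate computed from consecutive scrapes. The sketch below is a simplified take on that calculation with a hypothetical helper name. It treats a drop in value as a counter reset, and it does not reproduce the extrapolation that PromQL's `rate()` applies at the window edges.

```python
def per_second_rate(samples):
    """Average per-second increase of a counter.

    samples: list of (timestamp_seconds, counter_value), oldest first.
    Returns None when there are not enough samples to compute a rate.
    """
    if len(samples) < 2:
        return None
    increase = 0.0
    for (_, previous), (_, current) in zip(samples, samples[1:]):
        if current < previous:
            # The counter went down: assume a restart reset it to zero.
            increase += current
        else:
            increase += current - previous
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else None
```

With the 15 second scrape interval from the config above, three scrapes cover a 30 second window, so a counter going from 100 to 160 over them corresponds to a rate of 2 per second.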
Code Demo
“Add host metrics”
Demo: Grafana
# file: docker-compose.yml
version: '2'
services:
  # some code intentionally removed!!
  grafana:
    image: grafana/grafana:latest   # Using the official Grafana container
    ports:
      - "3000:3000"                 # Port mapping for this container (host:container)
You get the idea :)
Code Demo
“Grafana”
Demo: Monitor Docker containers
Code Demo
“cAdvisor”
Demo: Alerting
Alerting Configuration
● Alert Rules
○ Which conditions do we need to alert on?
● Alert Manager
○ Where do we need to send the alert?
# file: alert.rules
ALERT serviceDownAlert
  IF absent(((time() - container_last_seen{name="<service_name>"}) < 5))
  FOR 5s
  LABELS {
    # Labels set here can be matched on in the AlertManager
    severity = "critical",
    service = "backend"
  }
  # Annotations carry the information used in the alert event
  ANNOTATIONS {
    SUMMARY = "Container Instance down",
    DESCRIPTION = "Container Instance is down for more than 15 sec."
  }
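The `FOR` clause means the condition has to hold for that long before the alert actually fires; until then the alert is only pending. The small state machine below sketches that behaviour. The class name `AlertState` and the explicit `now` argument are assumptions for illustration, not Prometheus internals.

```python
class AlertState:
    """Tracks one alert through inactive -> pending -> firing."""

    def __init__(self, for_seconds):
        self.for_seconds = for_seconds
        self.active_since = None  # time the condition first became true

    def evaluate(self, condition_true, now):
        """Evaluate the rule once at time `now` (seconds) and return the state."""
        if not condition_true:
            # Any evaluation where the condition is false resets the timer.
            self.active_since = None
            return "inactive"
        if self.active_since is None:
            self.active_since = now
        if now - self.active_since >= self.for_seconds:
            return "firing"
        return "pending"
```

For example, with `AlertState(5)` evaluated at t=0, 3 and 6 seconds while the condition stays true, the states are pending, pending, firing. A single false evaluation in between would start the wait over, which is what keeps short blips from paging anyone.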
# file: alert-manager.yml
global:                                 # Global settings
  smtp_smarthost: 'mailslurper:2500'
  smtp_from: 'alertmanager@example.org'
  smtp_require_tls: false
route:                                  # Routing
  receiver: mail                        # Fallback if there is no match
  routes:
    - match:
        severity: critical              # Match on label!
      continue: true                    # Continue with other receivers if there is a match
      receiver: mail                    # Determine the receiver
    - match:
        severity: critical
      receiver: slack
# file: alert-manager.yml (continued)
receivers:
  - name: mail                          # mail receiver
    email_configs:
      - to: 'team-X+alerts@example.org'
  - name: slack                         # slack receiver
    slack_configs:
      - send_resolved: true
        username: 'AlertManager'
        channel: '#alert'
        api_url: 'THIS IS A VERY SECRET URL :)'
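The interplay of `match`, `continue` and the fallback receiver is easier to follow in code. The sketch below handles only a flat list of child routes with exact label matches, which is all the config above uses; nested routes, regex matchers and grouping are left out, and the function name is hypothetical.

```python
def route_alert(labels, route):
    """Return the receiver names an alert with these labels is sent to."""
    receivers = []
    for child in route.get("routes", []):
        wanted = child.get("match", {})
        if all(labels.get(name) == value for name, value in wanted.items()):
            # A child without its own receiver inherits the parent's.
            receivers.append(child.get("receiver", route["receiver"]))
            if not child.get("continue", False):
                break  # first match wins unless 'continue' is set
    # No child matched: use the fallback receiver of the top-level route.
    return receivers or [route["receiver"]]


# The routing tree from alert-manager.yml as a Python structure.
ROUTE = {
    "receiver": "mail",
    "routes": [
        {"match": {"severity": "critical"}, "continue": True, "receiver": "mail"},
        {"match": {"severity": "critical"}, "receiver": "slack"},
    ],
}
```

A critical alert goes to both mail and slack because the first route sets `continue: true`; without it the slack route would never be reached. Anything that is not critical falls back to mail.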
# file: prometheus.yml
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "alert.rules"
# some settings intentionally removed!!
Code Demo
“Alerting -> The Alert Manager”
Instrumenting your own code!
● Counter
○ A cumulative metric that represents a single numerical value that only ever goes up
● Gauge
○ Single numerical value that can arbitrarily go up and down
● Histogram
○ Samples observations (usually things like request durations or response sizes) and counts
them in configurable buckets. It also provides a sum of all observed values
● Summary
○ Like a histogram, it samples observations and provides a total count of observations and a sum
of all observed values; it also calculates configurable quantiles over a sliding time window
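The first three metric types can be sketched in a few lines each. These toy classes are not the API of any Prometheus client library; they only show the bookkeeping each type implies, in particular that histogram buckets are cumulative ("less than or equal to" each upper bound). The summary type is left out because its sliding-window quantiles need considerably more machinery.

```python
import bisect


class Counter:
    """Cumulative value that only ever goes up."""

    def __init__(self):
        self.value = 0.0

    def inc(self, amount=1.0):
        if amount < 0:
            raise ValueError("counters can only increase")
        self.value += amount


class Gauge:
    """Single value that can go up and down."""

    def __init__(self):
        self.value = 0.0

    def set(self, value):
        self.value = float(value)

    def inc(self, amount=1.0):
        self.value += amount

    def dec(self, amount=1.0):
        self.value -= amount


class Histogram:
    """Counts observations in configurable buckets, plus a sum and a count."""

    def __init__(self, buckets):
        self.upper_bounds = sorted(buckets) + [float("inf")]
        self.bucket_counts = [0] * len(self.upper_bounds)
        self.sum = 0.0
        self.count = 0

    def observe(self, value):
        # Index of the first bucket whose upper bound is >= value.
        index = bisect.bisect_left(self.upper_bounds, value)
        self.bucket_counts[index] += 1
        self.sum += value
        self.count += 1

    def cumulative_buckets(self):
        """Return (upper_bound, observations <= upper_bound) pairs."""
        running, result = 0, []
        for bound, bucket_count in zip(self.upper_bounds, self.bucket_counts):
            running += bucket_count
            result.append((bound, running))
        return result
```

As a rule of thumb: use a counter for things you count (requests, errors), a gauge for things you measure at a point in time (queue length, memory in use), and a histogram when the distribution matters (request durations).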
Available Languages
● Official
○ Go, Java or Scala, Python, Ruby
● Unofficial
○ Bash, C++, Common Lisp, Elixir, Erlang, Haskell, Lua for Nginx, Lua for Tarantool, .NET / C#,
Node.js, PHP, Rust
// Spring Boot example -> file: build.gradle
dependencies {
    compile('org.springframework.boot:spring-boot-starter-web')
    testCompile('org.springframework.boot:spring-boot-starter-test')
    compile('io.prometheus:simpleclient_spring_boot:0.0.21') // Add dependency
}
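Prometheus Client Libraries: Spring Boot Example
import io.prometheus.client.Counter;
import io.prometheus.client.spring.boot.EnablePrometheusEndpoint;
import io.prometheus.client.spring.boot.EnableSpringBootMetricsCollector;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@EnablePrometheusEndpoint
@EnableSpringBootMetricsCollector
@RestController
@SpringBootApplication
public class DemoApplication {
    public static void main(String[] args) { SpringApplication.run(DemoApplication.class, args); }

    static final Counter requests = Counter.build()       // create metric type counter
        .name("helloworld_requests_total")                // set metric name
        .help("HelloWorld Total requests.").register();   // register the metric

    @RequestMapping("/helloworld")
    String home() {
        requests.inc(); // increment the counter by 1 (helloworld_requests_total)
        return "Hello World!";
    }
}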
Demo: Application metrics
Code Demo
“Application metrics”
Service Discovery
(Consul) Integration
Demo: Consul Integration
Service Discovery
Demo: Consul integration
Register the services with
Consul and Monitor
Code Demo
“Consul to the rescue”
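With service discovery there is no static target list any more: the services register with Consul and the scrape targets are derived from the catalog. The sketch below maps catalog entries to targets. The function name and the `service` label are assumptions for illustration; the field names `ServiceAddress`, `Address`, `ServicePort` and `ServiceName` follow the shape of Consul's catalog service responses, and the sample entries are made-up data rather than a live query.

```python
def consul_services_to_targets(catalog_entries):
    """Map Consul catalog entries to scrape targets (illustrative only)."""
    targets = []
    for entry in catalog_entries:
        # ServiceAddress can be empty; fall back to the node address then.
        host = entry.get("ServiceAddress") or entry["Address"]
        targets.append({
            "target": f"{host}:{entry['ServicePort']}",
            "labels": {"service": entry["ServiceName"]},
        })
    return targets


# Made-up catalog entries for two instances of a hypothetical 'backend' service.
SAMPLE_ENTRIES = [
    {"Address": "10.0.0.5", "ServiceAddress": "", "ServicePort": 8080,
     "ServiceName": "backend"},
    {"Address": "10.0.0.6", "ServiceAddress": "172.17.0.3", "ServicePort": 8080,
     "ServiceName": "backend"},
]
```

When an instance is started or stopped, the catalog changes and the target list follows on the next refresh, so nobody has to edit prometheus.yml by hand.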
That’s a wrap!
Questions?
https://github.com/mpas/infrastructure-and-system-monitoring-using-prometheus
Marco Pas
Philips Lighting
Software geek, hands on
Developer/Architect/DevOps Engineer
@marcopas
