Eric Lippmann | Icinga Camp Milan | Oct 17, 2023
Monitor Kubernetes with Icinga
(how it could be)
Eric, CTO @ Icinga
Traditional Monitoring
Icinga – Traditional Monitoring
• Hosts
• Bare metal, virtual machines
• Cloud instances to some extent
• Services
• Resource usage
• Applications, …
• Check Plugins
• Alerts
Icinga – Traditional Monitoring
• Automation
• Configuration Management
• Director
• Icinga APIs
• Metrics
Monitoring K8s
Monitoring Kubernetes with Icinga - Icinga Camp Milan 2023
Monitoring K8s – What to Monitor
• Hosts (where K8s components run)
• K8s itself
• Services, e.g. Deployments, *Sets, Jobs
• Pods
• Containers
• Key metrics
→ Not only infrastructure but also workloads
Challenges – Complexity
• Loads of resource types
• Multiple components and layers
• Different failure points
• Understanding of the entire stack
→ Via hosts, services and check plugins?
Challenges – Ephemeral
Run → Fail → Respawn → Run
Challenges – Pods Come and Go
Challenges – Metrics
K8s Monitoring
Cluster
Nodes
Applications
Pods
Containers
Health
Metrics
Resource usage
Expectations
Events
K8s Monitoring – Probes
Liveness probes periodically check container liveness and
restart containers that fail it.
Readiness probes indicate container readiness and remove
failing ones from their service endpoints.
Startup probes defer the execution of liveness and readiness
probes until they succeed, and restart containers that fail them.
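How the three probe types interact can be sketched in a few lines of Python. This is a simplified model, not the kubelet's implementation: the function name `probe_outcome` and its boolean inputs are made up for illustration, and real probes also involve thresholds, periods and timeouts that are left out here.

```python
from typing import Optional


def probe_outcome(startup_ok: Optional[bool], liveness_ok: bool, readiness_ok: bool) -> str:
    """Return what happens to a container for one round of probe results.

    startup_ok is None when no startup probe is configured (a convention
    invented for this sketch). False means the probe has definitively failed.
    """
    # A failed startup probe gets the container restarted; liveness and
    # readiness results are not considered until startup has succeeded.
    if startup_ok is False:
        return "restart"
    # A failed liveness probe also leads to a restart.
    if not liveness_ok:
        return "restart"
    # A failed readiness probe keeps the container running but removes
    # it from the service endpoints.
    if not readiness_ok:
        return "remove from endpoints"
    return "serve traffic"


print(probe_outcome(None, True, False))
```

The ordering of the checks mirrors the slide: startup first, then liveness, then readiness.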
K8s Monitoring – Approaches
• Poll K8s APIs
• Agent per node via DaemonSet
• Agent per pod (sidecar container)
• Events
• Metrics
• Logs
• APM
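As a rough illustration of the polling approach, the sketch below summarizes a pod list in the JSON shape returned by the Kubernetes API (`items[].status.phase`, `items[].status.containerStatuses[].restartCount`). The sample data and the helper name `summarize_pods` are invented for this example, and fetching the list from a real API server is left out.

```python
from collections import Counter


def summarize_pods(pod_list: dict) -> dict:
    """Count pods per phase and sum container restarts per pod."""
    phases = Counter()
    restarts = {}
    for pod in pod_list.get("items", []):
        name = pod["metadata"]["name"]
        status = pod.get("status", {})
        phases[status.get("phase", "Unknown")] += 1
        # containerStatuses can be missing, e.g. for pods not yet scheduled.
        restarts[name] = sum(
            cs.get("restartCount", 0) for cs in status.get("containerStatuses", [])
        )
    return {"phases": dict(phases), "restarts": restarts}


# Hypothetical sample, trimmed to the fields used above.
SAMPLE = {
    "items": [
        {"metadata": {"name": "web-1"},
         "status": {"phase": "Running", "containerStatuses": [{"restartCount": 0}]}},
        {"metadata": {"name": "web-2"},
         "status": {"phase": "Running", "containerStatuses": [{"restartCount": 7}]}},
        {"metadata": {"name": "job-1"}, "status": {"phase": "Pending"}},
    ]
}

summary = summarize_pods(SAMPLE)
print(summary["phases"])
print(summary["restarts"])
```

A poller would run such a summary on an interval; a high restart count on a pod that still reports `Running` is the kind of signal that a phase check alone would miss.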
Possible K8s Metric Sources
• Node metrics from Prometheus node exporter
• Container metrics from cAdvisor (or metrics-server)
• K8s metrics
• API server
• etcd
• scheduler
• controller manager
• kube-state-metrics
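Most of these sources expose their metrics in the Prometheus text format. The sketch below parses a small subset of that format and compares desired with available replicas, using two metric names from kube-state-metrics. The sample text, the `parse_metrics` helper and the deployment name `web` are assumptions for illustration; label escapes and timestamps are not handled.

```python
import re

# Matches lines like: metric_name{label="value",...} 12.5
LINE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text: str) -> list:
    """Parse a subset of the Prometheus text format into (name, labels, value)."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = LINE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


# Hypothetical scrape output for one deployment.
SAMPLE = """
# TYPE kube_deployment_spec_replicas gauge
kube_deployment_spec_replicas{namespace="default",deployment="web"} 3
# TYPE kube_deployment_status_replicas_available gauge
kube_deployment_status_replicas_available{namespace="default",deployment="web"} 2
"""

samples = parse_metrics(SAMPLE)
by_name = {name: value for name, labels, value in samples
           if labels.get("deployment") == "web"}
missing = (by_name["kube_deployment_spec_replicas"]
           - by_name["kube_deployment_status_replicas_available"])
print(missing)
```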
Icinga K8s Monitoring
Icinga K8s Monitoring, at the moment…
• Collects K8s resources and their health, events, certain metrics and logs
• Visualizes K8s resources and hierarchies
Icinga K8s Monitoring, should also…
• Correlate health, logs, metrics and events
• Provide alerts
• Of course, via icinga-notifications
• Give configuration tips
Icinga K8s Monitoring Architecture
• Icinga Web Module (PHP)
• View resources and hierarchies
• Daemon (Go)
• Collect resources, health, events,
logs and certain metrics
• Send alerts
• Database (PostgreSQL / MySQL / MariaDB)
• Stores resources, health, …
Icinga K8s Monitoring Architecture
Icinga K8s Monitoring Ideas
• Account for node failures
• Number of remaining nodes relative to the load
• CPU, memory and storage
• Compare requests, limits and actual utilization
• Indicate overcommitment of nodes
• Monitor DNS, K8s probes, latencies, traffic, …
• Affinities and anti-affinities
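Two of these ideas, comparing requests, limits and usage, and accounting for node failures, can be sketched as follows. This is not how the Icinga daemon does it: the `Node` type, its fields and the sample numbers are hypothetical, only memory is considered, and the failure check treats free capacity as a single pool without per-pod bin packing or affinities.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    allocatable_mem: int  # MiB the node can hand out to pods
    requested_mem: int    # sum of pod memory requests, MiB
    limit_mem: int        # sum of pod memory limits, MiB
    used_mem: int         # actual usage, MiB


def overcommitted(node: Node) -> bool:
    """A node is overcommitted when its pods' limits exceed what it can allocate."""
    return node.limit_mem > node.allocatable_mem


def request_utilization(node: Node) -> float:
    """Share of the requested memory that is actually in use."""
    if node.requested_mem == 0:
        return 0.0
    return node.used_mem / node.requested_mem


def survives_node_failure(nodes: list) -> bool:
    """Check whether the requests of the most loaded node fit on the others."""
    if len(nodes) < 2:
        return False
    worst = max(nodes, key=lambda n: n.requested_mem)
    free_elsewhere = sum(
        n.allocatable_mem - n.requested_mem for n in nodes if n is not worst
    )
    return worst.requested_mem <= free_elsewhere


# Hypothetical three-node cluster.
nodes = [
    Node("node-a", 8192, 6144, 10240, 3000),
    Node("node-b", 8192, 4096, 6144, 3500),
    Node("node-c", 8192, 2048, 4096, 1000),
]

print([n.name for n in nodes if overcommitted(n)])
print(survives_node_failure(nodes))
```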
twitter.com/icinga github.com/icinga facebook.com/icinga
icinga.com
Thank You!
What are your questions?


Editor's Notes

  • #5: Hosts: rather static; ping checks. Services: resource usage (CPU, memory, storage, network, latencies); apps (web servers, databases, URLs). Check plugins: contain logic; give a common understanding of what is wrong; not everyone has to find and configure their own rules. Cube, Business Process, vSphere.
  • #9: Hosts: K8s nodes. K8s itself: etcd, scheduler, controller manager, API server. Services, a.k.a. K8s resources. Cluster monitoring (infrastructure): every cluster should have its underlying server components monitored, since problems at the server level will show up in the workloads. Metrics to watch on nodes include CPU, disk, and network bandwidth. An overview of these metrics tells you when it is time to scale the cluster up or down (especially useful with cloud providers, where running cost matters). Workload monitoring: metrics related to deployments and their pods belong here. Comparing the number of pods a deployment currently has with its desired state can be relevant. Beyond that, look at health checks, container metrics, and finally application metrics.
  • #11: Everything is gone: logs, metrics, events.
  • #12: Jobs; configuration changes; scaling; name changes (not for StatefulSet); history. Collect everything, but alert at the service level.
  • #13: To determine health at every level, from the application to the operating system to the infrastructure, you need to monitor metrics in all the different layers and components: services, containers, pods, deployments, nodes, and clusters. And everyone has to understand which metrics exist, what they mean and how to interpret them. In this scenario, monitoring the cluster metrics would show roughly 50% memory utilization. That is neither very useful nor alarming. But what happens if you go down a level and monitor the metrics of each node? In that case, one of the nodes would show 100% memory usage, which would reveal a problem but not its origin. Going down another level to the pod metrics would get you closer to the problem, and going down yet another level to the container metrics would allow you to isolate the culprit of the memory leak. This simple example shows the value of monitoring the metrics of each Kubernetes layer. Cluster-wide metrics do provide a high-level overview of Kubernetes deployment performance, but you need the lower-layer metrics to identify problems and obtain insights that help you administer the cluster and optimize its resources.
  • #14: Cluster: Kubernetes components; resource usage; underutilized / over capacity. Nodes: number of nodes sufficient? Account for node failures; capacity of pods, IPs and resources. Pods: resource usage against requests and limits; running vs. desired. Containers. Logs. Metrics: cluster, pods, containers, deployments, sets, applications. Expectations: number of replicas; deployment; updated pods.
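The drill-down described in note #13 can be sketched with made-up numbers: the cluster as a whole sits at roughly 50% memory, one node is at 100%, and descending to pod and container level isolates the culprit. The data layout (node → pod → container → MiB used) and all names are hypothetical.

```python
# Hypothetical memory capacity per node, in MiB.
CAPACITY = {"node-a": 4096, "node-b": 4096}

# Hypothetical usage in MiB: node -> pod -> container -> memory used.
USAGE = {
    "node-a": {
        "api-1": {"app": 3584, "sidecar": 256},
        "batch-1": {"worker": 256},
    },
    "node-b": {
        "web-1": {"nginx": 128},
    },
}


def pod_used(containers: dict) -> int:
    """Memory used by all containers of one pod."""
    return sum(containers.values())


def node_used(pods: dict) -> int:
    """Memory used by all pods of one node."""
    return sum(pod_used(containers) for containers in pods.values())


def cluster_utilization(usage: dict, capacity: dict) -> float:
    """Cluster-wide memory utilization as a fraction of total capacity."""
    return sum(node_used(pods) for pods in usage.values()) / sum(capacity.values())


def find_culprit(usage: dict, capacity: dict) -> tuple:
    """Walk down one level at a time: busiest node, then pod, then container."""
    node = max(usage, key=lambda n: node_used(usage[n]) / capacity[n])
    pod = max(usage[node], key=lambda p: pod_used(usage[node][p]))
    container = max(usage[node][pod], key=usage[node][pod].get)
    return node, pod, container


print(round(cluster_utilization(USAGE, CAPACITY), 2))
print(find_culprit(USAGE, CAPACITY))
```

The cluster-level figure alone looks harmless; only the per-node and per-container views show where the memory goes.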