Sumo Logic confidential
Kubernetes Monitoring & Best Practices
Ajeet Singh Raina
Twitter: @ajeetsraina
GitHub: ajeetraina
• Principal Development Engineer at DellEMC
• 1st half of my career was in CGI & VMware
• 2nd half of my career has been in System Integration Testing
• Docker Captain (since 2016)
• Docker Bangalore Meetup Organizer (8800+ Registered Users)
• DockerLabs Incubator ~ 1700+ Slack Members
• Frequent Blogger – www.collabnix.com
Suresh Govindachetty
• Enterprise Sales Engineer at Sumo Logic
• Formerly with Citrix, HPE, Nortel
• Mostly in Presales, Networking and Security
Massive shift in monitoring requirements from host-based monitoring to “container-specific & service-oriented monitoring”
Containers & Kubernetes: The New Reality
[Diagram labels: App, Server; stages: Traditional Software Architecture, Containerized Architecture, Orchestrated Containerized Architecture]
Traditional Monitoring Solution
[Diagram labels: Bare Metal System, Hypervisor, Virtual Machines, Containers, Monitoring agent]
In a Monolithic World… What to Monitor?
- Application
- Hosts on which the applications get deployed
In a Cloud Native World… What to Monitor?
- Hosts
- Kubernetes Platform
- Docker Containers
- Containerized Microservices
Benefits of Containers & Kubernetes
- Portability
- Scalability
- Rolling Updates
- Service Discovery
- Load Balancing
- Self Healing
- Secure
While Kubernetes solves old problems,
it introduces new ones.
Current Challenges in Kubernetes Monitoring & Troubleshooting
K8s is powerful… but complex!
$ kubectl create -f web.yaml
Everything in K8s is, by design, ephemeral
Cascading failures
- Container communication
- Increased dependencies
- Changing architecture
More & noisy metrics (100x)
- Container-unique metrics
- Ephemeral data
- False positives
Methodology Switch
Cattle (Containers)
o Named with strings of numbers
o Almost identical
o Ephemeral
o Sick: get a new one
Pets (K8s Services)
o 1 or more identical Pods
o Specific name (kube_app, kube_name)
o Give context to container metrics
o Sick: nurse back to health
Visualizing Kubernetes Objects
[Diagram: a Namespace holding Services A, B, and C; Pods C1, C2, and C3 with their Containers; objects are labeled "Pet" or "Cattle"]
K8s Monitoring Strategies & Methods
- Remote polling (K8s metric/event APIs)
- Node-based (agent per host / DaemonSets)
- Sidecars (agent per Pod)
- Logs & APM
K8s Metrics - Monitoring Kubernetes Cluster
Node resource utilization
- Network bandwidth
- Disk utilization
- CPU
- Memory
The number of nodes
- Number of nodes available
- What you are paying for
- Discover what the cluster is being used for
Running pods
- Is the number of nodes available sufficient?
- Can they handle the entire workload in case a node fails?
K8s Metrics - Monitoring Pod
Kubernetes Metrics
- Monitor how a specific pod and its deployment are being handled
- The number of instances a pod has at the moment and how many were expected
- How the in-progress deployment is going (how many instances were changed from an older version to a new one), health checks, and some network data available through network services
Container Metrics
- Collected by cAdvisor and exposed by Heapster, which queries every node about the running containers
- Metrics like CPU, network, and memory usage compared with the maximum allowed are the highlights
Application Metrics
- Developed by the application itself and related to the business rules it addresses
- For example, a database application exposing metrics related to index state and statistics concerning tables and relationships
Sources of Metrics in Kubernetes
Node Metrics from node_exporter
- node_exporter installed as a DaemonSet
- 1 instance per node
- Standard host metrics
  - Load average
  - CPU
  - Memory
  - Disk
  - Network
- 100 unique series in a typical node
Container Metrics from cAdvisor
- Embedded into the Kubelet, so we scrape the Kubelet to get container metrics
- For each container on the node:
  - CPU usage
  - Filesystem read/write/limits
  - Memory usage and limits
  - Network transmit/receive/dropped
K8s Metrics from the K8s API Server
- Also called “K8s Core Metrics”
- Metrics about the performance of the K8s API server
- Performance of controller work queues
- Request rates and latencies
- etcd helper cache work queues and cache performance
- General process status (file descriptors/memory/CPU seconds)
- Golang status (GC/memory/threads)
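The sources on this slide typically expose samples in the Prometheus text exposition format. As a minimal sketch of what scraping that output involves, the Python below parses a small hand-written sample and sums CPU seconds per pod. The sample values, the pod names, and the helper names (`parse_exposition`, `cpu_seconds_per_pod`) are illustrative assumptions; `container_cpu_usage_seconds_total` is used here as a representative cAdvisor counter.

```python
import re
from collections import defaultdict

# Hand-written sample in the Prometheus text exposition format (values are made up).
SAMPLE = '''\
# TYPE container_cpu_usage_seconds_total counter
container_cpu_usage_seconds_total{namespace="shop",pod="web-7d9f",container="app"} 12.5
container_cpu_usage_seconds_total{namespace="shop",pod="web-7d9f",container="sidecar"} 1.5
container_cpu_usage_seconds_total{namespace="shop",pod="web-x2k4",container="app"} 8.0
'''

# metric_name{label="value",...} value
LINE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_exposition(text):
    """Yield (metric_name, labels_dict, value) for each sample line."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue  # skip blanks and HELP/TYPE comment lines
        match = LINE_RE.match(line)
        if match is None:
            continue  # ignore lines this simplified parser does not understand
        labels = dict(LABEL_RE.findall(match.group('labels') or ''))
        yield match.group('name'), labels, float(match.group('value'))


def cpu_seconds_per_pod(text):
    """Sum the per-container CPU counter over all containers of each pod."""
    totals = defaultdict(float)
    for name, labels, value in parse_exposition(text):
        if name == 'container_cpu_usage_seconds_total':
            totals[labels.get('pod', '')] += value
    return dict(totals)


print(cpu_seconds_per_pod(SAMPLE))
```

In practice a Prometheus server or a collector agent does this parsing; the sketch only shows how labels such as `pod` and `container` travel with every sample.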
Sources of Metrics in Kubernetes
K8s-derived metrics from kube-state-metrics
- Counts & metadata about many K8s types
- Count of many 'nouns'
- Resource limits
- Container states
  - Ready/restarts/running/terminated/waiting
Etcd metrics from etcd
- Etcd is the "master of all truth" within a K8s cluster
- Leader existence and leader change rate
- Disk write performance
- Inbound gRPC stats
- etcd_http_received_total
- etcd_http_failed_total
- etcd_http_successful_duration_*
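"Leader change rate" is derived from a counter, so it has to be computed from successive readings. A minimal Python sketch, assuming readings of a leader-change counter (to my understanding etcd exposes one as `etcd_server_leader_changes_seen_total`; treat the name as an assumption) taken at known timestamps. The readings and function names are hypothetical.

```python
def counter_increase(samples):
    """Total increase of a counter, tolerating resets.

    samples: list of (timestamp_seconds, value) in time order.
    """
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the process restarted and the counter began again from zero.
        increase += (cur - prev) if cur >= prev else cur
    return increase


def leader_changes_per_hour(samples):
    """Average leader changes per hour over the sampled window."""
    if len(samples) < 2:
        return 0.0
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        return 0.0
    return counter_increase(samples) * 3600.0 / elapsed


# Made-up counter readings taken every 10 minutes.
readings = [(0, 3), (600, 3), (1200, 4), (1800, 6)]
print(f"{leader_changes_per_hour(readings):.1f} leader changes/hour")
```

A sustained non-zero rate is usually worth an alert, since frequent elections tend to point at network or disk trouble on the etcd members.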
Kubernetes Monitoring Best Practices
#1: Collect Metrics at the Container Level but Alert at the Service Level
$ cat /etc/docker/daemon.json
{
  "metrics-addr": "127.0.0.1:9323",
  "experimental": true
}
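To illustrate the idea behind practice #1, here is a minimal Python sketch that keeps container-level samples but only raises an alert when the service-wide average crosses a threshold. The service names, container IDs, values, and the 0.8 threshold are all hypothetical.

```python
from collections import defaultdict

# Hypothetical container-level samples:
# (service, container_id, cpu usage as a fraction of the container's limit)
samples = [
    ("checkout", "a1f3", 0.95),
    ("checkout", "b7c2", 0.40),
    ("checkout", "c9d8", 0.45),
    ("search", "d4e1", 0.92),
    ("search", "e5f6", 0.88),
]


def service_averages(samples):
    """Average the container-level values for each service."""
    by_service = defaultdict(list)
    for service, _container, value in samples:
        by_service[service].append(value)
    return {svc: sum(vals) / len(vals) for svc, vals in by_service.items()}


def services_to_alert(samples, threshold=0.8):
    """Return the services whose average exceeds the threshold."""
    averages = service_averages(samples)
    return sorted(svc for svc, avg in averages.items() if avg > threshold)


print(services_to_alert(samples))
```

One hot container in `checkout` does not page anyone here, because its siblings are healthy; `search` alerts because the whole service is running hot.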
#2: Monitor Service Level Objectives (SLOs) per Service per Route
• Error rate per service per route
• Latency per service per route
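A minimal Python sketch of these two indicators, computed per (service, route) pair from a hypothetical request log. The log entries, the choice of HTTP 5xx as "error", and the nearest-rank p95 are assumptions made for illustration.

```python
import math
from collections import defaultdict

# Hypothetical request log entries: (service, route, http_status, latency_ms)
requests = [
    ("cart", "/items", 200, 35),
    ("cart", "/items", 200, 40),
    ("cart", "/items", 500, 900),
    ("cart", "/items", 200, 45),
    ("cart", "/checkout", 200, 120),
    ("cart", "/checkout", 503, 2000),
]


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def slo_report(requests):
    """Error rate and p95 latency for every (service, route) pair."""
    grouped = defaultdict(list)
    for service, route, status, latency_ms in requests:
        grouped[(service, route)].append((status, latency_ms))
    report = {}
    for key, rows in grouped.items():
        errors = sum(1 for status, _ in rows if status >= 500)
        report[key] = {
            "error_rate": errors / len(rows),
            "p95_latency_ms": percentile([lat for _, lat in rows], 95),
        }
    return report


for key, stats in slo_report(requests).items():
    print(key, stats)
```

Grouping by route matters because a single slow or failing endpoint can hide inside a healthy service-wide average.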
#3: Infra Metrics: Utilization
- Resource availability for Pods vs. allocation
- Verify every Pod/Container has a limit (best practice)
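The limit check can be automated. The Python sketch below walks Pod manifests represented as plain dicts, following the `spec.containers[].resources.limits` layout of a Kubernetes Pod, and reports containers with no CPU or memory limit. The pods are made up, and in a real cluster the manifests would come from the Kubernetes API rather than a literal list.

```python
# Made-up Pod manifests as plain dicts (only the fields the check needs).
pods = [
    {"metadata": {"name": "web-1", "namespace": "shop"},
     "spec": {"containers": [
         {"name": "app",
          "resources": {"limits": {"cpu": "500m", "memory": "256Mi"}}},
         {"name": "sidecar", "resources": {}},
     ]}},
    {"metadata": {"name": "db-0", "namespace": "shop"},
     "spec": {"containers": [
         {"name": "postgres", "resources": {"limits": {"memory": "1Gi"}}},
     ]}},
]

REQUIRED = ("cpu", "memory")


def containers_missing_limits(pods):
    """Return (namespace, pod, container, missing_resources) for each offender."""
    findings = []
    for pod in pods:
        meta = pod["metadata"]
        for container in pod["spec"]["containers"]:
            limits = container.get("resources", {}).get("limits") or {}
            missing = [res for res in REQUIRED if res not in limits]
            if missing:
                findings.append(
                    (meta["namespace"], meta["name"], container["name"], missing)
                )
    return findings


for finding in containers_missing_limits(pods):
    print(finding)
```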
#4: Always alert on High Disk Usage
• Monitor ALL disk volumes, including the root file system.
• Prometheus Node Exporter provides filesystem metrics for tracking each device
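A minimal Python sketch of a disk-usage alert rule, assuming each mountpoint reports a total size and an available size in bytes (to my understanding, Node Exporter exposes these as `node_filesystem_size_bytes` and `node_filesystem_avail_bytes`). The mountpoints, readings, and the 85% threshold are hypothetical.

```python
GIB = 1024 ** 3

# Made-up readings keyed by mountpoint: (size_bytes, avail_bytes)
filesystems = {
    "/": (100 * GIB, 8 * GIB),
    "/var/lib/docker": (200 * GIB, 90 * GIB),
    "/data": (500 * GIB, 40 * GIB),
}


def used_fraction(size_bytes, avail_bytes):
    """Fraction of the filesystem that is no longer available."""
    return 1.0 - avail_bytes / size_bytes


def disks_over_threshold(filesystems, threshold=0.85):
    """Return the mountpoints whose usage exceeds the threshold."""
    return sorted(
        mount
        for mount, (size, avail) in filesystems.items()
        if size > 0 and used_fraction(size, avail) > threshold
    )


print(disks_over_threshold(filesystems))
```

Note that the root file system is checked alongside the data volumes, as the slide recommends.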
#5: Never ignore Kube-system
• Total DNS Requests - Resource Issue, Scaling Limits, Application Bug
• DNS Request Time - High Latency
• Quorum Loss in the cluster/Failure in Leader Election
• Unusually High Snapshot Duration
• Network criticality
#6: Consistent Metadata Enrichment
Tag individual components of Kubernetes so that they provide context for your services
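As a minimal sketch of metadata enrichment, the Python below merges Kubernetes metadata into the labels of a raw container sample so it can later be grouped by service, deployment, or node. The lookup table, label names, and sample are hypothetical; a real collector would usually build the lookup from the Kubernetes API.

```python
# Hypothetical lookup from pod name to Kubernetes metadata.
POD_METADATA = {
    "web-7d9f": {
        "namespace": "shop",
        "deployment": "web",
        "service": "storefront",
        "node": "node-1",
    },
}


def enrich(sample, pod_metadata):
    """Return a copy of the sample with Kubernetes metadata merged into its labels."""
    labels = dict(sample["labels"])
    meta = pod_metadata.get(labels.get("pod"), {})
    for key, value in meta.items():
        labels.setdefault(key, value)  # never overwrite labels the source already set
    return {**sample, "labels": labels}


raw = {
    "name": "container_memory_usage_bytes",
    "labels": {"pod": "web-7d9f", "container": "app"},
    "value": 123456789,
}
print(enrich(raw, POD_METADATA))
```

Applying the same tag set to logs, metrics, and events is what makes them joinable at the service level.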
#7: No Better KPI than API - Track the API Gateway for Microservices in order to automatically detect application issues
Discoverability - Infrastructure vs. Service View
Infrastructure-centric Viewpoint
- Complex
- Slow to find and troubleshoot issues
- Disconnected from the customer reality
Service-centric Viewpoint
- Simple to understand
- Quick to find and troubleshoot issues
- Tightly connected to the customer reality
Sumo Logic K8s Monitoring and Troubleshooting
• Delivers a best-in-class, end-to-end Kubernetes monitoring and troubleshooting experience.
• Open source collectors (Fluent Bit, Fluentd, Prometheus, Falco)
• Visualize K8s hierarchies through Deployment, Service, Node and Namespace views
• Honeycomb visualization - quick overview of data in a visually digestible way.
• Simplified monitoring and troubleshooting
• Correlation of logs, metrics, events and security
• Integrated security with Falco + partner apps
Data Collection with Sumo Logic
Our Kubernetes Partner Apps - Security
App Purpose Details
SecOps Provides comprehensive monitoring and analysis solution for detecting
vulnerabilities and potential threats throughout your environment,
including hosts, containers, images and registry.
SecOps Helps you detect, investigate and remediate vulnerabilities, insecure
configurations and compliance violations across all container and
Kubernetes environments.
SecOps Provides granular security and compliance control monitoring to
DevSecOps teams throughout the cloud native application lifecycle, from
development to runtime in production.
SecOps Gives customers the ability to detect, investigate, and remediate
vulnerabilities in software artifacts across your deployment environments.
Ecosystem - Unified K8s DevOps and SecOps Monitoring
CI/CD: CircleCI, Codefresh, Armory, Harness
DevOps: Kubernetes, Amazon EKS, Google Kubernetes Engine, Azure Kubernetes Service
SecOps: Falco, Twistlock, StackRox, Aqua, Tigera, JFrog Xray
It’s Demo Time…
References
https://kubernetes.io/docs/tasks/debug-application-cluster/resource-usage-monitoring/
https://www.sumologic.com/lp/kubernetes-monitoring-app
Thank You