Zabbix
Smart problem detection
Who am I?
Alexei Vladishev
Creator of Zabbix
CEO, Architect and Product Manager
Twitter: @avladishev
Email: alex@zabbix.com
Our plan
• How Zabbix works
• Basic problem detection
• Advanced problem detection
• Do some practical work
What is Zabbix?
An enterprise-level, free and open-source monitoring solution
Benefits of Zabbix
• True Free software
• All in one solution
• Easy to maintain
• Mature, high quality and reliable
• Flexible (also applies to problem detection)
How Zabbix works
[Diagram: Zabbix server and database; data collection, history, analysis, notifications, visualisation]
Data collection
Availability, performance, integrity, environmental checks, KPI & SLA
Methods of data collection
Pull
• Service checks: HTTP, SSH, IMAP, NTP, etc.
• Passive agent
• Script execution using SSH and Telnet
Push
• Active agent
• Zabbix Trapper and SNMP Traps
• Monitoring of log files and Windows event logs
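With the push methods, the sending side opens a connection to the Zabbix server and delivers values itself. The sketch below shows how a value for a trapper item could be framed. It assumes the sender protocol as commonly documented: a "ZBXD" header, a flag byte 0x01, an 8-byte little-endian length, then a JSON body with request "sender data". The host name web01 and the key app.orders are hypothetical. Check the protocol reference for your Zabbix version before relying on the exact layout.

```python
import json
import struct

# Assumed framing: b"ZBXD" + protocol flag 0x01 + 8-byte little-endian length + JSON.
ZBX_HEADER = b"ZBXD\x01"


def build_sender_packet(host: str, key: str, value) -> bytes:
    """Frame one value for a trapper item identified by host and item key."""
    payload = json.dumps({
        "request": "sender data",
        "data": [{"host": host, "key": key, "value": str(value)}],
    }).encode("utf-8")
    return ZBX_HEADER + struct.pack("<Q", len(payload)) + payload


def parse_packet(packet: bytes) -> dict:
    """Inverse of build_sender_packet, useful for checking the framing."""
    if not packet.startswith(ZBX_HEADER):
        raise ValueError("missing ZBXD header")
    (length,) = struct.unpack("<Q", packet[5:13])  # bytes 5..12 hold the length
    return json.loads(packet[13:13 + length].decode("utf-8"))


packet = build_sender_packet("web01", "app.orders", 42)  # hypothetical host/key
```

Sending the packet is then a matter of writing these bytes over TCP to the server's trapper port (10051 by default) and reading the JSON reply. In practice the zabbix_sender utility does this for you.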
Active vs Passive
How often should checks run?
Every N seconds
• Zabbix will evenly distribute checks
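Below is one simple way to spread checks evenly. It assumes each item gets a fixed offset inside its interval, derived from its numeric ID. This illustrates the idea and is not taken from the Zabbix scheduler source.

```python
def next_check(item_id: int, interval: int, now: int) -> int:
    """Next check time (Unix seconds) for an item polled every `interval` seconds.

    Each item keeps a fixed offset (item_id % interval) inside the interval, so
    items sharing the same interval do not all fire at the same moment.
    """
    offset = item_id % interval
    t = (now // interval) * interval + offset  # slot in the current interval
    if t <= now:
        t += interval  # that slot has passed, use the one in the next interval
    return t


# Three items with a 60 s interval land on different seconds of the minute.
schedule = [next_check(item_id, 60, 0) for item_id in (101, 102, 103)]
```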
Different frequency in different time periods
• Every X seconds in working time
• Every Y seconds at the weekend
At a specific time (Zabbix 3.0)
• "Ready for business" checks
• Every hour from 9:00 during working hours (9:00, 10:00, …, 18:00)
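A scheduled "ready for business" check can be modelled as finding the next top-of-the-hour slot between 9:00 and 18:00 on a working day. This is a minimal sketch. It uses naive local datetimes (no time zones or DST) and assumes Monday to Friday are the working days. It does not reproduce Zabbix's scheduling syntax.

```python
from datetime import datetime, timedelta


def next_business_check(now: datetime, first_hour: int = 9, last_hour: int = 18) -> datetime:
    """Next full hour strictly after `now` that falls on Mon-Fri between first_hour and last_hour."""
    candidate = now.replace(minute=0, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(hours=1)
    # weekday(): Monday == 0 ... Sunday == 6
    while not (candidate.weekday() < 5 and first_hour <= candidate.hour <= last_hour):
        candidate += timedelta(hours=1)
    return candidate
```

A check due at 18:30 on a Friday is pushed to 9:00 the following Monday. The loop advances at most a few dozen hours, which keeps the logic simple.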
How do we detect problems in this data flow?
Triggers!
A trigger is a problem definition
Triggers
Example
{server:system.cpu.load.last()} > 5
Operators
- + / * < > = <> <= >= or and not
Functions
min, max, avg, last, count, date, time, diff, regexp and much more!
Analyse everything: any metric and any host
{node1:system.cpu.load.last()} > 5 and {node2:system.cpu.load.last()} > 5 and {nodes:tps.last()} > 5000
Junior level
Performance
{server:system.cpu.load.last()} > 5
False positives
[Chart: CPU load (0 to 10) between 10:00 and 10:50 for {server:system.cpu.load.last()} > 5]
Flapping
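Replaying values through a plain last() threshold shows why it flaps. The sketch assumes the trigger is re-evaluated on every new value and starts in the OK state. The sample values are made up.

```python
def count_state_changes(values, threshold=5.0):
    """Count OK<->PROBLEM transitions of a `last() > threshold` trigger."""
    changes = 0
    problem = False  # trigger starts in the OK state
    for v in values:
        new_state = v > threshold  # evaluated against the latest value only
        if new_state != problem:
            changes += 1
            problem = new_state
    return changes
```

With values hovering around 5, such as 4.8, 5.2, 4.9, 5.1, 4.7, 5.3, the trigger changes state five times. That raises three separate problems for what is really one situation.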
Junior level
Availability
{server:net.tcp.service[http].last()} = 0
Too sensitive
[Chart: HTTP service status (0 or 1) between 10:01 and 10:14 for {server:net.tcp.service[http].last()} = 0]
Being too sensitive leads to false positives
How do we get rid of false positives?
Properly define problem conditions and think carefully!
What do "system is overloaded", "running out of disk space" and "a service is not available" really mean?
Use history
System performance
{server:system.cpu.load.min(10m)} > 5
Service availability
{server:net.tcp.service[http].max(5m)} = 0
{server:net.tcp.service[http].max(#3)} = 0
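The difference between a time-based period (10m) and a count-based one (#3) fits in a few lines. The helpers below are hypothetical stand-ins for the trigger functions, not Zabbix's implementation. They assume history is a list of (timestamp, value) pairs ordered oldest first.

```python
from typing import List, Tuple, Union

Sample = Tuple[int, float]  # (Unix timestamp in seconds, value), oldest first


def window(history: List[Sample], period: Union[int, str], now: int) -> List[float]:
    """Values covered by a period.

    An int means the last `period` seconds up to `now`, like min(10m) with 10m = 600.
    A string "#N" means the last N values regardless of time, like max(#3).
    """
    if isinstance(period, str) and period.startswith("#"):
        return [v for _, v in history[-int(period[1:]):]]
    return [v for t, v in history if now - period < t <= now]


def hmin(history: List[Sample], period: Union[int, str], now: int) -> float:
    # An empty window raises ValueError here; missing data is not modelled.
    return min(window(history, period, now))


def hmax(history: List[Sample], period: Union[int, str], now: int) -> float:
    return max(window(history, period, now))


# Ten one-minute CPU samples of 1.0 with a single spike to 9.0 at the end.
cpu = [(t, 9.0 if t == 540 else 1.0) for t in range(0, 600, 60)]
```

The spike fires last() > 5 but not min(10m) > 5. Likewise, one failed HTTP check between successes keeps max(#3) at 1, so the availability trigger stays quiet.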
Analyse history
[Chart: CPU load (0 to 10) between 10:00 and 11:10 for {server:system.cpu.load.min(10m)} > 5]
Analyse history
[Chart: HTTP service status (0 or 1) between 10:01 and 10:15 for {server:net.tcp.service[http].max(#3)} = 0]
Problem disappeared != problem is resolved
A few examples
Problem: free disk space < 10%
No problem: free disk space = 10.001%. Resolved?
Problem: CPU load > 5
No problem: CPU load = 4.99. Resolved?
Problem: SSH check failed
No problem: SSH is up. Resolved?
Different conditions for problem and recovery
Before
{server:system.cpu.load.last()} > 5
Now
({TRIGGER.VALUE=0} and {server:system.cpu.load.last()}>5)
or
({TRIGGER.VALUE=1} and {server:system.cpu.load.last()}>1)
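The hysteresis expression can be replayed as a small state machine. While the trigger is OK ({TRIGGER.VALUE=0}) it uses the higher threshold, and while it is in PROBLEM ({TRIGGER.VALUE=1}) it uses the lower one. Here is a sketch assuming one evaluation per new value:

```python
def evaluate_hysteresis(values, raise_above=5.0, clear_above=1.0):
    """Return the trigger state (0 = OK, 1 = PROBLEM) after each value.

    Mirrors
        ({TRIGGER.VALUE=0} and last() > raise_above) or
        ({TRIGGER.VALUE=1} and last() > clear_above)
    """
    state = 0
    states = []
    for v in values:
        if state == 0:
            state = 1 if v > raise_above else 0   # OK: need a high value to raise
        else:
            state = 1 if v > clear_above else 0   # PROBLEM: stay until value drops low
        states.append(state)
    return states
```

For the values 4.8, 5.2, 4.9, 5.1, 4.7, 5.3, 0.8 the state goes 0, 1, 1, 1, 1, 1, 0. That is one problem and one recovery, where a single threshold of 5 would have flapped.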
Hysteresis
[Chart: CPU load (0 to 10) between 10:00 and 10:50; the problem starts when {server:system.cpu.load.last()} > 5 and lasts while {server:system.cpu.load.last()} > 1]
No flapping!
Several examples
System is overloaded
({TRIGGER.VALUE=0} and {server:system.cpu.load.min(5m)}>3)
or
({TRIGGER.VALUE=1} and {server:system.cpu.load.max(2m)}>1)
No free disk space on /
({TRIGGER.VALUE=0} and {server:vfs.fs.size[/,pfree].last()}<10)
or
({TRIGGER.VALUE=1} and {server:vfs.fs.size[/,pfree].min(15m)}<30)
SSH server is not available
({TRIGGER.VALUE=0} and {server:net.tcp.service[ssh].max(#3)}=0)
or
({TRIGGER.VALUE=1} and {server:net.tcp.service[ssh].min(#10)}=0)
Anomalies
How to detect?
Compare with a norm, where the norm is the system state in the past.
Example: the average CPU load for the last hour is 2x higher than the average CPU load for the same period a week ago
{server:system.cpu.load.avg(1h)} >
2 * {server:system.cpu.load.avg(1h,7d)}
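The anomaly trigger compares two averages over the same length of time, one of them shifted back a week. The sketch reads the second argument of avg(1h,7d) as a time shift and assumes history is a list of (timestamp, value) pairs. The helper names are hypothetical.

```python
HOUR = 3600
WEEK = 7 * 24 * HOUR


def avg_window(history, seconds, now, shift=0):
    """Average of values with timestamps in (now - shift - seconds, now - shift].

    Raises ZeroDivisionError if the window holds no values (not handled here).
    """
    end = now - shift
    vals = [v for t, v in history if end - seconds < t <= end]
    return sum(vals) / len(vals)


def is_anomaly(history, now, factor=2.0):
    """True when the last hour's average exceeds `factor` times the same hour a week earlier."""
    return avg_window(history, HOUR, now) > factor * avg_window(history, HOUR, now, shift=WEEK)


# Hypothetical data: 5-minute samples, load 1.0 a week ago and 3.0 in the last hour.
past = [(WEEK - HOUR + 300 * i, 1.0) for i in range(1, 13)]
recent = [(2 * WEEK - HOUR + 300 * i, 3.0) for i in range(1, 13)]
```

Comparing with the same period in the past, rather than a fixed threshold, lets the norm follow daily and weekly patterns.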
Anomaly
[Chart: CPU load (0 to 10) between 10:00 and 11:10, compared with the same period 7 days ago]
Does history analysis affect Zabbix performance?
Yes, but not much, especially since Zabbix 2.2.0.
[Diagram: database, Zabbix server, cache]
Dependencies
Hide dependent problems.
Example: "CRM is not available" depends on "Database is down", which in turn depends on "No free disk space".
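Hiding dependent problems amounts to showing only those active problems that have no active problem upstream. The sketch assumes dependencies are given as a mapping from each trigger name to the triggers it depends on, and that they are followed transitively. This is an illustration, not Zabbix's dependency code.

```python
from typing import Dict, List, Set


def visible_problems(active: Set[str], depends_on: Dict[str, List[str]]) -> Set[str]:
    """Hide an active problem if any trigger it depends on (directly or transitively) is active."""

    def has_active_dependency(trigger: str, seen: Set[str]) -> bool:
        for parent in depends_on.get(trigger, []):
            if parent in seen:
                continue  # guard against accidental dependency cycles
            seen.add(parent)
            if parent in active or has_active_dependency(parent, seen):
                return True
        return False

    return {t for t in active if not has_active_dependency(t, set())}


deps = {
    "CRM is not available": ["Database is down"],
    "Database is down": ["No free disk space"],
}
```

With all three triggers from the example active, only "No free disk space" remains visible, which points straight at the root cause.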
How to react to problems?
Possible reactions
• Automatic problem resolution
• Sending notifications to users and user groups
• Opening tickets in helpdesk systems
Escalate!
• Immediate reaction
• Delayed reaction
• Notification if an automatic action failed
• Repeated notifications
• Escalation to a new level
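An escalation can be modelled as a list of steps, each with a delay from the start of the problem. Recovery stops the clock, so later steps never run. The plan below is hypothetical: its delays and actions are assumptions, not the timeline from the example. It shows how a delayed notification covers the case where the automatic fix failed.

```python
from typing import List, Tuple

# Hypothetical plan: (minutes after the problem started, action).
PLAN: List[Tuple[int, str]] = [
    (0, "run automatic fix"),        # immediate reaction
    (5, "notify on-call engineer"),  # delayed: only reached if the fix did not help
    (10, "repeat notification"),     # repeated notification
    (20, "escalate to manager"),     # escalation to a new level
]


def fired_actions(minutes_open: int, plan: List[Tuple[int, str]] = PLAN) -> List[str]:
    """Actions run for a problem that stayed open for `minutes_open` minutes."""
    return [action for delay, action in plan if delay <= minutes_open]
```

If the automatic fix resolves the problem within five minutes, nobody is notified. If it keeps going, each later step widens the circle of people involved.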
Example
Critical problem
[Escalation timeline, 0 to 20 min in 5-minute steps: repeated email, SMS and ticket, service restart, SMS to manager]
Summary
• Analyse history
• No problem != solution
• Use different conditions for problem and recovery
• Take advantage of anomaly detection
• Resolve common problems automatically
• Do not be afraid to escalate!
Thank you!
twitter.com/zabbix
Welcome to the Zabbix Conference! Riga, September 11-12.

Zabbix Smart problem detection - FISL 2015 workshop