Monitoring is Never “Done”
@melaniemj
Responsibilities @ Yardi
Implementation and administration of monitoring,
alerting, and log aggregation/analysis tools.
o 15,000+ Devices
o 9 Datacenters
o 5000+ Customer Installations
o We monitor Windows envs with Linux envs
This was me in 2008 @ Point2
How code is delivered
How code operates in production
A good problem to have
Everyone wants “the monitoring” so they can say
“it’s monitored”
Communicating Work
o Classify
o Quantify
o Qualify
Words....
o Logging
o Alerting
o Dashboards
o Reports
o 4-9s (99.99% availability)
o 24x7x365 this shit can’t go down
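
As a rough illustration of what a “4-9s” target implies, the sketch below converts an availability fraction into a yearly downtime budget. The function name and the 365-day year are assumptions made for this example, not figures from the talk.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a non-leap year


def downtime_budget_minutes(availability: float) -> float:
    """Return the minutes of downtime per year allowed by an availability target.

    `availability` is a fraction, e.g. 0.9999 for four nines.
    """
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return MINUTES_PER_YEAR * (1.0 - availability)


if __name__ == "__main__":
    for label, target in [("2-9s", 0.99), ("3-9s", 0.999), ("4-9s", 0.9999)]:
        print(f"{label}: {downtime_budget_minutes(target):.2f} minutes/year")
```

Under these assumptions, four nines leaves about 52.56 minutes of downtime per year, which is a useful number to put next to “this can’t go down”.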
Can it be this simple?
Let’s talk about “the monitoring” for X
Be awesome
X is monitored
DCVA (OODA): Definition, Checks & Collections, Visualization, Action
1. Definition
I can hit this one page, so it’s up, right?
No thanks, let’s redefine status
1. Definition
o What questions are you trying to answer?
o What information do you need when a failure
occurs?
o What are the most common failures?
o Who is the audience for the information?
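
One way to keep the four Definition questions from being skipped is to write the answers down in a structured form. The sketch below is hypothetical: the `MonitoringDefinition` class and its field names are invented for this example and are not something the talk prescribes.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MonitoringDefinition:
    """Hypothetical record of the answers to the four Definition questions."""

    target: str  # the "X" being monitored
    questions: list[str] = field(default_factory=list)        # what are we trying to answer?
    failure_info: list[str] = field(default_factory=list)     # what is needed when it fails?
    common_failures: list[str] = field(default_factory=list)  # what usually breaks?
    audience: list[str] = field(default_factory=list)         # who reads the information?

    def unanswered(self) -> list[str]:
        """Return the names of the questions that still have no answer."""
        answers = {
            "questions": self.questions,
            "failure_info": self.failure_info,
            "common_failures": self.common_failures,
            "audience": self.audience,
        }
        return [name for name, value in answers.items() if not value]


if __name__ == "__main__":
    definition = MonitoringDefinition(
        target="login page",
        questions=["Can customers sign in?"],
        audience=["on-call engineer"],
    )
    print(definition.unanswered())  # ['failure_info', 'common_failures']
```

A definition with unanswered entries is a signal that “it’s monitored” would be premature for that target.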
2. Checks & Collections
o Environment & Code
o Data points
o Detailed logs
o Current state
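
A check that records data points, detail, and current state (rather than a bare up/down) might look like the minimal sketch below. `CheckResult`, `run_check`, and the `probe` callable are assumptions for illustration; a real probe would call whatever is being monitored.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class CheckResult:
    name: str
    state: str          # current state: "ok" or "failed"
    duration_ms: float  # data point: how long the probe took
    detail: str         # detailed log line for later analysis
    checked_at: float   # Unix timestamp of the check


def run_check(name: str, probe: Callable[[], str]) -> CheckResult:
    """Run `probe` and record state, timing, and detail instead of a bare up/down."""
    started = time.monotonic()
    try:
        detail = probe()
        state = "ok"
    except Exception as exc:  # keep the failure detail rather than discarding it
        detail = f"{type(exc).__name__}: {exc}"
        state = "failed"
    duration_ms = (time.monotonic() - started) * 1000.0
    return CheckResult(name, state, duration_ms, detail, time.time())
```

Keeping the exception text in `detail` is a deliberate choice here: it is the information someone needs when the failure occurs, and later for RCA.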
3. Visualization
o Analysis
o Dashboards
o Correlations
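
For the “Correlations” bullet, one common starting point is the Pearson coefficient between two collected series, for example request latency against queue depth. The talk does not name a method, so treat this as one possible sketch; the function name is an assumption.

```python
from __future__ import annotations

import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equally long metric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / math.sqrt(var_x * var_y)
```

A value near 1 or -1 suggests two series move together and are worth placing on the same dashboard; it does not by itself establish cause.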
4. Action
o Fault detection
o Alerting
o RCA (root cause analysis)
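
Fault detection and alerting can be sketched as a threshold with a consecutive-breach rule, so a single noisy sample does not page anyone. The `FaultDetector` class, its default of three samples, and the example threshold are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class FaultDetector:
    """Signal an alert only after `required` consecutive samples breach `threshold`."""

    threshold: float
    required: int = 3
    _streak: int = 0

    def observe(self, value: float) -> bool:
        """Record one sample; return True when an alert should fire."""
        if value > self.threshold:
            self._streak += 1
        else:
            self._streak = 0
        # Fire once, on the sample that completes the streak, to avoid repeat pages.
        return self._streak == self.required


if __name__ == "__main__":
    detector = FaultDetector(threshold=500.0)
    for latency_ms in [600, 700, 100, 600, 650, 900, 950]:
        if detector.observe(latency_ms):
            print(f"ALERT: latency above 500 ms for 3 samples (last={latency_ms})")
```

With the sample values shown, the alert fires once, on the sixth sample, because the third sample resets the streak.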
Monitoring Is Never Done
Cycle
1. Definition (What to collect)
2. Checks & Collections (How to collect)
3. Visualization (Make collections pretty)
4. Action (Inform on failure)
Team Time Distribution
Time Distribution (Desired)
Is “X” monitored?
When “X” goes into some degraded state
o The right people know.
o They have enough information to find the
problem, recover, and later do RCA.
o If they don’t, they revisit the definition.
How does your team
o Classify
o Quantify
o Qualify
Monitoring is Never “Done”
Melanie Cey
@melaniemj
Senior Systems Analyst
Systems Reliability Engineering
@ Yardi


Editor's Notes

  • #13: What to collect
  • #14: How to collect
  • #15: Make collections pretty
  • #16: Inform on failure