Monitoring Maturity 
A 16 Year Journey and (some of) the Lessons Learned 
Simon Finch 
NOC Monitoring & Event Manager 
sfinch@westpac.com.au 
https://www.linkedin.com/pub/simon-finch/23/461/3b7
Westpac Stats 
• Australia’s first and oldest bank (1817) 
• World’s Ninth largest bank 
• Global100 world’s most sustainable company 
(2014) 
• World’s most sustainable bank 2014 
(Dow Jones Sustainability Indices) 
• Named as one of the World's Most Ethical 
Companies from 2008 - 2013 by the Ethisphere 
Institute
Westpac Stats 
• 12 Million Customers 
• 570,000 Shareholders 
• 37,000 Employees 
• 1300 Australian points of representation 
• Offices in London, New York, Hong Kong, 
Singapore, India, Shanghai, Beijing, New 
Zealand, Pacific Islands & Indonesia
About This Presentation 
1. Monitoring Landscape 
2. NOC Dash Boards 
3. Service Desk Integration 
4. Mainframe Alerts via MQSI
Part 1 
Monitoring 
Landscape
Monitoring Landscape 
• Westpac – multi branded 
• IT is both insourced & outsourced 
• Nagios used extensively throughout 
insourced brands (SGB, BSA, BoM)
Time Line 
• 1998 - Vendor Framework Installed 
• 2004 - First Nagios used to fill gaps 
• 2006 - Nagios replaced proprietary 
monitoring framework
The Result ? 
• Paradigm shift (power to the people) 
• Deeper monitoring penetration 
• Agents part of server build
Monitoring Today 
• Critical Applications 
• Branch Locations 
• Wintel, Linux, Unix 
• 4000 hosts 
• 37000 services
Part 2 
Dash Boards
NOC Dash Boards
NOC Dash Boards 
• Major App status at a glance 
• Bright & colourful 
• Simple & effective 
• Drill down to application map 
• Time stamped 
• All done with NagVis
Dash Board Evolution 
Early 
example
Dash Board Evolution 
Then we 
tried
Dash Board Evolution 
Today
Dash Board Evolution
VMware Dash Boards 
• VMware - interesting monitoring challenges 
• Metrics from each ESX host are monitored 
• Metrics are clustered to prevent false positives 
• > 20% Failure shown as a warning 
• > 40% Failure shown as critical
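The clustering rule on this slide can be sketched in a few lines. This is only an illustration in Python (not the Perl or Nagios configuration actually used at Westpac); the function name and the list-of-booleans input are assumptions, and only the 20% and 40% thresholds come from the slide.

```python
def cluster_state(host_ok, warn_pct=20.0, crit_pct=40.0):
    """Return a Nagios-style state (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

    host_ok holds one boolean per ESX host: True when that host's metric
    is healthy. The input shape is an assumption made for this sketch.
    """
    if not host_ok:
        return 3  # UNKNOWN: no hosts to evaluate
    failed_pct = 100.0 * sum(1 for ok in host_ok if not ok) / len(host_ok)
    if failed_pct > crit_pct:
        return 2  # more than 40% of hosts failing
    if failed_pct > warn_pct:
        return 1  # more than 20% of hosts failing
    return 0      # a few failures are absorbed by cluster redundancy
```

With ten hosts, one or two failures still report OK, three or four report WARNING, and five or more report CRITICAL, which is how a single unhealthy host avoids raising a false positive.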
Dash Board Drill Down
VMware Dash Boards 
• Top level status shows nothing is wrong 
• VMware clusters have lots of redundancy 
• Hover display shows some detail 
• Although it looks simple, there are more than 
150 metrics collected and clustered to build 
the status of each VMware ESX cluster.
VMware Dash Boards
VMware Dash Boards 
Drill down shows the details 
for the support teams.
VMware Dash Boards
Application Support 
• Specific Custom Requirements 
• Displayed on large monitors 
• Used for day to day operational status 
• Summary rolled up to the NOC
Application Support
Application Support
Application Support
NagVis & Livestatus 
• NagVis is extremely flexible 
• More so with Livestatus 
• Use a simple Perl script to extract data 
• Enhance dash boards any way you 
want
NagVis & Livestatus 
#!/usr/bin/perl -wT
# CGI helper: prints "down/total" host counts for one host group,
# read from a Livestatus backend.
use CGI qw/:standard/;
use Monitoring::Livestatus;

my $q = CGI->new;
print $q->header();

my $backend   = $q->param('backend');     # read from the request, not used below
my $filtergrp = $q->param('hostGroup');   # host group to summarise

my $ml = Monitoring::Livestatus->new(
    server => 'backend hostname:port'     # placeholder for the real host:port
);

# Livestatus host states: 0 = UP, 1 = DOWN, 2 = UNREACHABLE
my $up      = $ml->selectscalar_value("GET hosts\nFilter: host_groups >= $filtergrp\nStats: state = 0");
my $down    = $ml->selectscalar_value("GET hosts\nFilter: host_groups >= $filtergrp\nStats: state = 1");
my $unknown = $ml->selectscalar_value("GET hosts\nFilter: host_groups >= $filtergrp\nStats: state = 2");
my $total   = $up + $down + $unknown;
print "$down/$total<br>";
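For comparison, the same counts can be fetched in a single round trip, because Livestatus answers a query with several Stats lines with one semicolon-separated row. The sketch below is in Python rather than the deck's Perl and talks to Livestatus over a plain TCP socket; the function names and the newline check are assumptions for illustration, not part of the original script.

```python
import socket

def build_host_stats_query(hostgroup):
    """Build one Livestatus query counting UP, DOWN and UNREACHABLE hosts."""
    # Reject newlines so a request parameter cannot add extra query headers.
    if "\n" in hostgroup or "\r" in hostgroup:
        raise ValueError("host group name must not contain newlines")
    return (
        "GET hosts\n"
        f"Filter: host_groups >= {hostgroup}\n"
        "Stats: state = 0\n"
        "Stats: state = 1\n"
        "Stats: state = 2\n"
        "\n"  # an empty line ends the query
    )

def parse_stats_line(line):
    """Parse an 'up;down;unreachable' reply line into three integers."""
    up, down, unreachable = (int(x) for x in line.strip().split(";"))
    return up, down, unreachable

def host_summary(server, port, hostgroup):
    """Query a Livestatus TCP endpoint and return 'down/total' as a string."""
    with socket.create_connection((server, port), timeout=10) as sock:
        sock.sendall(build_host_stats_query(hostgroup).encode())
        sock.shutdown(socket.SHUT_WR)  # signal that the request is complete
        reply = sock.makefile().readline()
    up, down, unreachable = parse_stats_line(reply)
    return f"{down}/{up + down + unreachable}"
```

Checking the host group value before it is placed in the query is a design choice worth considering for any script that takes it straight from a web request.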
NagVis & Livestatus
Summary 
• Simple is effective 
• Clear status summary 
• Bold broad use of colour 
• Details only for Specialists
Part 3 
Service Desk 
Integration
Service Desk Integration 
Two tried and proven methods: 
1. Direct to API 
2. Event Management
Service Desk Integration 
Direct to API
Service Desk Integration 
Event Management
Service Desk Integration 
• Notifications: email, sms, remedy 
• Custom macro in top level templates 
• Common script for host & Service 
• Custom macro sent to notify script 
• Email & sms easy to handle
Service Desk Integration 
• Service desk ticket creation handled by 
sending event to BMC BEM. 
• Same method as email & SMS. 
• Different handler subroutine in notify 
script.
Service Desk Integration 
• Nagios ContactGroup as ResolverGrp 
• Worked fine until department mergers 
changed ContactGroup names 
• Now using custom macro in top level 
templates
What does that give us ? 
• Control of notification down to 
individual host & service level 
• Ability to set notification at template 
level and override at service / host level 
• Email / sms / tickets in any combination 
• Complete flexibility
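A minimal sketch of how a common notify script might act on such a custom macro follows. It is written in Python rather than the Perl used in production, and every name in it is hypothetical: it assumes a custom variable such as `_NOTIFY_VIA email,ticket` defined in a top level template, which Nagios would expose to the notification command as `$_HOSTNOTIFY_VIA$` or `$_SERVICENOTIFY_VIA$`.

```python
def parse_channels(macro_value, known=("email", "sms", "ticket")):
    """Split a custom macro value such as 'email,ticket' into channel names."""
    wanted = [c.strip().lower() for c in macro_value.split(",") if c.strip()]
    unknown = [c for c in wanted if c not in known]
    if unknown:
        raise ValueError(f"unknown notification channel(s): {unknown}")
    return wanted

def notify(macro_value, event, handlers):
    """Call one handler subroutine per requested channel, in order."""
    return [handlers[channel](event) for channel in parse_channels(macro_value)]

# Hypothetical handlers; real ones would send mail, an SMS, or a BEM event.
HANDLERS = {
    "email": lambda event: f"email sent for {event['host']}",
    "sms": lambda event: f"sms sent for {event['host']}",
    "ticket": lambda event: f"ticket event sent for {event['host']}",
}
```

Because the macro is inherited from the template and can be redefined on any host or service, changing one value is enough to switch that object between email, SMS, tickets, or any combination.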
Summary 
• API – more control 
• API – much harder to do 
• EvtMgmt – not as much control 
• EvtMgmt – very simple to do
Part 4 
Mainframe 
Alerts
Mainframe Alerts via MQSI 
• Why ? (you want to do what ????) 
• Why bother ? (That’s what Ops are for, 
aren't they ?) 
• How ? (it’s easier than it sounds)
Mainframe Alerts - Why ? 
• Most of our distributed apps are back ended 
by mainframes 
• Mainframe hiccups cause apps to fail 
• Dash boards all stay a lovely shade of 
Green 
• MF vendor options outrageously expensive 
($100k - $1M)
Mainframe Alerts - How ? 
• Anybody who has legacy MF Apps also has MQSI 
(If not, why not ?) 
• SysProgs already logging errors 
• Set up MQSI channel from MF to Intel host 
• Agree on simple delimited message format 
• field1|field2|field3|field4|field5 …fieldN 
• SysProgs send error messages via MQSI
Mainframe Alerts - How ? 
• Message arrives, MQSI auto runs custom script 
• Simple perl split on delimiter to recover fields. 
• Perl script then formats alert into any format you 
want and sends it to Nagios 
• NSCA, NRDP, SNMP, NagEventLog or whatever 
option you have available.
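The split-and-forward step can be sketched as below, in Python rather than the Perl the slides describe. The four-field layout (host|service|severity|text) and the function names are assumptions for illustration, since the slides only specify field1|field2|...|fieldN. The two output formats are the tab-separated line that send_nsca reads on standard input and the Nagios PROCESS_SERVICE_CHECK_RESULT external command.

```python
import time

STATE_CODES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}

def parse_mf_message(message):
    """Split a pipe-delimited mainframe message into named fields.

    Assumed layout for this sketch: host|service|severity|text.
    """
    host, service, severity, text = message.strip().split("|", 3)
    return {
        "host": host,
        "service": service,
        "state": STATE_CODES.get(severity.upper(), 3),  # unrecognised -> UNKNOWN
        "text": text,
    }

def to_nsca_line(event):
    """Format an event as a tab-separated passive result for send_nsca."""
    return "{host}\t{service}\t{state}\t{text}\n".format(**event)

def to_external_command(event, now=None):
    """Format an event as a PROCESS_SERVICE_CHECK_RESULT external command."""
    ts = int(time.time() if now is None else now)
    return ("[{ts}] PROCESS_SERVICE_CHECK_RESULT;"
            "{host};{service};{state};{text}\n").format(ts=ts, **event)
```

Only the output function needs to change when the destination changes, which matches the speaker's note that the method was first built for a proprietary event manager and later pointed at Nagios.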
Mainframe Alerts - How ?
Mainframe Alerts - Result ? 
• MF App events now in Nagios 
• Map MF events to appropriate Apps 
• Dashboards show correct status 
• Happy NOC Manager
Summary 
• MFs are still with us. 
• Process critical transactions. 
• Status not complete without MF 
app detail.
Questions?
Thanks for coming 
Simon Finch 
sfinch@westpac.com.au 
https://www.linkedin.com/pub/simon-finch/23/461/3b7

Nagios Conference 2014 - Simon Finch - Monitoring Maturity A 16 Year Journey


Editor's Notes

  • #7: Speaker Notes: Lots of other monitoring systems in place. Including monitoring agents in server builds makes life so much easier.
  • #8: Speaker Notes: Gaps: lack of monitoring of network-based services (DNS, NTP, DHCP, etc.) caused apps to fail due to missing dependencies. The vendor's solution to gaps in their product was to use (and pay for) more of their product; I tried Nagios instead. Two years after the initial Nagios instance was deployed, having seen the benefits first hand, I went out on a limb and personally made the decision to deploy Nagios monitoring throughout the enterprise. The Nagios implementation was very successful. The head of IT production support at the time was so happy with the results we were rapidly achieving that my KPIs for the year were all changed mid-year from various monitoring projects to a single entry: "Remove vendor framework and replace with Nagios". I was a member of the original team of three that deployed the vendor monitoring 16 years ago, in 1998. I have a unique insider's view of the entire monitoring journey and in-depth knowledge of the successes and failures. We were successful and saved the company many millions of dollars in annual licensing fees.
  • #9: Speaker Notes: Paradigm shift in monitoring: Nagios is so easy to use that systems admins are now in charge of their own monitoring, which was previously not possible due to complexity. In six months we achieved deeper monitoring penetration than in the previous 8 years. No more us-and-them or turf wars: sysadmins own it, so they embraced it. App support teams know their apps better than the monitoring team ever will; we gave them the tools to monitor what they knew was important to them. I provide the monitoring system, they provide the app knowledge. Including monitoring agents in server builds makes life so much easier later on; infrastructure teams built the agents into their build process because they wanted them. Internal customers are pleasantly surprised when I show up to a project meeting and say “sure, I’ll have it done by next week” instead of the usual 3 months or more. Occasionally I am able to attend the first meeting with a sample dashboard of their app.
  • #10: Speaker Notes: All supported by: 16 Nagios servers, 1 engineer and lots of support from SysAdmin colleagues.
  • #13: Speaker Notes: Prior to Nagios, we had no usable dash boards. Management had no idea what was happening (neither did we). Proprietary dash boarding software is extremely expensive. Dash boards are time stamped because a break in comms to the web server can cause the browser to stop updating.
  • #14: Speaker Notes: Too busy, too much detail; top level status summary is usually red.
  • #15: Speaker Notes: Not much better
  • #16: Speaker Notes: Clean, simple, easy to read, application status is instant.
  • #17: Speaker Notes: Simple information is also conveyed by position on the dash board: left-hand side icons are for servers located at the primary data centre, right-hand side for the secondary data centre.
  • #18: Speaker Notes: VMware clusters present some interesting monitoring challenges. Metrics from each ESX host are monitored. Metrics are clustered to prevent false positives. More than 20% failure is shown as a warning; more than 40% is shown as critical.
  • #20: Speaker Notes: There are some minor issues even though the top level shows nothing is wrong. It looks simple, but there are more than 150 metrics collected and clustered to build the status of each VMware ESX cluster. The dash board hides the complexity to give a simple, clear indication of the application status.
  • #21: Speaker Notes: Closer inspection shows minor issues using hover
  • #23: Speaker Notes: Lower level dash boards show the detail for those who really need it
  • #24: Speaker Notes: Application support teams have specific requirements. Specialty dash boards are displayed on large monitors in the support team areas. These are used for day to day operational status at a detailed level. The summary is rolled up to the NOC summary dash board.
  • #25: Speaker Notes: These are used for day to day operational status at a detailed level. The summary is rolled up to the NOC summary dash board.
  • #26: Speaker Notes: Email Archive Manager warning summary is rolled up to the summary dash board for management by the NOC
  • #28: Speaker Notes: NagVis is super configurable, anything we can dream up can be dash boarded easily. Custom object graphics are created in graphics editor / publisher / power point and exported as png files.
  • #29: Speaker Notes: Live Perl code that I use to extract data from a Livestatus backend; only the hostname and port have been changed.
  • #30: Speaker Notes: This is a Branch Status GeoMap that has been exported as a static map, then enhanced with a couple of lines of Perl that extract data from a Livestatus backend. The Perl code from the previous slide provides the values for the status boxes. A separate cluster monitor provides the status colour for the boxes. Branches for each geographical state belong to a hostgroup. The cluster monitor is set to !Name!2!5!$HOSTSTATEID. The NOC monitoring team is in the unique position of showing all sites from all suppliers on one pane of glass; none of our service providers can do that.
  • #33: Speaker Notes: I have done it both ways, with two different monitoring systems and two different service desks; the methods are universal. Direct to API is hard work but gives fine-grained control and extremely tight coupling, with lots of coding. Going via event management is very easy and is loosely coupled, with very simple coding in the notification script.
  • #34: Speaker Notes: Difficult but gives a lot more control. A lot of code has to be written and maintained. We did this and had very tightly coupled monitoring to SD integration, but nobody really cared, so long as the ticket gets logged. Maintained the whole thing for years.
  • #35: Speaker Notes: Now doing it this way. Much easier, construct event in notification script, fire and forget.
  • #37: Speaker Notes: BEM – BMC Event Management
  • #38: Speaker Notes: Custom macro contains Resolver Group for Remedy.
  • #41: Speaker Notes: Methods presented here are applicable to any legacy style system that has a logging mechanism and IP connectivity.
  • #43: Speaker Notes: IT pundits have been predicting the death of the mainframe for years; someone forgot to tell the mainframe guys. Instead of dying they have morphed into super web application servers. After a recent upgrade, we kept the old MF as an OS390 WebSphere App server. These dinosaurs aren’t retiring anytime soon.
  • #46: Speaker Notes: This method is flexible. Originally built to integrate into a proprietary event management system. With simple code changes to the output method, it now feeds into Nagios.