© 2017 SPLUNK INC.
Tom Harrop
IT Operations Specialist
David Millis
ITOA Architect
The Hitchhiker’s Guide to Service Intelligence
Agenda
▶ Introductions and Set Up
▶ Splundamentals – IT Troubleshooting With Splunk
▶ What Is IT Service Intelligence?
▶ Service Intelligence Design Practices
▶ Let's Play!
▶ What's Next?
▶ Happy Hour
Forward-Looking Statements
During the course of this presentation, we may make forward-looking statements regarding future events or the expected performance of the company. We caution you that such statements reflect our current expectations and estimates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-looking statements, please review our filings with the SEC.
The forward-looking statements made in this presentation are being made as of the time and date of its live presentation. If reviewed after its live presentation, this presentation may not contain current or accurate information. We do not assume any obligation to update any forward-looking statements we may make. In addition, any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the features or functionality described or to include any such feature or functionality in a future release.
Splunk, Splunk>, Listen to Your Data, The Engine for Machine Data, Splunk Cloud, Splunk Light and SPL are trademarks and registered trademarks of Splunk Inc. in the United States and other countries. All other brand names, product names, or trademarks belong to their respective owners. © 2017 Splunk Inc. All rights reserved.
Key Takeaways
1. Build on what you are already doing with Splunk
2. Service intelligence design and configuration practices
3. What’s possible with Splunk IT Service Intelligence
Splundamentals – IT Troubleshooting With Splunk
Challenging Traditional Methods
(Diagram: a layered stack. The infrastructure layer holds storage, server, and network. The application layer holds byte code instrumentation, synthetic APM, and adaptive thresholding. The service layer and business layer sit above them. Metrics and events feed an aggregation/correlation/visualization tier and a service model definition & correlation engine.)
Challenges
▶ Too many disparate components
▶ Difficult to define service model
▶ Labor intensive
▶ Most implementations fail
Very important source is missing! (machine data)
74% - 36%
Challenging Traditional Methods
(Diagram: the same stack with machine data added alongside metrics and events, on a machine data fabric platform that delivers service intelligence.)
Infrastructure layer
• Storage: Utilization, Capacity, Performance
• Server: Performance, Usage, Dependency
• Network: Packet, Payload, Traffic, Utilization, Perf
Application layer
• Byte Code Instrumentation: Usage, Experience, Performance, Quality
• Synthetic APM: Availability, Capacity, User Experience
• Adaptive Thresholding: Apps, Services, Systems
Splunk is the missing link
▶ Data Fidelity
▶ Single Repository for ALL data
▶ Easier to Manage Services
▶ Reduced Integrations
▶ Reduced Point Solutions
▶ Collaborative Approach
▶ Quick Time-to-Value
74% - 36%
Splunk Approach to Machine Data
(Diagram: data ranging from structured to unstructured, growing in volume, velocity, and variety.)
Traditional (RDBMS): SQL, schema at write, ETL
▶ Define static schema
▶ ETL into Schema
▶ Enrich at write
▶ New data = new columns
▶ New questions = new columns
▶ “Data at rest” (delayed info)
▶ Labor intensive & time-consuming
Ideal for Reporting
Splunk: Search, schema at read, universal indexing
▶ “Schema on-the-Fly”
▶ Data in native format
▶ Enrich at read
▶ New data = no changes needed
▶ New questions = no changes needed
▶ “Data in motion” (real time)
▶ Fast time-to-value
Ideal for Investigation
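The difference is easiest to see in code. The following is a minimal Python sketch, not Splunk's implementation. Events are stored as raw text, and a field is extracted with a regular expression only when a question is asked. The event format and the `extract` and `count_by` helpers are hypothetical.

```python
import re
from collections import Counter

# Hypothetical raw events, stored as-is: no schema is applied when they are written.
RAW_EVENTS = [
    "2017-09-26T14:01:02 host=web01 status=200 bytes=5120",
    "2017-09-26T14:01:05 host=web02 status=500 bytes=312",
    "2017-09-26T14:01:09 host=web01 status=503 bytes=287",
]


def extract(events, field):
    """Yield the value of `field` from each event, parsed at read (query) time."""
    pattern = re.compile(rf"\b{re.escape(field)}=(\S+)")
    for event in events:
        match = pattern.search(event)
        if match:
            yield match.group(1)


def count_by(events, field):
    """Rough analogue of '... | stats count by field' over raw text."""
    return Counter(extract(events, field))


# A new question needs only a new read-time extraction, not a new column.
server_errors = [s for s in extract(RAW_EVENTS, "status") if s.startswith("5")]
print(count_by(RAW_EVENTS, "host"))
print(len(server_errors))
```

Answering the new question (which requests returned 5xx codes) needed only another read-time extraction. Nothing about the stored events changed.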
Listen to your data
Let’s take a closer look at IT troubleshooting with Splunk
Machine learning-powered analytics for real-time service insights, simplified operations and root-cause isolation
What Is Service Intelligence?
Enabling business-aware IT
Measuring and reporting on indicators that matter
Unlocking operational efficiencies
Collaborating across silos to improve service operations
Data-based decision making
Solving problems and anticipating pitfalls with sophisticated analytics and powerful insights
Splunk IT Service Intelligence: Capabilities
Machine Learning
• Adaptive threshold automation to minimize false alerts
• Behavior anomaly alerts to address issues proactively
• Correlating data into knowledge, reducing dependency on SMEs
• Accelerators that minimize SPL coding
• Trend aggregation to enable rapid visualization
• Multi-KPI alerts for proactive identification of irregularities
Search-Based KPIs
• Time Series Index
• Schema on Read
• Data Models
Platform for Operational Intelligence
• Visualize the entire tech stack, from bare metal through the business layer
• View the entire ecosystem, with customized views for executives
• Use 3 clicks to get the answer vs. 10
Dynamic Service Model
The Possibilities for Business…
The Possibilities for IT Operations…
Service Health
What is a Service?
In Splunk ITSI, a service is a logical group of technology components that a user decides should be monitored together. It can often be generalized as a “black box” to which we send requests and from which we expect responses.
Requests → Service → Responses
Services can be lower level (technical), such as DNS, Auth, and Web, each taking requests and returning responses.
Services can also be higher level (business), such as Customer Transactions and Support Desk, alongside technical services such as DNS, Auth, and Web.
Services can encompass multiple tiers of the IT domain, and they may also depend upon other services.
(Diagram: Customer Transactions, Mobile API/Middleware, and Partner Portal, supported by Web Services, API Services, DNS, RDBMSs, Hypervisor and Hosts, Packet Network, and Storage Tiers.)
What’s a KPI?
DNS
• KPI: Request volume
• KPI: Error rate
• KPI: Average response time
• KPI: Server CPU load
• KPI: Configuration changes
Customer Transactions
• KPI: Transaction volume
• KPI: Error rate
• KPI: Average response time
• KPI: Max response time
• KPI: Count of change records
Business Functions
• KPI: Business volume
• KPI: Error rate
• KPI: Revenue rate
• KPI: Conversion rate
• KPI: Count of incident tickets
KPIs and health scores are the means by which services are monitored.
What’s a Key Performance Indicator?
A key performance indicator (KPI) is a Splunk saved search, created within the Splunk ITSI UI, that helps monitor a specific field such as CPU, memory, or number of errors. KPIs are contained within services.
Service Health Scores
A health score is a score from 0 to 100 (0 = critical, 100 = normal) that helps determine the health of a service. It is calculated once every minute, based on the importance and status (e.g., green, orange, red) of all KPIs.
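As a rough illustration of how importance and status could combine, here is a Python sketch that takes an importance-weighted average of per-KPI scores. The color-to-score mapping, the integer importance weights, and the averaging rule are all assumptions made for illustration. This is not the published Splunk ITSI health score algorithm.

```python
from dataclasses import dataclass

# Hypothetical mapping from KPI status color to a 0-100 score
# (0 = critical, 100 = normal, as on the slide).
STATUS_SCORE = {"green": 100.0, "orange": 50.0, "red": 0.0}


@dataclass
class KpiStatus:
    name: str
    importance: int  # relative weight; higher means the KPI matters more
    status: str      # "green", "orange", or "red"


def health_score(kpis):
    """Importance-weighted average of KPI status scores, rounded to an int."""
    total_weight = sum(k.importance for k in kpis)
    if total_weight == 0:
        return 100  # assumption: with nothing weighted, report normal
    weighted = sum(k.importance * STATUS_SCORE[k.status] for k in kpis)
    return round(weighted / total_weight)


example = [
    KpiStatus("Error rate", 5, "red"),
    KpiStatus("Average response time", 3, "orange"),
    KpiStatus("Server CPU load", 2, "green"),
]
print(health_score(example))
```

In this example the red, high-importance error rate dominates: (5 * 0 + 3 * 50 + 2 * 100) / 10 gives a score of 35.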
Splunk IT Service Intelligence
Let’s take a closer look at Service Intelligence with Splunk
Service Intelligence Design Practices
Best Practices for Service Intelligence
1. Start with a problem worth solving
2. Bring subject experts together
3. Design before configuring
Best Practices for Service Intelligence
Data-Defined, Data-Driven Service Insights
▶ Define services, entities and KPIs
▶ Monitor and troubleshoot
▶ Analyze and detect
The Business Problem for Buttercup Games
▶ Supply Chain: limited visibility
▶ Online Store: poor customer satisfaction
▶ ERP: frequent bottlenecks
▶ Failed interactions
▶ War room escalations
Business Impact: $48K / week in lost revenue
Service Decomposition
▶ Business Layer: Mail Transport, Order Processing, E-Commerce, Financials
▶ Service Layer: Business Service
▶ Application Layer: Middleware, Application Server, Database, Custom Apps
▶ Infrastructure Layer: Power / Cooling / Facilities; Server, Networking, Storage
Typical Data Sources
(Diagram: the Buttercup Games Supply Chain (Order Entry, Manufacturing, Shipping, Fulfillment), Online Store and EDI, and Web Tier and Middleware, mapped across the business, service, application, and infrastructure layers, with example metrics for each component.)
• Total Orders
• Total Revenue
• Unit Count
• Unit Failures
• Service Level
• Delivery Time
• Online Orders
• Online Revenue
• Response Time
• Service Health
• Incidents/Changes
• Customer Satisfaction
• HTTP Hits
• Error Rate
• CPU Load
• Memory Used
• Disk Used
• IO Latency
• CPU Load
• Memory Used
• Disk Used
• IO Latency
• Response Time
• Error Rate
• Response Time
• Storage Free
Service Intelligence Design in Splunk ITSI
▶ High-value business services
• Buttercup Games Online Store and Supply Chain
▶ Supporting services
• Web, Middleware, Database
▶ Relevant KPIs for each service
• Database: (errors, SQL hits, …)
▶ Splunk search for each KPI
• index=DB (warn* OR error*) | stats count
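To show what the example KPI search counts, here is a hedged Python approximation of `index=DB (warn* OR error*) | stats count`. It runs over a hypothetical list of database log lines. It mimics only the case-insensitive prefix matching of the two terms. Splunk's real tokenization and indexing are not modeled.

```python
import re

# Hypothetical events standing in for the contents of index=DB.
DB_EVENTS = [
    "ERROR ORA-12541: TNS:no listener",
    "INFO checkpoint complete",
    "Warning: slow query took 4.2s",
    "error: deadlock detected",
    "INFO connection opened",
]

# Approximates (warn* OR error*): a word that starts with "warn" or "error",
# matched case-insensitively. The trailing * means any continuation is accepted.
TERMS = re.compile(r"\b(?:warn|error)", re.IGNORECASE)


def kpi_error_count(events):
    """Rough equivalent of '| stats count' over the events matching TERMS."""
    return sum(1 for event in events if TERMS.search(event))


print(kpi_error_count(DB_EVENTS))
```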
Service Intelligence Design – Buttercup Games
(Diagram: the Supply Chain (Order Entry, Manufacturing, Shipping, Fulfillment), Online Store, EDI, Web Tier, and Middleware components across the business, service, application, and infrastructure layers, annotated with their data sources.)
• Application Logs
• Corporate Databases
• Service Management
• Application Logs
• Webserver Logs
• DB Perf Counters
• Wire Data
• Perf Counters
• Access Logs
• Network Logs
Let’s Play!
Setting up Service Intelligence
Service Visibility in Splunk ITSI
▶ Click Glass Tables
Service Visibility in Splunk ITSI
▶ Click “Buttercup Games Business Process (IN PROGRESS)” (open in new tab)
Service Visibility in Splunk ITSI
▶ Click “Buttercup Games Online Store” (open in new tab)
Goal 1: Supply Chain Visibility
Goal 2: Online Store Process Flow
New Requirements!
▶ Create a new KPI for the DB Service:
• Network Utilization
▶ Modify the Executive Glass Table to show off the services you slave over
“We only have about 15 min… TO DO WHAT?!”
Think about how long this would take you today.
Configuration of DB Service
▶ Click Configure > Services
Let’s Talk Entities
▶ Click DB Service
▶ Entities are the relevant things that support this service (usually hosts)
▶ Select the right entities with filters, ANDs, and ORs
▶ The original entity list can come from a CMDB, a spreadsheet, a Splunk search, or other sources
A KPI in 5 Minutes? Absolutely!
▶ Click New > Generic KPI
▶ Call it Network Utilization, with your username up front
▶ Select Data Model:
• Host Operating System
• Network
• # bytes
▶ Click Next
KPIs Continued…
▶ Select Yes for the Split by & Filter options
▶ Select host for the Entity Lookup & Alias options
▶ Click Next
Splunk builds searches for you. Oh yeah, that’s happening!
Almost There…
▶ KPI Search Schedule: Every Minute
▶ Entity Calculation: Average
▶ Service/Aggregate Calculation: Average
▶ Calculation Window: Last Minute
▶ Click Next
▶ Unit: Bps
▶ Click Next
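The calculation settings above describe a two-step roll-up. The Python sketch below assumes that the service aggregate is the average of the per-entity averages over the last minute. The sample data, host names, and `kpi_values` function are hypothetical, and ITSI's internal computation may differ.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (timestamp_seconds, host, bytes_per_second) samples for the DB service.
SAMPLES = [
    (0, "db01", 1000.0), (30, "db01", 3000.0),
    (10, "db02", 500.0), (40, "db02", 1500.0),
    (50, "db03", 9000.0),
    (-30, "db01", 99999.0),  # falls outside the one-minute window
]


def kpi_values(samples, now, window=60):
    """Average per entity over [now - window, now], then average the entity values."""
    by_entity = defaultdict(list)
    for ts, host, value in samples:
        if now - window <= ts <= now:
            by_entity[host].append(value)
    entity_values = {host: mean(vals) for host, vals in by_entity.items()}
    aggregate = mean(list(entity_values.values())) if entity_values else None
    return entity_values, aggregate


entities, service_value = kpi_values(SAMPLES, now=60)
print(entities, service_value)
```

Averaging entity values gives each host equal weight. Here the aggregate is 4000.0 Bps, whereas a plain average of all five in-window samples would be 3000.0 Bps.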
Final Steps…
▶ Set your thresholds:
• Aggregate (All)
• Per Entity
▶ Click Add Threshold TWICE
▶ Make the Neapolitan ice cream colors Yellow, Green, Yellow
▶ Drag the sliders so that the current data graph lies entirely inside the Green (normal) band
▶ Click Finish
▶ Other options are also available, including adaptive thresholds and anomaly detection
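Two thresholds split a KPI's range into three bands. This Python sketch uses `bisect_right` from the standard library to map a value to its band. The threshold values are hypothetical slider positions. The rule that a value equal to a threshold belongs to the band above it is an assumption, not documented ITSI behavior.

```python
from bisect import bisect_right

# Two thresholds create three bands, like the Yellow / Green / Yellow setup.
# The values are hypothetical slider positions in Bps.
THRESHOLDS = [2_000.0, 12_000.0]
COLORS = ["yellow", "green", "yellow"]


def classify(value, thresholds=THRESHOLDS, colors=COLORS):
    """Return the band color for a KPI value; thresholds must be ascending."""
    if len(colors) != len(thresholds) + 1:
        raise ValueError("need exactly one more color than thresholds")
    # bisect_right counts thresholds <= value, so a value equal to a
    # threshold falls into the band above it.
    return colors[bisect_right(thresholds, value)]


for v in (500.0, 2_000.0, 8_000.0, 15_000.0):
    print(v, classify(v))
```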
Adaptive Thresholds
▶ What if your KPI data looks like this?
▶ Static thresholds will not work…
▶ Adaptive thresholding works beautifully with cyclical (and other dynamic) data
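One simple way to derive thresholds that follow a daily cycle is to compute bounds separately for each hour of the day. The Python sketch below uses the mean ± k population standard deviations for each hour. This technique was chosen for brevity and is not necessarily the algorithm ITSI applies. The history data are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def adaptive_thresholds(history, k=2.0):
    """Build (low, high) bounds per hour of day: mean +/- k population std devs."""
    by_hour = defaultdict(list)
    for hour, value in history:
        by_hour[hour].append(value)
    bounds = {}
    for hour, vals in by_hour.items():
        mu, sigma = mean(vals), pstdev(vals)
        bounds[hour] = (mu - k * sigma, mu + k * sigma)
    return bounds


def is_abnormal(hour, value, bounds):
    low, high = bounds[hour]
    return value < low or value > high


# Hypothetical history: quiet at 03:00, busy at 14:00.
HISTORY = [(3, 10.0), (3, 12.0), (3, 14.0), (14, 100.0), (14, 110.0), (14, 120.0)]
bounds = adaptive_thresholds(HISTORY)
print(is_abnormal(3, 12.0, bounds), is_abnormal(14, 12.0, bounds))
```

A value of 12.0 is normal at 03:00 but abnormal at 14:00. That is exactly the case a single static threshold cannot express.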
Anomaly Detection
▶ Machine learning
▶ “Trending” detects deviations in the aggregate KPI from its historical trends
▶ “Entity cohesion” detects entities that deviate from the behavior of the “pack”
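Entity cohesion can be approximated by comparing each entity with the rest of the pack. The following Python sketch substitutes a well-known technique for whatever ITSI uses internally: the median absolute deviation (MAD) modified z-score with a conventional 3.5 cutoff. Host names and values are hypothetical.

```python
from statistics import median


def cohesion_outliers(entity_values, cutoff=3.5):
    """Return the entities whose modified z-score against the pack exceeds cutoff."""
    values = list(entity_values.values())
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # Degenerate pack: most entities identical, so flag anything different.
        return sorted(e for e, v in entity_values.items() if v != med)
    # 0.6745 scales the MAD so scores are comparable to z-scores for normal data.
    return sorted(
        e for e, v in entity_values.items() if 0.6745 * abs(v - med) / mad > cutoff
    )


PACK = {"web01": 50.0, "web02": 52.0, "web03": 48.0, "web04": 51.0, "web05": 95.0}
print(cohesion_outliers(PACK))
```

The median and MAD are used instead of the mean and standard deviation. As a result, the outlier itself barely shifts the baseline it is judged against.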
Let’s Fix That Glass Table
Clone the Glass Table
▶ Return to the Saved Glass Tables page (click Glass Tables in the upper menu bar)
▶ Click Edit for “Buttercup Games Business Process (IN PROGRESS)”
• Select Clone
• Title: Add your username to the front
• Permissions: Shared in App
• Click Clone Page
• Click your new Glass Table in the list to view it
Edit & Have Fun!
▶ Click Edit in the upper right corner of your Glass Table
▶ Use the “Services” panel on the left to select individual KPIs or aggregate service health scores
▶ Choose 2 KPIs from Online Store that would be useful in the “Order Process” section
▶ Drag the selected widgets onto the canvas, positioning them in the gray oval
▶ What’s the difference between the two tools at the top left?
More Fun with the Glass Table Editor…
Use the Configurations panel on the right to edit a selected widget
▶ You can change the visualization type, drilldown behavior, and other settings
▶ Hit Save frequently
▶ Revert All Changes can occasionally be helpful
Finishing Up…
▶ Add a ServiceHealthScore widget for Online Store under Buttercup
▶ Choose a Viz Type with a sparkline graph, then resize it to make it look pretty
▶ Modify the Custom Drilldown action to go to the saved glass table, Buttercup Games Online Store
▶ Bonus Points: Make the label bigger and more readable
▶ Click Save
▶ View when done
Let’s Play!
A Troubleshooting Exercise
▶ Let’s use Splunk ITSI to troubleshoot an outage
▶ Start at your Glass Table, “<UserName> Buttercup Business Process”
▶ Customer Care reports that unhappy customers are complaining of failures and long delays when trying to purchase
▶ The calls began coming in around the top of the last hour
▶ In the upper right corner of the Glass Table, change the time picker from Now to XX:00:00.0, where XX is the previous hour. For example, if it is currently 14:05, set the time picker to 13:00:00.0, then click Apply
▶ This is how we can “time travel” back to see conditions during a particular outage. Oh yeah!
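The time-picker arithmetic is simple, but the day boundary is easy to get wrong. This Python sketch, a hypothetical helper that is not part of Splunk, computes the previous top of the hour with `datetime.replace` and `timedelta`.

```python
from datetime import datetime, timedelta


def previous_top_of_hour(now):
    """Return XX:00:00 for the hour before `now` (e.g. 14:05 -> 13:00:00)."""
    top_of_this_hour = now.replace(minute=0, second=0, microsecond=0)
    return top_of_this_hour - timedelta(hours=1)


print(previous_top_of_hour(datetime(2017, 9, 26, 14, 5)))
```

Subtracting a `timedelta` rather than decrementing the hour field means that 00:20 correctly rolls back to 23:00 on the previous day.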
A Troubleshooting Exercise, cont’d
▶ The Online Store seems to be degraded, just as Customer Care reported. Click the widget under Buttercup to drill down further
A Troubleshooting Exercise, cont’d
▶ The Online Store Glass Table shows a much more detailed view, including the impacted customer-facing KPIs at the far left (Revenue, etc.)
▶ Based on this view of all the relevant services, where do you think the root cause lies?
▶ Which service should we troubleshoot first?
▶ Click the Health widget for that service to drill down to a Deep Dive
Deep Dive
▶ A Deep Dive shows multiple KPIs and health scores in parallel “swim lanes”
▶ The health score for this service is the top swim lane
• Can you see when it begins to degrade from 100?
▶ Mousing over this point in time, can you spot the KPI with the leading fault indication, i.e., what failed first?
▶ To improve readability, make sure the Primary Time Range (upper right corner) is set to Relative > Earliest: 2 Hours Ago
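Spotting the leading fault amounts to finding which swim lane left its normal state first. The Python sketch below does this over hypothetical per-minute KPI statuses. The data layout and the `leading_kpi` helper are assumptions, not an ITSI API.

```python
def leading_kpi(series, normal="green"):
    """Return (kpi_name, minute) of the earliest departure from `normal`, or None.

    `series` maps each KPI name to a time-ordered list of (minute, status) points.
    """
    first = None
    for name, points in series.items():
        for minute, status in points:
            if status != normal:
                if first is None or minute < first[1]:
                    first = (name, minute)
                break  # only this KPI's first departure from normal matters
    return first


# Hypothetical swim-lane data for the Online Store investigation.
SWIM_LANES = {
    "Web Error Rate": [(0, "green"), (8, "red")],
    "DB Response Time": [(0, "green"), (5, "orange"), (6, "red")],
    "Revenue": [(0, "green"), (10, "red")],
}
print(leading_kpi(SWIM_LANES))
```

The earliest departure is a lead, not proof of root cause. Customer-facing KPIs such as Revenue typically degrade later as a consequence.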
Multi-KPI Alerts and Notable Events
▶ Click Notable Events Review
▶ Multiple KPIs and health scores can be combined in sophisticated ways to create multi-KPI alerts
▶ When a multi-KPI alert fires, one of the outcomes is the creation of a notable event
▶ Notable events allow NOC personnel and others to triage and coordinate event management efforts
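A multi-KPI alert combines several KPI states into one condition. In this Python sketch the condition is an assumption chosen for illustration: at least two KPIs are red in the same minute. The `NotableEvent` class and `multi_kpi_alert` function are hypothetical and do not correspond to an ITSI API.

```python
from dataclasses import dataclass


@dataclass
class NotableEvent:
    service: str
    minute: int
    kpis: list


def multi_kpi_alert(service, statuses_by_minute, min_kpis=2, severity="red"):
    """Emit a notable event for each minute where >= min_kpis KPIs share `severity`."""
    events = []
    for minute in sorted(statuses_by_minute):
        hits = sorted(
            name for name, status in statuses_by_minute[minute].items()
            if status == severity
        )
        if len(hits) >= min_kpis:
            events.append(NotableEvent(service, minute, hits))
    return events


STATUSES = {
    0: {"Error Rate": "green", "Response Time": "red", "Revenue": "green"},
    1: {"Error Rate": "red", "Response Time": "red", "Revenue": "orange"},
    2: {"Error Rate": "red", "Response Time": "red", "Revenue": "red"},
}
for event in multi_kpi_alert("Online Store", STATUSES):
    print(event)
```

Requiring agreement between KPIs is what reduces noise. At minute 0, only Response Time is red, so no event is produced.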
Service Analyzer
▶ Click Service Analyzer > Default Service Analyzer
Back where we started!
▶ This view shows a “no-frills” list of services (top) and hottest KPIs (bottom)
▶ It provides access to Service Details
▶ It is useful for NOCs and others who need a high-level situational view
Let’s Play!
Advanced Exercises
Summary
▶ High-value services can be decomposed and modeled in Splunk ITSI, using machine data from the relevant systems
▶ Services and KPIs can be created in minutes, with sophisticated thresholding techniques to distinguish “normal” from “not normal”
▶ Glass Tables allow service health and KPI metrics to be displayed in a way that makes sense to specific groups, such as executive leadership, business service owners, the NOC, DevOps, and others
▶ Deep Dives allow KPIs to be compared side by side across any time range, accelerating root cause analysis and significantly reducing MTTR
▶ Multi-KPI alerts and notable events reduce alert noise, producing actionable events and a means to manage them
▶ …and it’s fast + fun to build!
What Splunk ITSI Customers Are Doing
Splunk IT Service Intelligence
Machine Learning-Powered, Analytics-Driven IT Operations
▶ Prioritize incidents with context: deliver business & service context to prioritize incident investigation & action
▶ Redefine the role of IT: support decisions & communicate results with powerful service-level insights
▶ Simplify service operations: leverage machine learning to detect anomalies & highlight events that matter
▶ Unify siloed monitoring: combine events & metrics across silos with ease, flexibility & scale in days
Splunk’s Solution: A lens could be multiple processes…
This shows how ‘Glass Tables’ can visualize key performance indicators and health scores that combine data from diverse sources. This example is an abbreviated ‘Book to Bill’ business process, sometimes called ‘Order to Cash’.
▶ All the scores are time-based KPIs or nested subprocesses that search in real time for some relevant condition of interest.
▶ These are health scores: a high-level aggregation of the health of the underlying processes.
▶ All the scores are color-coded to convey whether they are “normal” or “abnormal”, based on your criteria OR on Splunk’s packaged machine learning, enabled with an ON/OFF switch.
Call Center Service
(Glass table: Service Health, Transactions, ACD Analysis (core Splunk), Call Wait History, Inbound Analysis, Social Media, Online Msg, Mail Support, VOIP Service, and Inbound Calls.)
Money Transfer Services
(Glass table: Service Health, Online Transactions, Internal Transfer Service, External Wire Service, Money Exchange Service, Corporate Reconciliation Service, and Fed Exchange Service, plus core Splunk searches for Transaction History, System Investigation, and Heat Map Analysis.)
CIO Scorecard
(Glass table: Enterprise Service Status, Major Incidents, Major Changes, and Continuous Operational Visibility, with per-service panels for Service Health, Volume, Revenue, On-time Delivery, Container Utilization, Throughput, Incidents, and Changes.)
Sign Up Now – We’re Here to Help
Harness the creativity and domain knowledge of your organization to unlock the value of data and solve an important business service problem through a joint service intelligence workshop with key stakeholders.
What is it?
▶ 1-day on-site workshop
▶ Tightly linked with value
▶ Collaborative approach
▶ Build your own Splunk ITSI Glass Table
Define methods for:
▶ Proactive service monitoring
▶ Reduced risk and failures
▶ Faster issue resolution
▶ Increased business performance
Our Workshop In Action
Splunk Quick Start for Service Intelligence
(Table comparing Platform Edition, Services Edition, and Value Assurance Edition across: Enterprise License, Splunk ITSI License*, Education, Professional Services, .conf Passes)
* Splunk ITSI 6-month license
Your Mission, Should You Choose to Accept It…
1. Find a problem worth solving in your enterprise
2. Bring your subject experts together
3. Conduct a Service Intelligence workshop
For More Info…
▶ Splunk ITSI Sandbox Guide (an app on your Splunk ITSI instance)
▶ ITSI Documentation: http://docs.splunk.com/Documentation/ITSI
Thank You
More Related Content

PDF
CICD Pipelines for Microservices Best Practices
PDF
InfluxDB + Telegraf Operator: Easy Kubernetes Monitoring
PDF
MySQL Database Architectures - MySQL InnoDB ClusterSet 2021-11
PPTX
__Cloud_CNA_MSA_Service+Data+InferenceMesh 소개-박문기@메가존클라우드-20230320.pptx
PPTX
NSX-T Architecture and Components.pptx
PDF
WebAuthn and Security Keys
PDF
Introduction to Red Hat OpenShift 4
PDF
Gitlab CI : Integration et Déploiement Continue
CICD Pipelines for Microservices Best Practices
InfluxDB + Telegraf Operator: Easy Kubernetes Monitoring
MySQL Database Architectures - MySQL InnoDB ClusterSet 2021-11
__Cloud_CNA_MSA_Service+Data+InferenceMesh 소개-박문기@메가존클라우드-20230320.pptx
NSX-T Architecture and Components.pptx
WebAuthn and Security Keys
Introduction to Red Hat OpenShift 4
Gitlab CI : Integration et Déploiement Continue

What's hot (20)

PDF
Funny stories and anti-patterns from DevOps landscape
PDF
Kubernetes 101 - an Introduction to Containers, Kubernetes, and OpenShift
PPTX
Oracle apex training
PPTX
Best practices for implementing CI/CD on Salesforce
PDF
Microservices & API Gateways
PDF
Metrics, Risk Management & DLP
PDF
Customer migration to azure sql database from on-premises SQL, for a SaaS app...
PPTX
The Elastic Stack as a SIEM
PDF
MySQL 8.0 InnoDB Cluster - Easiest Tutorial
PDF
Gitops: a new paradigm for software defined operations
PPTX
Meetup 23 - 03 - Application Delivery on K8S with GitOps
PDF
Securing Prometheus exporters using HashiCorp Vault
PDF
MySQL Community and Commercial Edition
PPTX
CI/CD trên Cloud OpenStack tại Viettel Networks | Hà Minh Công, Phạm Tường Chiến
PPTX
Introduction to openshift
PPTX
Splunk for Enterprise Security and User Behavior Analytics
PDF
Open Banking via API Connect & DataPower
PDF
Nagios Monitoring Tool Tutorial | Server Monitoring with Nagios | DevOps Trai...
PPTX
.Net platform .Net core fundamentals
Funny stories and anti-patterns from DevOps landscape
Kubernetes 101 - an Introduction to Containers, Kubernetes, and OpenShift
Oracle apex training
Best practices for implementing CI/CD on Salesforce
Microservices & API Gateways
Metrics, Risk Management & DLP
Customer migration to azure sql database from on-premises SQL, for a SaaS app...
The Elastic Stack as a SIEM
MySQL 8.0 InnoDB Cluster - Easiest Tutorial
Gitops: a new paradigm for software defined operations
Meetup 23 - 03 - Application Delivery on K8S with GitOps
Securing Prometheus exporters using HashiCorp Vault
MySQL Community and Commercial Edition
CI/CD trên Cloud OpenStack tại Viettel Networks | Hà Minh Công, Phạm Tường Chiến
Introduction to openshift
Splunk for Enterprise Security and User Behavior Analytics
Open Banking via API Connect & DataPower
Nagios Monitoring Tool Tutorial | Server Monitoring with Nagios | DevOps Trai...
.Net platform .Net core fundamentals
Ad

Similar to The Hitchhiker's Guide to Service Intelligence (20)

PPTX
Splunk Forum Frankfurt - 15th Nov 2017 - AI Ops
PPTX
SplunkLive! London 2017 - Splunk Enterprise for IT Troubleshooting
PPTX
SplunkLive! London 2017 - How to Earn a Seat and the Business Table with Splunk
PPTX
SplunkLive! London 2017 - Getting Started with Splunk IT Service Intelligence
PPTX
SplunkLive! Zurich 2017 - How to Design, Build and Map IT and Business Servic...
PPTX
Splunk Discovery: Milan 2018 - Delivering New Visibility and Analytics for IT...
PPTX
SplunkLive! London 2017 - Happy Apps, Happy Users
PDF
Splunk for AIOps: Reduce IT outages through prediction with machine learning
PPTX
Delivering New Visibility and Analytics for IT Operations
PPTX
Rage WITH the machine, not against it: Machine learning for Event Management
PDF
Reactive to Proactive: Intelligent Troubleshooting and Monitoring with Splunk
PPTX
Building Service Intelligence with Splunk IT Service Intelligence (ITSI)
PPTX
SplunkLive! Zurich 2017 - Advanced Analytics / Machine Learning
PPTX
Machine Learning für Event Management
PPTX
Hitchhikers Guide to Service Intelligence
PDF
SplunkLive! Amsterdam 2015 - IT Ops breakout
PPTX
Splunk Forum Frankfurt - 15th Nov 2017 - Machine Learning For Event Management
PDF
Splunk workshop-Service Intelligence
PDF
Hitchhikers Guide to Service Intelligence
PPTX
SplunkLive! London 2017 - DevOps Powered by Splunk
Splunk Forum Frankfurt - 15th Nov 2017 - AI Ops
SplunkLive! London 2017 - Splunk Enterprise for IT Troubleshooting
SplunkLive! London 2017 - How to Earn a Seat and the Business Table with Splunk
SplunkLive! London 2017 - Getting Started with Splunk IT Service Intelligence
SplunkLive! Zurich 2017 - How to Design, Build and Map IT and Business Servic...
Splunk Discovery: Milan 2018 - Delivering New Visibility and Analytics for IT...
SplunkLive! London 2017 - Happy Apps, Happy Users
Splunk for AIOps: Reduce IT outages through prediction with machine learning
Delivering New Visibility and Analytics for IT Operations
Rage WITH the machine, not against it: Machine learning for Event Management
Reactive to Proactive: Intelligent Troubleshooting and Monitoring with Splunk
Building Service Intelligence with Splunk IT Service Intelligence (ITSI)
SplunkLive! Zurich 2017 - Advanced Analytics / Machine Learning
Machine Learning für Event Management
Hitchhikers Guide to Service Intelligence
SplunkLive! Amsterdam 2015 - IT Ops breakout
Splunk Forum Frankfurt - 15th Nov 2017 - Machine Learning For Event Management
Splunk workshop-Service Intelligence
Hitchhikers Guide to Service Intelligence
SplunkLive! London 2017 - DevOps Powered by Splunk
Ad

More from Splunk (20)

PDF
Splunk Leadership Forum Wien - 20.05.2025
PDF
Splunk Security Update | Public Sector Summit Germany 2025
PDF
Building Resilience with Energy Management for the Public Sector
PDF
IT-Lagebild: Observability for Resilience (SVA)
PDF
Nach dem SOC-Aufbau ist vor der Automatisierung (OFD Baden-Württemberg)
PDF
Monitoring einer Sicheren Inter-Netzwerk Architektur (SINA)
PDF
Praktische Erfahrungen mit dem Attack Analyser (gematik)
PDF
Cisco XDR & Splunk SIEM - stronger together (DATAGROUP Cyber Security)
PDF
Security - Mit Sicherheit zum Erfolg (Telekom)
PDF
One Cisco - Splunk Public Sector Summit Germany April 2025
PDF
.conf Go 2023 - Data analysis as a routine
PDF
.conf Go 2023 - How KPN drives Customer Satisfaction on IPTV
PDF
.conf Go 2023 - Navegando la normativa SOX (Telefónica)
PDF
.conf Go 2023 - Raiffeisen Bank International
PDF
.conf Go 2023 - På liv og død Om sikkerhetsarbeid i Norsk helsenett
PDF
.conf Go 2023 - Many roads lead to Rome - this was our journey (Julius Bär)
PDF
.conf Go 2023 - Das passende Rezept für die digitale (Security) Revolution zu...
PDF
.conf go 2023 - Cyber Resilienz – Herausforderungen und Ansatz für Energiever...
PDF
.conf go 2023 - De NOC a CSIRT (Cellnex)
PDF
conf go 2023 - El camino hacia la ciberseguridad (ABANCA)
Splunk Leadership Forum Wien - 20.05.2025
Splunk Security Update | Public Sector Summit Germany 2025
Building Resilience with Energy Management for the Public Sector
IT-Lagebild: Observability for Resilience (SVA)
Nach dem SOC-Aufbau ist vor der Automatisierung (OFD Baden-Württemberg)
Monitoring einer Sicheren Inter-Netzwerk Architektur (SINA)
Praktische Erfahrungen mit dem Attack Analyser (gematik)
Cisco XDR & Splunk SIEM - stronger together (DATAGROUP Cyber Security)
Security - Mit Sicherheit zum Erfolg (Telekom)
One Cisco - Splunk Public Sector Summit Germany April 2025
.conf Go 2023 - Data analysis as a routine
.conf Go 2023 - How KPN drives Customer Satisfaction on IPTV
.conf Go 2023 - Navegando la normativa SOX (Telefónica)
.conf Go 2023 - Raiffeisen Bank International
.conf Go 2023 - På liv og død Om sikkerhetsarbeid i Norsk helsenett
.conf Go 2023 - Many roads lead to Rome - this was our journey (Julius Bär)
.conf Go 2023 - Das passende Rezept für die digitale (Security) Revolution zu...
.conf go 2023 - Cyber Resilienz – Herausforderungen und Ansatz für Energiever...
.conf go 2023 - De NOC a CSIRT (Cellnex)
conf go 2023 - El camino hacia la ciberseguridad (ABANCA)

Recently uploaded (20)

DOCX
search engine optimization ppt fir known well about this
PDF
5-Ways-AI-is-Revolutionizing-Telecom-Quality-Engineering.pdf
PPT
Galois Field Theory of Risk: A Perspective, Protocol, and Mathematical Backgr...
PDF
The influence of sentiment analysis in enhancing early warning system model f...
PDF
NewMind AI Weekly Chronicles – August ’25 Week III
PDF
4 layer Arch & Reference Arch of IoT.pdf
PDF
Improvisation in detection of pomegranate leaf disease using transfer learni...
PPTX
Configure Apache Mutual Authentication
PDF
sustainability-14-14877-v2.pddhzftheheeeee
PPTX
TEXTILE technology diploma scope and career opportunities
PDF
Flame analysis and combustion estimation using large language and vision assi...
PPTX
GROUP4NURSINGINFORMATICSREPORT-2 PRESENTATION
PDF
Convolutional neural network based encoder-decoder for efficient real-time ob...
PDF
Early detection and classification of bone marrow changes in lumbar vertebrae...
PPTX
Microsoft Excel 365/2024 Beginner's training
PDF
NewMind AI Weekly Chronicles – August ’25 Week IV
PDF
Transform-Your-Streaming-Platform-with-AI-Driven-Quality-Engineering.pdf
PDF
Transform-Your-Factory-with-AI-Driven-Quality-Engineering.pdf
PPTX
Module 1 Introduction to Web Programming .pptx
PDF
OpenACC and Open Hackathons Monthly Highlights July 2025
search engine optimization ppt fir known well about this
5-Ways-AI-is-Revolutionizing-Telecom-Quality-Engineering.pdf
Galois Field Theory of Risk: A Perspective, Protocol, and Mathematical Backgr...
The influence of sentiment analysis in enhancing early warning system model f...
NewMind AI Weekly Chronicles – August ’25 Week III
4 layer Arch & Reference Arch of IoT.pdf
Improvisation in detection of pomegranate leaf disease using transfer learni...
Configure Apache Mutual Authentication
sustainability-14-14877-v2.pddhzftheheeeee
TEXTILE technology diploma scope and career opportunities
Flame analysis and combustion estimation using large language and vision assi...
GROUP4NURSINGINFORMATICSREPORT-2 PRESENTATION
Convolutional neural network based encoder-decoder for efficient real-time ob...
Early detection and classification of bone marrow changes in lumbar vertebrae...
Microsoft Excel 365/2024 Beginner's training
NewMind AI Weekly Chronicles – August ’25 Week IV
Transform-Your-Streaming-Platform-with-AI-Driven-Quality-Engineering.pdf
Transform-Your-Factory-with-AI-Driven-Quality-Engineering.pdf
Module 1 Introduction to Web Programming .pptx
OpenACC and Open Hackathons Monthly Highlights July 2025

The Hitchhiker's Guide to Service Intelligence

  • 1. The Hitchhiker’s Guide to Service Intelligence. Tom Harrop, IT Operations Specialist; David Millis, ITOA Architect. © 2017 SPLUNK INC.
  • 2. Agenda: ▶ Introductions and Set Up ▶ Splundamentals – IT Troubleshooting With Splunk ▶ What Is IT Service Intelligence? ▶ Service Intelligence Design Practices ▶ Let's Play! ▶ What's Next? ▶ Happy Hour
  • 3. Forward-Looking Statements: During the course of this presentation, we may make forward-looking statements regarding future events or the expected performance of the company. We caution you that such statements reflect our current expectations and estimates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-looking statements, please review our filings with the SEC. The forward-looking statements made in this presentation are being made as of the time and date of its live presentation. If reviewed after its live presentation, this presentation may not contain current or accurate information. We do not assume any obligation to update any forward-looking statements we may make. In addition, any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the features or functionality described or to include any such feature or functionality in a future release. Splunk, Splunk>, Listen to Your Data, The Engine for Machine Data, Splunk Cloud, Splunk Light and SPL are trademarks and registered trademarks of Splunk Inc. in the United States and other countries. All other brand names, product names, or trademarks belong to their respective owners. © 2017 Splunk Inc. All rights reserved.
  • 4. Key Takeaways: 1. Build on what you are already doing with Splunk 2. Service intelligence design and configuration practices 3. What’s possible with Splunk IT Service Intelligence
  • 5. Splundamentals – IT Troubleshooting With Splunk
  • 6. Challenging Traditional Methods. [Diagram: a traditional monitoring stack in which metrics and events from the infrastructure layer (storage, server, network) and the application layer (byte code instrumentation, synthetic APM, adaptive thresholding) feed an aggregation/correlation/visualization tier and a service model definition and correlation engine, which serve the service and business layers.] Challenges: ▶ Too many disparate components ▶ Difficult to define service model ▶ Labor intensive ▶ Most implementations fail ▶ A very important source is missing! (machine data)
  • 7. Challenging Traditional Methods: Splunk Is the Missing Link. [Diagram: metrics, events and machine data from the infrastructure layer (storage: utilization, capacity, performance; server: performance, usage, dependency; network: packet, payload, traffic, utilization, performance) and the application layer (byte code instrumentation: usage, experience, performance, quality; synthetic APM: availability, capacity, user experience; adaptive thresholding: apps, services, systems) feed a machine data fabric platform for service intelligence.] ▶ Data fidelity ▶ Single repository for ALL data ▶ Easier to manage services ▶ Reduced integrations ▶ Reduced point solutions ▶ Collaborative approach ▶ Quick time-to-value
  • 8. Splunk Approach to Machine Data. Traditional (RDBMS: schema at write, ETL, SQL; structured data): ▶ Define static schema ▶ ETL into schema ▶ Enrich at write ▶ New data = new columns ▶ New questions = new columns ▶ “Data at rest” (delayed info) ▶ Labor intensive & time consuming. Ideal for reporting. Splunk (schema at read, universal indexing, search; structured and unstructured data, handling volume, velocity and variety): ▶ “Schema on-the-fly” ▶ Data in native format ▶ Enrich at read ▶ New data = no changes needed ▶ New questions = no changes needed ▶ “Data in motion” (real time) ▶ Fast time-to-value. Ideal for investigation.
  • 9. Listen to Your Data: Let’s take a closer look at IT troubleshooting with Splunk
  • 10. Machine learning-powered analytics for real-time service insights, simplified operations and root-cause isolation
  • 11. What Is Service Intelligence? ▶ Enabling a business-aware IT ▶ Measuring and reporting on indicators that matter ▶ Unlocking operational efficiencies ▶ Collaborating across silos to improve service operations ▶ Data-based decision making ▶ Solving problems and anticipating pitfalls with sophisticated analytics and powerful insights
  • 12. Splunk ITSI Capabilities (Machine Learning, Search-Based KPIs, Platform for Operational Intelligence, Dynamic Service Model): ▶ Adaptive threshold automation to minimize false alerts ▶ Behavior anomaly alerts to proactively address issues ▶ Correlating data into knowledge, mitigating SME dependency ▶ Accelerators minimize SPL coding ▶ Trend aggregation to enable rapid visualization ▶ Multi-KPI alerts for proactive irregularity identification ▶ Time series index ▶ Schema on read ▶ Data models ▶ Visualize the entire tech stack – bare metal through business layer ▶ View the entire ecosystem with customized views for execs ▶ Use 3 clicks to get the answer vs. 10
  • 13. The Possibilities for Business…
  • 16. The Possibilities for IT Operations… (Service Health)
  • 17. What Is a Service? In Splunk ITSI, a service is a logical group of technology components that a user decides should be monitored together. It can often be generalized as a “black box” to which we send requests and from which we expect responses. [Diagram: requests go into the service; responses come out.]
  • 18. What Is a Service? Services can be lower level (technical). [Diagram: DNS, Auth and Web services, each receiving requests and returning responses.]
  • 19. What Is a Service? Services can also be higher level (business)… [Diagram: technical services (DNS, Auth, Web) alongside business services (Customer Transactions, Support Desk), each receiving requests and returning responses.]
  • 20. What Is a Service? Services can encompass multiple tiers of the IT domain. They may also depend upon other services. [Diagram: Customer Transactions, Mobile API/Middleware, Partner Portal, Web Services, API Services, DNS, RDBMSs, Hypervisor and Hosts, Storage Tiers and Packet Network.]
  • 21. What’s a KPI? KPIs and health scores constitute the means by which services are monitored. Example KPIs: ▶ DNS: request volume, error rate, average response time, server CPU load, configuration changes ▶ Customer Transactions: transaction volume, error rate, average response time, max response time, count of change records ▶ Business Functions: business volume, error rate, revenue rate, conversion rate, count of incident tickets
  • 22. What’s a Key Performance Indicator? A key performance indicator (KPI) is a Splunk saved search, created within the Splunk ITSI UI, that monitors a specific field such as CPU, memory or number of errors. KPIs are contained within services.
  • 23. Service Health Scores: A health score is a value from 0 to 100 (0 = critical, 100 = normal) that indicates the health of a service. It is calculated once every minute, based on the importance and status (e.g., green, orange, red) of all of the service’s KPIs.
  • 24. Splunk IT Service Intelligence: Let’s take a closer look at service intelligence with Splunk
  • 25. Service Intelligence Design Practices
  • 26. Best Practices for Service Intelligence: 1. Start with a problem worth solving 2. Bring subject experts together 3. Design before configuring
  • 27. Best Practices for Service Intelligence: Data-Defined, Data-Driven Service Insights ▶ Define services, entities and KPIs ▶ Monitor and troubleshoot ▶ Analyze and detect
  • 28. The Business Problem for Buttercup Games: ▶ Supply Chain: limited visibility ▶ Online Store: poor customer satisfaction ▶ ERP: frequent bottlenecks ▶ Failed interactions ▶ War room escalations ▶ Business impact: $48K/week in lost revenue
  • 29. Service Decomposition: ▶ Business layer: mail transport, order processing, e-commerce, financials ▶ Service layer: business service ▶ Application layer: middleware, application server, database, custom apps ▶ Infrastructure layer: power/cooling/facilities, server, networking, storage
  • 30. Typical Data Sources. [Diagram: the Supply Chain process (Order Entry, Manufacturing, Shipping, Fulfillment) with Online Store, EDI, Web Tier and Middleware, laid out across the business, service, application and infrastructure layers.] Example metrics: total orders, total revenue, unit count, unit failures, service level, delivery time, online orders, online revenue, response time, service health, incidents/changes, customer satisfaction, HTTP hits, error rate, CPU load, memory used, disk used, IO latency, storage free
  • 31. Service Intelligence Design in Splunk ITSI: ▶ High-value business services: Buttercup Games Online Store and Supply Chain ▶ Supporting services: Web, Middleware, Database ▶ Relevant KPIs for each service, e.g., Database: errors, SQL hits, … ▶ A Splunk search for each KPI, e.g., index=DB (warn* OR error*) | stats count
  • 32. Service Intelligence Design – Buttercup Games. [Diagram: the Supply Chain model from slide 30, annotated with its data sources: application logs, corporate databases, service management, webserver logs, DB perf counters, wire data, perf counters, access logs, network logs.]
  • 33. Let’s Play! Setting Up Service Intelligence
  • 34. Service Visibility in Splunk ITSI: Click Glass Tables
  • 35. Service Visibility in Splunk ITSI: Click Buttercup Games Business Process (IN PROGRESS) (open in new tab)
  • 36. Service Visibility in Splunk ITSI: Click Buttercup Games Online Store (open in new tab)
  • 37. Goal 1: Supply Chain Visibility
  • 38. Goal 2: Online Store Process Flow
  • 39. New Requirements! ▶ Create a new KPI for the DB service: Network Utilization ▶ Modify the executive Glass Table to show off the services you slave over. “We only have about 15 min TO DO WHAT???!!???” Think about how long this would take you today.
  • 40. Configuration of DB Service: Click Configure > Services
  • 41. Let’s Talk Entities: ▶ Entities are the relevant things that support this service (usually hosts) ▶ Select the right entities with filters, ANDs and ORs ▶ The original entity list can come from a CMDB, a spreadsheet, a Splunk search or other sources. Click DB Service
  • 42. A KPI in 5 Minutes? Absolutely! ▶ Click New > Generic KPI. Call it Network Utilization, with your username up front ▶ Select Data Model: Host Operating System > Network > # bytes ▶ Click Next
  • 43. KPIs Continued…: ▶ Select Yes for the Split by and Filter options ▶ Select host for the Entity Lookup and Alias options ▶ Click Next. Splunk builds searches for you – oh yeah, that’s happening!
  • 44. Almost There…: ▶ KPI search schedule: every minute ▶ Entity calculation: average ▶ Service/aggregate calculation: average ▶ Calculation window: last minute ▶ Unit: Bps ▶ Click Next
  • 45. Final Steps…: ▶ Set your thresholds: aggregate (All) and per entity ▶ Click Add Threshold TWICE ▶ Color the bands like Neapolitan ice cream: yellow, green, yellow ▶ Drag the sliders so that the current data graph lies entirely inside the green (normal) band ▶ Click Finish ▶ Other options are also available, including adaptive thresholds and anomaly detection
  • 46. Adaptive Thresholds: ▶ What if your KPI data looks like this?
  • 47. Adaptive Thresholds: ▶ Static thresholds will not work…
  • 48. Adaptive Thresholds: ▶ Adaptive thresholding works beautifully with cyclical (and other dynamic) data
  • 49. Anomaly Detection: ▶ Machine learning ▶ “Trending” detects deviations of the aggregate KPI from its historical trends ▶ “Entity cohesion” detects entities that deviate from “pack” behavior
  • 50. Let’s Fix That Glass Table
  • 51. Clone the Glass Table: ▶ Return to the Saved Glass Tables page (click Glass Tables in the upper menu bar) ▶ Click Edit for “Buttercup Games Business Process (IN PROGRESS)” • Select Clone • Title: add your username to the front • Permissions: Shared in App • Click Clone Page • Click your new Glass Table in the list to view it
  • 52. Edit & Have Fun! ▶ Click Edit in the upper right corner of your Glass Table ▶ Use the “Services” panel on the left to select individual KPIs or aggregate service health scores ▶ Choose 2 KPIs from Online Store that would be useful in the “Order Process” section ▶ Drag the selected widgets onto the canvas, positioning them in the gray oval ▶ What’s the difference between the two tools at the top left?
  • 53. More Fun with the Glass Table Editor…: Use the Configurations panel on the right to edit a selected widget ▶ You can change the visualization type, drilldown behavior and other settings ▶ Save frequently ▶ Revert All Changes can occasionally be helpful
  • 54. Finishing Up…: ▶ Add a ServiceHealthScore widget for Online Store under Buttercup ▶ Choose a viz type with a sparkline graph, then resize it to look pretty ▶ Modify the Custom Drilldown action to go to the saved glass table, Buttercup Games Online Store ▶ Bonus points: make the label bigger and more readable ▶ Click Save ▶ View when done
  • 55. Let’s Play! A Troubleshooting Exercise
  • 56. A Troubleshooting Exercise: ▶ Let’s use Splunk ITSI to troubleshoot an outage ▶ Start at your Glass Table, “<UserName> Buttercup Business Process” ▶ Customer Care reports that unhappy customers are complaining of failures and long delays when trying to purchase ▶ The calls began coming in at around the top of the last hour ▶ In the upper right corner of the Glass Table, change the time picker from Now to XX:00:00.0, where XX is the previous hour. For example, if it is currently 14:05, set the time picker to 13:00:00.0, then click Apply ▶ This is how we can “time travel” back to see conditions during a particular outage – oh yeah!
  • 57. A Troubleshooting Exercise, cont’d: ▶ The Online Store seems to be degraded, just as Customer Care reported. Click the widget under Buttercup to drill down further
  • 58. A Troubleshooting Exercise, cont’d: ▶ The Online Store Glass Table shows a much more detailed view, including the impacted customer-facing KPIs at the far left (revenue, etc.) ▶ Based on this view of all the relevant services, where do you think the root cause lies? ▶ Which service should we troubleshoot first? ▶ Click the Health widget for that service to drill down to a Deep Dive
  • 59. Deep Dive: ▶ Deep Dive shows multiple KPIs and health scores in parallel “swim lanes” ▶ The health score for this service is the top swim lane. Can you see when it begins to degrade from 100? ▶ Mousing over this point in time, can you spot the KPI with the leading fault indication, i.e., what failed first? ▶ To improve readability, make sure the Primary Time Range (upper right corner) is set to Relative > Earliest: 2 Hours Ago
  • 60. Multi-KPI Alerts and Notable Events: ▶ Click Notable Events Review ▶ Multiple KPIs and health scores can be combined in sophisticated ways to create multi-KPI alerts ▶ When a multi-KPI alert fires, one of the outcomes is the creation of a notable event ▶ Notable events allow NOC personnel and others to triage and coordinate event management efforts
  • 61. Service Analyzer: ▶ Click Service Analyzer > Default Service Analyzer. Back where we started! ▶ This view shows a “no-frills” list of services (top) and hottest KPIs (bottom) ▶ It provides access to Service Details ▶ It is useful for NOCs and others who need a high-level situational view
  • 62. Let’s Play! Advanced Exercises
  • 63. Summary: ▶ High-value services can be decomposed and modeled in Splunk ITSI, using machine data from the relevant systems ▶ Services and KPIs can be created in minutes, with sophisticated thresholding techniques to distinguish “normal” from “not normal” ▶ Glass Tables allow service health and KPI metrics to be displayed in a way that makes sense to specific groups, such as executive leadership, business service owners, the NOC, DevOps and others ▶ Deep Dives allow KPIs to be compared side by side across any time range, accelerating root cause analysis and significantly reducing MTTR ▶ Multi-KPI alerts and notable events reduce alert noise, producing actionable events and a means to manage them ▶ …and it’s fast and fun to build!
  • 64. What Splunk ITSI Customers Are Doing
  • 65. Splunk IT Service Intelligence: Machine Learning-Powered, Analytics-Driven IT Operations ▶ Prioritize incidents with context: deliver business and service context to prioritize incident investigation and action ▶ Redefine the role of IT: support decisions and communicate results with powerful service-level insights ▶ Simplify service operations: leverage machine learning to detect anomalies and highlight events that matter ▶ Unify siloed monitoring: combine events and metrics across silos with ease, flexibility and scale, in days
  • 66. Splunk’s Solution: A Lens Could Be Multiple Processes… This example shows how Glass Tables can visualize key performance indicators and health scores that combine data from diverse sources. It is an abbreviated ‘Book to Bill’ (sometimes called ‘Order to Cash’) business process. ▶ All the scores are time-based KPIs or nested subprocesses that search in real time for some relevant condition of interest ▶ Health scores are a high-level aggregation of the health of the underlying processes ▶ All the scores are color-coded to convey whether they are “normal” or “abnormal”, based either on your criteria or on Splunk’s packaged machine learning, enabled with an on/off switch
  • 67. [Glass Table example: Call Center Service, showing service health, transactions, ACD analysis (core Splunk), call wait history, inbound analysis, social media, online messages, mail support, VOIP service and inbound calls.]
  • 68. [Glass Table example: Money Transfer Services, showing service health for the online transactions, internal transfer, external wire, money exchange, corporate reconciliation and Fed exchange services, alongside core Splunk searches for transaction history, system investigation and heat map analysis.]
  • 69. CIO Scorecard: Enterprise Service Status. [Glass Table example: continuous operational visibility across enterprise services, with major incidents, major changes and, per service, service health, volume, revenue, on-time delivery, container utilization, throughput, incidents and changes.]
  • 70. Sign Up Now – We’re Here to Help. Harness the creativity and domain knowledge of your organization to unlock the value of data and solve an important business service problem through a joint service intelligence workshop with key stakeholders. What is it? ▶ 1-day on-site workshop ▶ Tightly linked with value ▶ Collaborative approach ▶ Build your own Splunk ITSI Glass Table. Define methods for: ▶ Proactive service monitoring ▶ Reduced risk and failures ▶ Faster issue resolution ▶ Increased business performance
  • 71. Our Workshop in Action
  • 72. Splunk Quick Start for Service Intelligence. Editions: Value Assurance Edition, Services Edition, Platform Edition. Components: Enterprise License, Splunk ITSI License*, Education, Professional Services, .conf Passes. *Splunk ITSI 6-month license
  • 73. Your Mission, Should You Choose to Accept It… 1. Find a problem worth solving in your enterprise 2. Bring your subject experts together 3. Conduct a Service Intelligence workshop
  • 74. For More Info… ▶ Splunk ITSI Sandbox Guide (an app on your Splunk ITSI instance) ▶ ITSI Documentation: http://docs.splunk.com/Documentation/ITSI
  • 75. Thank You
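The KPI settings on slides 44 and 45, and the health score description on slide 23, can be illustrated with a small Python sketch. It is not ITSI's implementation: the reading format, the band boundaries, the severity-to-score mapping and the importance-weighted average used for the health score are all assumptions chosen for illustration, and names such as `entity_averages` and `health_score` are hypothetical.

```python
# Sketch only: approximates slides 44-45 (KPI calculation and threshold bands)
# and slide 23 (health score). Not ITSI's actual algorithm; all names,
# bands and score mappings below are hypothetical.
from statistics import mean


def entity_averages(readings):
    """Entity calculation = average: one value per entity for the window."""
    per_entity = {}
    for entity, value in readings:
        per_entity.setdefault(entity, []).append(value)
    return {entity: mean(values) for entity, values in per_entity.items()}


def aggregate(per_entity):
    """Service/aggregate calculation = average of the per-entity values."""
    return mean(per_entity.values())


def severity(value, bands):
    """Return the label of the first band whose upper bound exceeds value.

    bands: list of (upper_bound, label) sorted by upper_bound, ending with
    float("inf") so that every value falls into some band.
    """
    for upper, label in bands:
        if value < upper:
            return label
    raise ValueError("bands must end with an infinite upper bound")


# Hypothetical score per severity label (assumption, not ITSI's mapping).
SEVERITY_SCORE = {"green": 100, "yellow": 50, "red": 0}


def health_score(kpis):
    """Importance-weighted average of KPI scores (simplified assumption).

    kpis: list of (importance, severity_label) pairs.
    """
    total = sum(importance for importance, _ in kpis)
    if total == 0:
        raise ValueError("at least one KPI needs a non-zero importance")
    return sum(i * SEVERITY_SCORE[label] for i, label in kpis) / total


# Hypothetical yellow/green/yellow bands for network utilization, in Bps.
BANDS = [(1_000.0, "yellow"), (50_000.0, "green"), (float("inf"), "yellow")]

# One minute of readings: (entity, bytes per second).
readings = [("db01", 12_000.0), ("db01", 14_000.0), ("db02", 9_000.0)]
per_entity = entity_averages(readings)   # {'db01': 13000.0, 'db02': 9000.0}
agg = aggregate(per_entity)              # 11000.0
status = severity(agg, BANDS)            # 'green'
# Combine with a second, hypothetical KPI of equal importance that is yellow.
print(health_score([(5, status), (5, "yellow")]))  # 75.0
```

Changing `BANDS` or `SEVERITY_SCORE` shifts the numbers; the sketch only shows the shape of the pipeline, in which a health score falls as more important KPIs leave the green band.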

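Slides 46 to 48 argue that a static threshold cannot follow cyclical data. Below is a minimal Python sketch of one adaptive approach, assumed here for illustration: learn a separate normal band for each hour of the day as the mean plus or minus k population standard deviations of past values for that hour. ITSI configures its own adaptive thresholding through the UI, and its algorithms and time policies may differ; `train_hourly_bands`, `is_abnormal` and the sample data are hypothetical.

```python
# Sketch of hour-of-day adaptive thresholds (mean +/- k * stdev per hour).
# Illustrative assumption only; not ITSI's adaptive thresholding code.
from collections import defaultdict
from statistics import mean, pstdev


def train_hourly_bands(history, k=2.0):
    """Learn a (low, high) normal band for each hour of the day.

    history: iterable of (hour_of_day, value) samples from previous days.
    """
    by_hour = defaultdict(list)
    for hour, value in history:
        by_hour[hour].append(value)
    bands = {}
    for hour, values in by_hour.items():
        mu, sigma = mean(values), pstdev(values)
        bands[hour] = (mu - k * sigma, mu + k * sigma)
    return bands


def is_abnormal(hour, value, bands):
    """True if value falls outside the band learned for that hour."""
    low, high = bands[hour]
    return not (low <= value <= high)


# Cyclical history: quiet at 03:00, busy at 13:00 (hypothetical values).
history = [(3, 90.0), (3, 100.0), (3, 110.0),
           (13, 900.0), (13, 1000.0), (13, 1100.0)]
bands = train_hourly_bands(history)

print(is_abnormal(3, 1000.0, bands))   # True: normal at 13:00, not at 03:00
print(is_abnormal(13, 1000.0, bands))  # False
print(is_abnormal(13, 500.0, bands))   # True: a midday drop is flagged
```

A single static band wide enough to accept 1000 at 13:00 would also accept 1000 at 03:00, which is the failure slide 47 points at; per-period bands avoid that at the cost of needing enough history for each period.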
Editor's Notes

  • #3: Here is a 3.5-hour schedule:
    00:00 (slides 1-6) Introductions and Setup
    00:07 (slides 7-11) Splundamentals – Core Splunk in IT Ops
    00:15 IT Troubleshooting Demo
    00:25 (slides 12-24) Splunk IT Service Intelligence (ITSI)
    00:45 ITSI Tour
    00:55 BREAK
    01:00 (slides 25-39) Service Intelligence Design Practices
    01:20 (slides 40-62 / demo) Let's Play – Setting up Service Intelligence
    01:55 BREAK
    02:00 (slides 62-67 / demo) Let’s Play – Troubleshooting exercise
    02:15 (demo) Advanced Exercise #1 – Create a SmartPhone Service
    02:15 (demo) Explore the 'sourcetype=mint:network' data; what KPIs are possible?
    02:20 (demo) Create new service ("SmartPhone"), create 3 KPIs
    02:35 BREAK
    02:40 Advanced exercises continued
    02:40 (demo) Modify 'Online Store' GT to add the new service
    02:45 (demo) Advanced Exercise #2 – Show Adaptive Thresholding
    02:47 (demo) Advanced Exercise #3 – Show Anomaly Detection
    02:50 (slide 70) Summary
    02:55 BREAK
    03:00 (slides 71-78) What our ITSI customers are doing
    03:20 (slides 79-80 + sign-up discussion) What is GTE? Ideas for ITSI use cases?
    03:25 (slides 81-82) Wrap up
    03:30 DONE!
  • #5: In order to solve these evolving needs, a certain class of customers began to leverage the platform and take advantage of the data they already had indexed within Splunk. They built some pretty sophisticated use cases to improve operational efficiencies, and the way they did this was by adding a service perspective to the data they already had in Splunk. What became apparent as we spoke to those customers was that we have the ability to transform this age-old problem of troubleshooting and monitoring with a new approach driven by machine data. Given that our customers had custom-built service insights using the data they already had in the platform, it was a natural evolution for us to build an integrated solution based on our customers’ successes and make Splunk service-aware. This helps our customers maximize the value they get from Splunk with a machine data-driven approach to monitoring and analytics.
  • #6: This is the “IT Troubleshooting” demo, about 15-20 min. Cover Splunk 101 basics.
  • #7: Getting data into Splunk is designed to be as flexible and easy as possible, because the indexing engine generally requires no configuration for most machine data generated by the devices, control systems, sensors, SCADA, networks, applications and end users connected by industrial networks. There are many options: Splunk can directly monitor hundreds or thousands of local files, index them and detect changes. Additionally, many customers use our out-of-the-box scripts and tools to generate data; common examples include performance polling scripts on *nix hosts, APIs and more. You can onboard data directly from any application or device, opening up new types of machine data to the benefits of Splunk analysis. The HTTP Event Collector (HEC) makes it simple and efficient to collect this data, scaling to millions of events per second, using a developer-friendly, standard HTTP/JSON API and logging libraries. Its high-volume Splunk endpoint allows events to be sent and collected directly at extreme velocity. The data volumes supported by Splunk are ideal for IoT and industrial data. There are many free add-ons and apps for Splunk software that simplify the connection and collection of data from both industrial systems and the Internet of Things. These include: Protocol Data Inputs: Receive data via a number of different data protocols such as TCP, TCP(s), HTTP(s) PUT/POST/file upload, UDP, WebSockets and SockJS. REST API Modular Input: Poll local and remote REST APIs and index the responses. Amazon Kinesis Modular Input: Index data from Amazon Kinesis, a fully managed service for real-time streaming data. Apache Kafka Modular Input: Index messages from Apache Kafka messaging brokers, including clusters managed by ZooKeeper. DB Connect 2: Integrate structured data sources with your Splunk real-time machine data collection.
MQTT Modular Input: Index messages from MQTT, a machine-to-machine connectivity protocol, by subscribing Splunk software to MQTT broker topics. AMQP Modular Input: Index data from message queues provided by AMQP brokers. JMS Modular Input: Poll and index messages from queues and topics, including MQTT messages, provided by message providers such as TibcoEMS, WebLogic JMS and ActiveMQ. CoAP Modular Input: Index messages from a CoAP (Constrained Application Protocol) server. SNMP Modular Input: Collect data by polling SNMP attributes and catching SNMP traps from datacenter infrastructure devices providing cooling and power distribution. Splunk App for Stream: Capture, filter and index real-time streaming wire data and network events. Splunk isn’t the only technology that can benefit from collecting machine data, so let Splunk help send the data to the systems that need it. For systems that want a direct tap into the raw data, Splunk can forward all or a subset of data in real time via TCP as raw text or RFC-compliant syslog. This can be done on the forwarder or centrally via the indexer without incrementing your daily indexing volume. Separately, Splunk can schedule sophisticated correlation searches and configure them to open tickets or insert events into SIEMs or operations event consoles. This allows you to summarize, mash up and transform the data with the full power of the search language and import it into these other systems in a controlled fashion, even if they don’t natively support all the data types Splunk does.
  • #9: The rise of big data has forced IT organizations to shift from a focus on structured, relational data to accommodating unstructured data, driven by the volume, velocity and variety of today’s applications and systems. As the data has changed from structured to unstructured, the technology approach needs to change as well. When you don’t know what data types you’ll need to analyze tomorrow or what questions you’ll need to ask in a week, flexibility becomes a key component of your technology decisions. The ability to index any data type, search across silos and avoid being locked into a rigid schema opens a new world of analytics and business insights to your organization. Schema at read – enables you to ask any question of the data. Search – enables rapid, iterative exploration of the data along with advanced analytics. Universal indexing – enables you to ingest any type of machine data. Horizontal scaling over commodity hardware enables big data analytics.
  • #11: That brings us to Splunk IT Service Intelligence – a packaged solution that enables real-time visibility into services, driven by machine data. Splunk ITSI speeds and simplifies service monitoring and analytics and enables IT to make better-informed business decisions. This solution gives you a deep understanding of your services. With Splunk ITSI, you have real-time views into the health of your services and can use advanced analytics to find patterns and detect anomalies and trends, so you can proactively monitor and address issues. As a result, you have improved service visibility, reduced resolution times and a transformative approach to monitoring and analytics driven by machine data.
  • #26: FOR THE PRESENTER: This entire Tour section should last no more than 10 min. Describe how GTs can show KPIs and health scores to any audience, group or team. Show the GTs: Buttercup Games Business Process (executives, business service owners); On Line Transaction Service (NOC, Tier 2), mentioning that you “can use Visio diagrams…”; Buttercup Games Online Store (service flow, sub-services). Show the saved Deep Dive “DB Deep Dive” and BRIEFLY describe Deep Dive functionality (you will be able to go into more depth later). Show Notable Event Review and BRIEFLY describe it (you will be able to go into more depth later). Show Service Analyzer and briefly describe it. Ask if the students have questions.
  • #28: So, let’s look at a simple visual to discuss how it works. In four simple steps, customers can achieve data-driven service insights: they get the data in (all the data…); they quickly define services, entities and KPIs; they monitor and troubleshoot; and they analyze and detect. Through these steps, customers are able to realize the value of data-defined, data-driven service insights.
  • #32: In the “real world”, it will probably be necessary to iterate up & down these steps a few times. For example, what if a KPI requires data which is not being collected by Splunk?
  • #35: FOR THE PRESENTER: The next three slides set the students up for the decomposition discussion later. GOAL: get the students to open two specific GTs in separate browser tabs.
  • #36: These actions set the students up for the decomposition discussion later.
  • #37: These actions set the students up for the decomposition discussion later.
  • #38: TO STUDENTS: You have this glass table on your own system. This Glass Table shows the high-level business process for Buttercup Games. Does anyone notice anything missing? (no info in Order Entry) We need better visibility into our Online Store, which is part of the Order Entry process.
  • #39: TO STUDENTS: You have this glass table on your own system. This Glass Table shows a more detailed process flow for the Online Store service. Notice the sub-services which make up our Online Store service, and how the process flows.
  • #40: Based on a recent DB outage which was caused by a saturated network interface, we’ve decided that network utilization would be a handy KPI for our Database Service. We’re also going to tweak the high-level Business Process Glass Table to provide more visibility into the Online Store service. And we’re going to do it in 15 minutes!
  • #41: FOR THE PRESENTER: Remind the students that they can refer to their own locally downloaded slides for a “click-by-click” reference to the process of adding a KPI. Then switch to your own browser and demonstrate these steps “live”. Have fun with the concept that a roomful of people can build a new KPI in only a few minutes, and that “the clock is ticking”.
  • #42: FOR THE PRESENTER: SHORT discussion of entities
  • #43: FOR THE PRESENTER: Briefly cover “data model” vs “ad hoc search”. Don’t spend a lot of time here.
  • #44: FOR THE PRESENTER: Briefly cover the concepts on this page, and point out the “Generated Search” window at the bottom: how cool it is that Splunk builds the search for you. Does anyone in the audience have users who could benefit from this? QUICK TANGENT: In the typical working environment, which often has a chasm between the “Business Types” and the “Tech Types”, how long would it take to map services to actual infrastructure? "Many quarters, and possibly a year, on the conservative side, right?" To quantify that, by show of hands, has anyone here been involved in an IT Service Management / Business Management team trying to map every server to a service or business function? Did you sustain any long-term injuries? And even IF you are successful in this effort, as soon as you finish you have to start over. ITSI is remarkable because it allows the Business teams and Technical teams to map out the important services realistically and effectively, in DAYS and WEEKS. We offer a Glass Table Workshop to facilitate such an exercise on YOUR services and YOUR data, in a single day.
  • #45: Keep moving…
  • #46: FOR THE PRESENTER: It might take a while (typically 1-2 minutes) of “waiting for data” before the students see an actual graph. Tell the students that it will take a couple of minutes for the data to appear, and not to click on anything in the meantime. Then skip ahead to the Adaptive Thresholds and Anomaly Detection slides and discussion while the students wait. Afterwards, it can be helpful to gauge progress by asking for a show of hands to see how many students are still waiting. If necessary, simply show the students how to set thresholds (in your own browser), then move forward.
  • #47: FOR THE PRESENTER: Talk through (NOT HANDS ON).
  • #48: FOR THE PRESENTER: Talk through (NOT HANDS ON).
  • #49: FOR THE PRESENTER: Talk through (NOT HANDS ON).
  • #50: Talk through (NOT HANDS-ON WORK).
  • #51: We’ve already discussed the high-level business process for Buttercup Games. We need better visibility into our Online Store, which is part of the Order Entry process.
  • #52: FOR THE PRESENTER: As before, switch to your own browser and demonstrate these steps “live”. Have fun with the concept of saving a copy before editing, so that you don’t muck it up.
  • #53: FOR THE PRESENTER: Have fun with this GT editor section. The GT editor is a bit twitchy, so exploit the humor and have fun with the students. GOALS (for the next 3 slides): identify two “interesting/useful” KPIs from the Online Store service to position in the gray “Order Entry” oval, letting the students choose details and visualization types; put a ServiceHealthScore widget (from Online Store) under the pony to show the overall health of the service; modify the “custom drilldown” to land on the “Buttercup Games Online Store” GT; encourage the students to use text boxes and other techniques to make the widgets more readable and prettier to look at. Remind the students that “the boss’ boss” will be looking at this GT, and we want to make sure that they’ve got good visibility into “our” service (Online Store).
  • #55: FOR THE PRESENTER: When finished (after everyone has hit ‘Save’ and ‘View’ and is looking at their own beautiful GT), ask: how long did it take to create a new KPI and make major changes to a Glass Table? Pretty cool! Ask the students if this (ITSI) could be useful in their own environments. If you have more than 15 minutes remaining, speak through some actual (referenceable) customer ITSI use cases.
  • #57: FOR THE PRESENTER: This hands-on section can be very powerful for the students. It allows them to “put it all together”, driving ITSI with their own fingers. As before, switch to your own browser and demonstrate these troubleshooting steps “live”. The corresponding slides are intended as a reference for the students. If pressed for time (i.e., less than about 10 minutes remaining), talk through and show this process, but don’t have the students attempt to “click along” in real time.
  • #58: If pressed for time, talk through and show this process, but don’t have the students attempt to “click along” in real time.
  • #59: Note that this “drill down” has inherited the same time selection (i.e., an earlier outage); pretty cool! FOR THE PRESENTER: The major points here: during the heat of battle, when troubleshooting an outage, being able to visualize the entire service flow is extremely valuable. By being able to see the health status of all the underlying services, we can quickly choose where and how best to proceed. Potentially huge time savings: customers report major reductions in MTTR.
  • #60: FOR THE PRESENTER: This is a good “variable time” section. You can spend as little or as much time as you choose, depending on how much time you have remaining in the session. Remind the students that they will have more time to play with DD later (yes, they might be confused by this, since only a few minutes remain in the session)
  • #61: FOR THE PRESENTER: This is another good “variable time” section. You can spend as little or as much time as you choose, depending on how much time you have remaining in the session. Remind the students that they will have more time to play with this stuff later (yes, they might be confused by this, since only a few minutes remain in the session)
  • #63: If time permits, try some or all of these: Create a new service based on app data coming from the customers’ smartphones (sourcetype="mint:network"); there are no identified entities, though we will show “pseudo-entities”. KPI 1: Total hits (sourcetype="mint:network" | stats count). KPI 2: Errors by device type (sourcetype="mint:network" statusCode>399 | stats count by device). KPI 3: Latency by carrier (sourcetype="mint:network" | eval latency=(latency/1000) | stats avg(latency) by carrier). (Show pseudo-entities in Service Details.) Add the new service to the “Online Store” GT. Show students how to access and play with Adaptive Thresholds in Configure Services -> Web Service -> Corporate Web Requests. Show students how to access and play with Anomaly Detection in Configure Services -> Web Service -> CPU %. Show students more details about modules through Service Details. Create a new GT using a customer diagram as the background, and use existing services and KPIs as mockups, renaming the widgets to fit the customer’s environment.
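For presenters who want to explain what the three mint:network KPI searches in #63 actually compute, here is a minimal Python sketch that mirrors their aggregation logic on a handful of made-up events. It is not how ITSI executes KPI searches. The sample events and field values are hypothetical, latency is assumed to be recorded in milliseconds (implied by the divide-by-1000 in KPI 3), and KPI 3 groups by a `carrier` field, as the KPI's name suggests; the grouping field is a parameter so it can be swapped.

```python
from collections import defaultdict

# Hypothetical mint:network events; latency assumed to be in milliseconds.
events = [
    {"device": "iPhone", "carrier": "CarrierA", "statusCode": 200, "latency": 120},
    {"device": "iPhone", "carrier": "CarrierB", "statusCode": 503, "latency": 900},
    {"device": "Pixel", "carrier": "CarrierA", "statusCode": 404, "latency": 300},
    {"device": "Pixel", "carrier": "CarrierA", "statusCode": 200, "latency": 180},
]


def total_hits(evts):
    """KPI 1: mirrors `| stats count` (one count over all events)."""
    return len(evts)


def errors_by_device(evts):
    """KPI 2: mirrors `statusCode>399 | stats count by device`."""
    counts = defaultdict(int)
    for e in evts:
        if e["statusCode"] > 399:  # keep only 4xx/5xx responses
            counts[e["device"]] += 1
    return dict(counts)


def avg_latency_by(evts, field="carrier"):
    """KPI 3: mirrors `eval latency=(latency/1000) | stats avg(latency) by <field>`."""
    sums, counts = defaultdict(float), defaultdict(int)
    for e in evts:
        sums[e[field]] += e["latency"] / 1000  # milliseconds -> seconds
        counts[e[field]] += 1
    return {k: sums[k] / counts[k] for k in sums}


print(total_hits(events))        # 4
print(errors_by_device(events))  # {'iPhone': 1, 'Pixel': 1}
print(avg_latency_by(events))
```

Each function is one KPI's "search": ITSI would run the real SPL on a schedule and feed the resulting value into thresholds and the service health score.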