Copyright © 2014 Splunk Inc.
Machine Data 101:
Turning Data Into Insight
Eric Merkel
Sr. Sales Engineer
Agenda
• What is Machine Data? What is Splunk?
• Non-Traditional Data Sources
• Data Enrichment
• Level Up on Search and Reporting Commands
• Data Models and Pivot
• Advanced Visualizations and the Web Framework
What Does Machine Data Look Like?
(Figure: sample raw events from four sources: order processing, Twitter, the care IVR, and a middleware error log.)
Machine Data Contains Critical Insights
(Figure: the same events with the critical fields highlighted: Customer ID, Order ID, Product ID, Twitter ID, Company's Twitter ID, Customer's Tweet, and Time Waiting On Hold, tying the sources together.)
Splunk Approach to Machine Data
• Traditional: structured data in an RDBMS, schema at write, ETL, SQL search
• Splunk: unstructured data at volume, velocity and variety; schema at read; universal indexing; search
Splunk: The Platform for Machine Data
Any amount, any location, any source: online services, web, proxy, data loss prevention, storage, desktops, packaged applications, custom applications, databases, call detail records, smartphones and devices, firewalls, authentication, file servers, endpoints, badging records, email servers, VPN, plus external lookups such as threat intelligence, asset & CMDB, employee/HR info, data stores, and applications.
• Universal indexing
• Schema-on-the-fly
• No back-end RDBMS
• No need to filter data
On top of the indexed data: ad hoc search, monitoring and alerting, reporting and analysis, custom dashboards, and a developer platform.
The Splunk Portfolio: Platform for Operational Intelligence
• Data reaches the platform via forwarders, syslog/TCP, mobile, IoT devices, network wire data, Hadoop, mainframe data, and relational databases
• On top of the platform sit a rich ecosystem of apps & add-ons and Splunk premium solutions
Workshop Setup
Wi-Fi access: Splunk! (pwd: splunk2017)
1. Download free Splunk Enterprise: http://www.splunk.com/download
2. Download the tutorial data (tutorialdata.zip): http://splunk.box.com/mdw101
3. Download the lookup table (http_status.csv): http://splunk.box.com/mdw101
4. Add the tutorial data to Splunk
Non-Traditional Data Sources
• Network Inputs
• HTTP Event Collector
• Log Event Alert Action
• Splunk App for Stream
• Scripted Inputs
• Database Inputs
• Splunk ODBC Driver
• Modular Inputs
• zLinux Forwarder
• MINT
• Non-Splunk Datastores
Traditional Data Sources
• Captures events from log files in real time
• Runs scripts to gather system metrics, connect to APIs and databases
• Listens to syslog and gathers Windows events
• Universally indexes any data format, so it doesn't need adapters
Windows: registry, event logs, file system, sysinternals
Linux/Unix: configurations, syslog, file system, ps/iostat/top
Virtualization: hypervisor, guest OS, guest apps
Applications: web logs; Log4J, JMS, JMX; .NET events; code and scripts
Databases: configurations, audit/query logs, tables, schemas
Network: configurations, syslog, SNMP, netflow
Network Inputs
• Collect data over any UDP or TCP port (see the inputs.conf sketch below)
• Some devices can only send data over a network port
• Best practice: use syslog-ng or rsyslog
  – Offers persistence
  – Categorizes data by host
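A minimal inputs.conf sketch of both kinds of listener; the port choices and the network_feed sourcetype are illustrative assumptions, while the stanza format and the connection_host setting follow Splunk's documented inputs.conf conventions:

# inputs.conf
# Listen for syslog on UDP 514 and categorize each event's host via reverse DNS
[udp://514]
sourcetype = syslog
connection_host = dns

# Listen on a TCP port for a device that can only emit over the network
[tcp://5514]
sourcetype = network_feed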
HTTP Event Collector (HEC)
• Collect data over HTTP or HTTPS directly into Splunk (see the curl sketch below)
• Application-developer focus: a few lines of code in an app send the data
• HEC features include:
  – Token-based, not credential-based, authentication
  – Indexer acknowledgements, which guarantee data indexing
  – Raw and JSON-formatted event payloads
  – SSL, CORS (Cross-Origin Resource Sharing), and network restrictions
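A minimal sketch of sending one event to HEC with curl; the hostname and token are placeholders, while the /services/collector/event endpoint, the Splunk authorization header, and the default port 8088 follow the documented HEC API:

curl -k https://splunk.example.com:8088/services/collector/event \
  -H "Authorization: Splunk <your-hec-token>" \
  -d '{"event": "user login succeeded", "sourcetype": "myapp", "host": "web01"}'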
Log Event Alert Action
• Use Splunk alerting to index a custom log event
• Yields a Splunk-searchable index of custom alert events
• Configurable features include:
  – Host
  – Source
  – Sourcetype
  – Index
  – Event text: construct the exact syntax of the log event, including any text, tokens, or other information
The Splunk App for Stream
• Wire data enhances the platform for operational intelligence
• Efficient, cloud-ready wire data collection
• Simple deployment supports fast time to value
Stream = Better Insights for *
Solution Area: Application Management
  Contextual Data: application logs, monitoring data, metrics, events
  Wire Data: protocol conversations on database performance, DNS lookups, client data, business transaction paths…
  Enriched View: measure application response times, deeper insights for root-cause diagnostics, trace transaction paths, establish baselines…
Solution Area: IT Operations
  Contextual Data: application logs, monitoring data, metrics, events
  Wire Data: payload data including process times, errors, transaction traces, ICA latency, SQL statements, DNS records…
  Enriched View: analyze traffic volume, speed and packets to identify infrastructure performance issues, capacity constraints, and changes; establish baselines…
Stream = Better Insights for * (continued)
Solution Area: Security
  Contextual Data: app + infra logs, monitoring data, events
  Wire Data: protocol identification, protocol headers, content and payload information, flow records
  Enriched View: build analytics and context for incident response, threat detection, monitoring and compliance
Solution Area: Digital Intelligence
  Contextual Data: website activity, clickstream data, metrics
  Wire Data: browser-level customer interactions
  Enriched View: customer experience (analyze website and application bottlenecks to improve customer experience and online revenues); customer support, online and call center (faster root-cause analysis and resolution of customer issues with the website or apps)
Scripted Inputs
• Send data to Splunk via a custom script (see the sketch below)
• Splunk indexes anything written to stdout
• Splunk handles the scheduling
• Supports shell and Python scripts, Windows batch, PowerShell, and any other utility that can format and stream data
Streaming mode
• Splunk executes the script and indexes stdout
• Checks for any running instances
Write-to-file mode
• Splunk launches a script that produces an output file, so no external scheduler is needed
• Splunk monitors the output file
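A minimal sketch of a streaming-mode scripted input, assuming a hypothetical disk_usage.sh; the script:// stanza, the interval setting, and the stdout contract follow Splunk's documented inputs.conf behavior:

# $SPLUNK_HOME/bin/scripts/disk_usage.sh (hypothetical)
#!/bin/sh
df -k    # anything the script writes to stdout gets indexed

# inputs.conf
[script://$SPLUNK_HOME/bin/scripts/disk_usage.sh]
interval = 300            # rerun the script every five minutes
sourcetype = disk_usage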
Use Cases for Scripted Inputs
• Alternative to file-based or network-based inputs
• Stream data from command-line tools, such as vmstat and iostat
• Poll a web service, API or database and process the results
• Reformat complex or binary data for easier parsing into events and fields
• Maintain data sources with slow or resource-intensive startup procedures
• Provide special or complex handling for transient or unstable inputs
• Scripts that manage passwords and credentials
• Wrapper scripts for command-line inputs that contain special characters
Database Inputs
DB Connect provides reliable, scalable, real-time integration between Splunk and traditional relational databases.
• Create value with structured data
• Enrich search results with additional business context
• Easily import data for deeper analysis
• Integrate multiple DBs concurrently
• Simple set-up, non-invasive and secure
Configure Database Inputs
• DB Connect App
  – Real-time, scalable integration with relational DBs
  – Browse and navigate schemas and tables before data import
  – Reliable scheduled import
  – Seamless installation and UI configuration
  – Supports connection pooling and caching
• "Tail" tables or import entire tables
  – Detect and import new/updated rows using timestamps or unique IDs
• Supports many RDBMS flavors
  – AWS RDS Aurora, AWS Redshift, IBM DB2 for Linux, Informix, MemSQL, MS SQL, MySQL, Oracle, PostgreSQL, SAP SQL Anywhere (aka Sybase SA), Sybase ASE and IQ, Teradata
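Once a connection is defined, DB Connect also exposes a search command for ad hoc queries. A minimal sketch using dbxquery; the connection name and the SQL are hypothetical:

| dbxquery connection="orders_db" query="SELECT order_id, status FROM orders"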
Splunk ODBC Driver
• Interact with, manipulate and visualize machine data in Splunk Enterprise using business software tools
• Leverage analytics from Splunk alongside third-party solutions such as Microsoft Excel and Tableau Desktop
• Industry-standard connectivity to Splunk Enterprise
• Empowers business users with direct and secure access to machine data
• Combine machine data with structured data for better operational context
Modular Inputs
• Create your own custom inputs: a scripted input with structure and intelligence (see the Python sketch below)
• First-class citizen in the Splunk management interface
  – Appears under Settings > Data Inputs
• Benefits over a simple scripted input:
  – Instance control: launch a single instance or multiple instances
  – Input validation
  – Support for multiple platforms
  – Stream data as text or XML
  – Secure access to modular input scripts via REST endpoints
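A minimal sketch of the modular input protocol in Python, for a hypothetical input with a single url parameter; the --scheme handshake and the stdin/stdout contract follow Splunk's documented modular input protocol:

import sys

SCHEME = """<scheme>
    <title>URL Poller (hypothetical)</title>
    <description>Illustrative modular input sketch</description>
    <streaming_mode>simple</streaming_mode>
    <endpoint>
        <args>
            <arg name="url"><title>URL to poll</title></arg>
        </args>
    </endpoint>
</scheme>"""

if __name__ == "__main__":
    if len(sys.argv) > 1 and sys.argv[1] == "--scheme":
        print(SCHEME)   # Splunk calls the script with --scheme to discover its parameters
    else:
        config_xml = sys.stdin.read()   # Splunk passes the input's configuration as XML on stdin
        print("event=poll_succeeded")   # in simple streaming mode, stdout is indexed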
Example Modular Inputs
Twitter
• Stream JSON data from a Twitter source to Splunk using Tweepy
Amazon S3 Online Storage
• Index data from the Amazon S3 online storage web service
Java Messaging Service (JMS)
• Poll message queues and topics through the JMS Messaging API
• Talks to multiple providers: MQSeries (WebSphere MQ), ActiveMQ, TibcoEMS, HornetQ, RabbitMQ, Native JMS, WebLogic JMS, Sonic MQ
Splunk Windows Inputs
• Retrieve Windows event logs, registry keys, perfmon counters
More modular inputs are available on the Splunk apps site.
zLinux Forwarder
• Easily collect and index data on IBM mainframes
• Collect application and platform data
• Download as a new forwarder distribution for s390x Linux
Extend Operational Intelligence to Mobile Apps
• Deliver better-performing, more reliable apps
• Deliver real-time omni-channel analytics
• End-to-end performance and capacity insights
Monitor App Usage and Performance
• Improve user retention by quickly identifying crashes and performance issues
• Establish whether issues are caused by an app or the network(s)
• Correlate app, OS and device type to diagnose crash and network performance issues
Integrated Analytics Platform for Diverse Data Stores
• Full-featured, integrated product
• Fast insights for everyone
• Works with what you have today
Explore, analyze, visualize, dashboard, and share data in Hadoop clusters (via Hadoop client libraries) and in NoSQL and other data stores (via streaming resource libraries), with bi-directional integration with Hadoop.
Connect to NoSQL and Other Data Stores
• Build custom streaming resource libraries
• Search and analyze data from other data stores in Hunk
• In partnership with leading NoSQL vendors
• Use in conjunction with DB Connect for relational database lookups
Virtual Indexes
• Enables seamless use of almost the entire Splunk stack on the data
• Automatically handles MapReduce
• Technology is patent pending
Data Enrichment
Agenda
• Tags – categorize and add meaning to data
• Field Aliases – simplify search and correlation
• Calculated Fields – shortcut complex or repetitive computations
• Event Types – group common events and share knowledge
• Lookups – augment data with additional external fields
What is Data Enrichment?
• Adds inline meaning/context/specificity to raw data
• Used to normalize metadata or raw data
• Simplifies correlation of multiple data sources
• Created in Splunk or transferred from external sources
Tags
• Add meaning/context/specificity to raw data
• Labels describing team, category, platform, geography
• Applied to a field-value combination
• Multiple tags can be applied to each field-value
• Case sensitive

Create Tags
(Screenshots: creating tags in the UI.)
Tags in Action: Find the Web Servers
• Tag the host as webserver; tag the sourcetype as web
• Search events with the tag in any field: tag=webserver
• Search events with the tag in a specific field: tag::host=webserver
• Search events with the tag using wildcards: tag=web*
Field Aliases
• Normalize field labels to simplify search and correlation
• Apply multiple aliases to a single field
  – Example: Username | cs_username | User → user
  – Example: c_ip | client | client_ip → clientip
• Processed after field extractions and before lookups
• Can apply to lookups
• Aliases appear alongside the original fields (see the props.conf sketch below)
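The same alias can be defined in configuration instead of the UI. A minimal props.conf sketch for the tutorial's access_combined sourcetype, using Splunk's documented FIELDALIAS syntax:

# props.conf
[access_combined]
FIELDALIAS-customer = clientip AS customer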
Create Field Alias: Re-Label a Field to an Intuitive Name
(Screenshots: creating the field alias in the UI.)

Field Alias in Action: Search Using an Intuitive Field Name
1. Create a field alias of clientip = customer
2. Search events in the last 15 minutes (sourcetype=access_combined) and find the customer field
3. The field alias (customer) and the original field (clientip) are both displayed
Calculated Fields
• Shortcut for performing repetitive, long, or complex transformations using the eval command (see the props.conf sketch below)
• Based on extracted or discovered fields only
• Do not apply to lookup or generated fields
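A minimal sketch of the calculated field built in the next exercise, expressed in configuration; the EVAL- syntax follows Splunk's documented props.conf format:

# props.conf
[access_combined]
EVAL-kilobytes = bytes/1024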
Create Calculated Field: Compute Kilobytes from Bytes
(Screenshots: creating the calculated field in the UI.)

Calculated Fields in Action: Search Using kilobytes Instead of bytes
1. Create kilobytes = bytes/1024
2. Search events in the last 15 minutes (sourcetype=access_combined) for kilobytes and bytes
Event Types
• Classify and group common events
• Capture and share knowledge
• Based on a search
• Use in combination with fields and tags to define event topography
Create Event Types
• Best practice: use the punct field
  – A default metadata field describing the event's structure
  – Built from the event's "interesting" punctuation characters: , ; - # $ % & + . / : = ? @ ' | * ( ) { } < > [ ] ^ !
  – Can use wildcards
Example events and their punct values:
event: ####<Jun 3, 2014 5:38:22 PM MDT> <Notice> <WebLogicServer> <bea03> <asiAdminServer> <WrapperStartStopAppMain> <>WLS Kernel<> <> <BEA-000360> <Server started in RUNNING mode>
punct: ####<_,__::__>_<>_<>_<>_<>_<>_
event: 172.26.34.223 - - [01/Jul/2005:12:05:27 -0700] "GET /trade/app?action=logout HTTP/1.1" 200 2953
punct: ..._-_-_[:::_-]_"_?=_/."__
Create Event Type: Classify Events as Known Bad
1. Show punct for sourcetype=access_combined
2. Pick a punct value, then wildcard it after the timestamp
3. Add NOT status=200:
   sourcetype="access_combined" punct="..._-_-_[//_:::]*" NOT status=200
4. Save as a "bad" event type with Color: red and Priority: 1 (shift-reload the browser to show the coloring), then search: eventtype=bad
(A configuration equivalent is sketched below.)
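A minimal sketch of the same event type in configuration; the stanza attributes follow Splunk's documented eventtypes.conf format:

# eventtypes.conf
[bad]
search = sourcetype="access_combined" punct="..._-_-_[//_:::]*" NOT status=200
color = red
priority = 1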
Lookups to Enrich Raw Data
Data goes in; insight comes out. Create additional fields from the raw data with a lookup to an external data source such as LDAP/AD, watch lists, CRM/ERP, or a CMDB.
Lookups
• Augment raw events with additional fields
• Provide context or supporting details
• Translate field values into more descriptive data
  – Example: add text descriptions for error codes and IDs
  – Example: add contact details to user names or IDs
  – Example: add descriptions to HTTP status codes
• File-based or scripted lookups
Configure a Static Lookup: Convert a Code into a Description
1. Upload/create the table
2. Assign the table to a lookup object
3. Map the lookup to a data set

Step 1: Create the HTTP Status Table
• Get the http_status.csv lookup file from http://splunk.box.com/mdw101
• Lookup table files > Add new
  – Name: http_status.csv (must have a .csv file extension)
  – Upload: <path to .csv>
• Verify the lookup was created successfully: | inputlookup http_status.csv
Step 2: Add a Lookup Definition
• Lookup definitions > Add new
  – Name: http_status
  – Type: File-based
  – Lookup file: http_status.csv
• Invoke the lookup manually:
  sourcetype=access_combined | lookup http_status status OUTPUT status_description
Step 3: Configure an Automatic Lookup
• Automatic lookups > Add new
  – Name: http_status (cannot have spaces)
  – Lookup table: http_status
  – Apply to: sourcetype = access_combined
  – Lookup input field: status
  – Lookup output field: status_description
• Verify the lookup is invoked automatically with a plain search: sourcetype=access_combined
(A props.conf equivalent is sketched below.)
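A minimal sketch of the same automatic lookup in configuration; the LOOKUP- syntax follows Splunk's documented props.conf format:

# props.conf
[access_combined]
LOOKUP-http_status = http_status status OUTPUT status_description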
Fancy Lookups
• Temporal lookups for time-based matching
  – Example: identify users on your network based on their IP address and the timestamp in DHCP logs
• Use search results to populate a lookup table (see the sketch below)
  – … | outputlookup <tablename|filename>
• Call an external command or script
  – Python scripts only
  – Example: DNS lookup to turn an IP into a host
• Create a lookup table using a relational database
  – Review matches against a database column or SQL query
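A minimal sketch of populating a lookup table from search results and reading it back; the clientip_activity.csv filename is a hypothetical example:

sourcetype=access_combined | stats count by clientip | outputlookup clientip_activity.csv

| inputlookup clientip_activity.csv | sort - count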
More Data Enrichment
• Creating and managing alerts (Job Inspector)
• Macros
• Workflow actions
Level Up on Search & Reporting Commands
Agenda
• Doing more with basic search commands
• Advanced search commands
• Doing more with basic reporting commands
Search Syntax Components
(Diagram: the components of search syntax.)
Anatomy of a Search
(Diagram: how a search executes, reading events from disk.)
Doing More with Basic Search Commands
• top – limit
• rare – same options as top
• timechart – parameters
• stats – functions (sum, avg, list, values, sparkline)
• sort – inline ascending or descending
• addcoltotals
• addtotals
Using the top + rare Commands: Find the Most and Least Active Customers
• Commands have parameters or qualifiers
• top and rare have similar syntax
• Each search command has its own syntax (show the inline help)
IPs with the most visits:
... | top limit=20 clientip
IPs with the least visits:
... | rare limit=20 clientip
Using the sort Command: Sort the Number of Customer Requests
• Sort inline, descending or ascending
Number of requests by customer, descending:
... | stats count by clientip | sort - count
Number of requests by customer, ascending:
... | stats count by clientip | sort + count
Using Functions + the rename Command: Determine Total Customer Payload
• See the Search Command Reference docs: functions for eval + where, and functions for stats, chart and timechart
• Invoke a function, then rename it inline
Total payload by customer, descending:
... | stats sum(bytes) by clientip | sort - sum(bytes)
The same, renamed inline:
... | stats sum(bytes) as totalbytes by clientip | sort - totalbytes
Using the list + values Functions: Observe Customer Activity
List all values of a field (activity by customer):
... | stats list(action) by clientip
List only the distinct values of a field (distinct actions by customer):
... | stats values(action) by clientip
Combine the list + values Functions: Analyze Customer Activity
• Show distinct actions and the cardinality of each action
sourcetype=access_combined
| stats count(action) as value by clientip, action
| eval pair=action + " (" + value + ")"
| stats list(pair) as values by clientip
Add Columns and Sum Columns: Building a Table of Customer Activity
Add columns (two columns, clientip + action):
... | stats count by clientip, action
Sum specific columns (the totalbytes and totalevents columns):
... | stats sum(bytes) as totalbytes, avg(bytes) as avgbytes, count as totalevents by clientip | addcoltotals totalbytes, totalevents
Sum Across Rows: Building a Table of Customer Activity
For each row, add totalbytes + totalother:
... | stats sum(bytes) as totalbytes, sum(other) as totalother by clientip | addtotals fieldname=totalstuff
A better real-world example: physical memory + virtual memory = total memory.
Sparklines in Action: Trend Individual Customer Activity
Sparklines render inline in tables, showing each row's trend in the context of the larger event set:
... | stats sparkline(count) as trendline by clientip
... | stats sparkline(count) as trendline sum(bytes) by clientip
Advanced Search Commands
• transaction – Groups events by a common field value. Convenient, but resource-intensive.
• cluster – Clusters similar events together. Can be used on _raw or a field.
• associate – Identifies correlations between fields. Calculates entropy between field values.
• correlate – Calculates the correlation between different fields. Evaluates the relationship of all fields in a result set.
• contingency – Builds a contingency table for two fields. Computes co-occurrence: the percentage of events in which the two fields appear together.
• anomalies – Computes an unexpectedness score for an event. Computes the similarity of an event (X) to a set of previous events (P).
• anomalousvalue – Finds and summarizes irregular, or uncommon, search results. Considers frequency of occurrence or number of standard deviations from the mean.
Using the transaction Command: View Customer Activity by Session
• Sews events together and creates duration and eventcount fields
Group by JSESSIONID:
... | transaction JSESSIONID | table JSESSIONID, action, product_id
Using the cluster Command
• Intelligent grouping: creates cluster_count and cluster_label
... | cluster showcount=1 | table _raw, cluster_count, cluster_label
Doing More with Basic Reporting Commands
• Predict over time
• Chart overlay, with and without streamstats
• Maps with iplocation + geostats
• Single value
• Metered visuals with gauge
Using the predict Command: Predict Website Traffic
• Predict future values with lower/upper bounds, for single and multiple series
... | timechart count as traffic | predict traffic
Simple Chart Overlay: Compare Browsing vs. Buying Activity
sourcetype=access_combined (action=view OR action=purchase)
| timechart span=10m count(eval(action="view")) as Viewed, count(eval(action="purchase")) as Purchased
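The agenda above also mentions overlays with streamstats. A minimal sketch that overlays a moving average on the traffic count; the six-bucket window is an illustrative choice:

sourcetype=access_combined
| timechart span=10m count as traffic
| streamstats window=6 avg(traffic) as moving_avg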
Geolocation in Action: Map Customer Activity Geographically
Combine IP lookup with geo mapping:
... | iplocation clientip | geostats count by clientip
Single Value in Action: Display a Simple Count of Events
... | stats count
Single Value, Radial and Filler Gauges in Action: Display Counts Using Gauges
... | stats count | gauge count 10000 20000 30000 40000 50000
Data Models and Pivot
Agenda
• What is a data model?
• Build a data model
• The Pivot interface
• Accelerate a data model
Powerful Analytics Anyone Can Use
• Pivot enables non-technical users to build complex reports without the search language
• The data model provides a more meaningful representation of the underlying raw machine data
• The analytics store's acceleration technology delivers results faster as volume increases
Define Relationships in Machine Data: Data Model
• Describes how underlying machine data is represented and accessed
• Defines meaningful relationships in the data
• Enables a single authoritative view of underlying raw data
(Screenshot: hierarchical object view of the underlying data; add constraints to filter out events.)
High-Performance Analytics Store: Transparent Acceleration
• Automatically collected: handles timing issues, backfill…
• Automatically maintained: uses an acceleration window
• Stored on the indexers, peer to the buckets
• Fault-tolerant collection
(Screenshot: check to enable acceleration of a data model and set the time window of data that is accelerated.)
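Accelerated data models can also be queried directly with tstats. A minimal sketch, assuming the CIM Web data model (introduced below) is installed and accelerated:

| tstats count from datamodel=Web by Web.status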
Pivot: Easy-to-Use Analytics
• Drag-and-drop interface enables any user to analyze data
• Create complex queries and reports without learning the search language
• Click to visualize any chart type; reports dynamically update when fields change
(Screenshot: select fields from the data model, pick a time window, choose any chart type from the chart toolbox, and save the report to share.)
Common Information Model (CIM) App
• Defines the least common denominator for a data domain
• A standard method to parse, categorize, and normalize data
• A set of field names and tags per domain
• Packaged as data models in a Splunk app
• Domains: security, web, inventory, JVM, performance, network sessions, and more
• Minimal setup to use the Pivot interface
Custom Visualizations and the Web Framework Toolkit
Agenda
• Developer Platform
• Web Framework Toolkit (WFT)
• REST API and SDKs
• Get a Flying Start
Optimizing the Analytics Process
• Focus on the data: intuitive tools to enable the analyst
• No single visualization exists to handle all data sets
• Never lose sight of the raw data
(Diagram: Splunk analytics loop of explore, context, visualize, and algorithms.)
Simple, Interactive, and Extensible
• Exploration: Pivot, data models
• Visualization: interactive forms, contextual drilldown, dashboard editor
• Customizable framework: Web Framework
• Powerful analytics
The Splunk Enterprise Platform: A Powerful Platform for Enterprise Developers
• Core engine: collection, indexing, and the Search Processing Language (core functions)
• Content: inputs, apps, and other content
• User and developer interfaces: Web Framework, REST API, SDKs
Developers Can Customize and Extend
• Build Splunk apps on the Web Framework: Simple XML, JavaScript, HTML5
• Extend and integrate Splunk through the REST API and SDKs for Java, JavaScript, Python, Ruby, C# and PHP
• Plus data models, search extensibility, and modular inputs
A Wealth of Splunk Apps
Over 1,300 apps are available on the Splunk apps site, built on the API, SDKs and UI, covering server, storage and network; server virtualization; operating systems; custom applications; business applications; cloud services; app performance monitoring; ticketing and others; web intelligence; mobile applications; and Stream.
Example Advanced Visualizations
• Interactive, cut-and-paste examples from popular source repositories: D3, GitHub, jQuery
• Splunk 6.x Dashboard Examples App: https://apps.splunk.com/app/1603
• Custom SimpleXML Extensions App: https://apps.splunk.com/app/1772
• Splunk Web Framework Toolkit App: https://apps.splunk.com/app/1613
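Dashboards built from these examples start from Simple XML. A minimal sketch of a one-panel dashboard; the label and the embedded search are illustrative:

<dashboard>
  <label>Customer Activity</label>
  <row>
    <panel>
      <chart>
        <search>
          <query>sourcetype=access_combined | timechart count by action</query>
          <earliest>-24h</earliest>
          <latest>now</latest>
        </search>
      </chart>
    </panel>
  </row>
</dashboard>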
Resources
Splunk Documentation
• http://docs.splunk.com
• Official product docs
• Wiki and community topics
• Updated daily
• Can be printed to PDF
Splunk Answers
• http://answers.splunk.com
• Community-driven and Splunk-supported
• Knowledge exchange and Q&A
Splunk Education
• Recommended for users: Using Splunk; Searching & Reporting
• Recommended for UI/dashboard developers: Developing Apps
• Instructor-led courses: web or onsite
Eric Merkel
emerkel@splunk.com
Happy Splunking!
More Related Content

What's hot (20)

PDF
Conf2014_SplunkSearchOptimization
Splunk
 
PDF
My first 90 days with ClickHouse.pdf
Alkin Tezuysal
 
PPTX
Logging using ELK Stack for Microservices
Vineet Sabharwal
 
PPTX
Securing Data in Hadoop at Uber
DataWorks Summit
 
PDF
Salary Study 2019 - Software Developer Luxembourg, Paris, Brussels
Eric BUSCH
 
PPTX
Introduction to Satellite(1).pptx
joshua45075
 
PDF
Planning For Catastrophe with IBM WAS and IBM BPM
WASdev Community
 
PDF
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
HostedbyConfluent
 
PPTX
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
Simplilearn
 
PDF
华为智慧农业解决方案
ssuser220dc6
 
PDF
SQL Server Tuning to Improve Database Performance
Mark Ginnebaugh
 
PPTX
XPagesDay2015 - 誰も教えてくれなかったデバッグ方法
Mitsuru Katoh
 
PDF
Tuning data warehouse
Srinivasan R
 
PDF
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Databricks
 
PPTX
Introduction to SharePoint Framework (SPFx)
Fabio Franzini
 
PPTX
Virtual Flink Forward 2020: A deep dive into Flink SQL - Jark Wu
Flink Forward
 
PDF
Embulk, an open-source plugin-based parallel bulk data loader
Sadayuki Furuhashi
 
PDF
Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...
Spark Summit
 
PDF
PostgreSQL: Advanced indexing
Hans-Jürgen Schönig
 
PPTX
SplunkLive! Paris 2018: Splunk Overview
Splunk
 
Conf2014_SplunkSearchOptimization
Splunk
 
My first 90 days with ClickHouse.pdf
Alkin Tezuysal
 
Logging using ELK Stack for Microservices
Vineet Sabharwal
 
Securing Data in Hadoop at Uber
DataWorks Summit
 
Salary Study 2019 - Software Developer Luxembourg, Paris, Brussels
Eric BUSCH
 
Introduction to Satellite(1).pptx
joshua45075
 
Planning For Catastrophe with IBM WAS and IBM BPM
WASdev Community
 
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
HostedbyConfluent
 
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
Simplilearn
 
华为智慧农业解决方案
ssuser220dc6
 
SQL Server Tuning to Improve Database Performance
Mark Ginnebaugh
 
XPagesDay2015 - 誰も教えてくれなかったデバッグ方法
Mitsuru Katoh
 
Tuning data warehouse
Srinivasan R
 
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Databricks
 
Introduction to SharePoint Framework (SPFx)
Fabio Franzini
 
Virtual Flink Forward 2020: A deep dive into Flink SQL - Jark Wu
Flink Forward
 
Embulk, an open-source plugin-based parallel bulk data loader
Sadayuki Furuhashi
 
Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...
Spark Summit
 
PostgreSQL: Advanced indexing
Hans-Jürgen Schönig
 
SplunkLive! Paris 2018: Splunk Overview
Splunk
 

Viewers also liked (20)

PPTX
Threat Hunting with Splunk Hands-on
Splunk
 
PDF
Building Business Service Intelligence with ITSI
Splunk
 
PPTX
Building a Security Information and Event Management platform at Travis Per...
Splunk
 
PPTX
Threat Hunting with Splunk
Splunk
 
PPTX
Softcat Splunk Discovery Day Manchester, March 2017
Splunk
 
PPTX
Wie Sie Ransomware aufspüren und was Sie dagegen machen können
Splunk
 
PPTX
Splunk Overview
Splunk
 
PDF
Splunk Enterprise for IT Troubleshooting Hands-On
Splunk
 
PDF
Machine Data 101
Splunk
 
PPTX
Delivering business value from operational insights at ING Bank
Splunk
 
PDF
Getting Started with IT Service Intelligence
Splunk
 
PDF
Molina Healthcare Customer Presentation
Splunk
 
PDF
Building Business Service Intelligence with ITSI
Splunk
 
PPTX
How to Design, Build and Map IT and Business Services in Splunk
Splunk
 
PPTX
SplunkLive! Frankfurt 2017 - MediaMarktSaturn
Splunk
 
PPTX
Splunk Webinar – IT Operations auf den nächsten Level bringen
Splunk
 
PDF
Machine Learning + Analytics in Splunk
Splunk
 
PPTX
Getting Started with Splunk Enterprise
Splunk
 
PPTX
Splunk Discovery Day Hamburg - Data Driven Insights
Splunk
 
PPTX
Scale Splunk
Splunk
 
Threat Hunting with Splunk Hands-on
Splunk
 
Building Business Service Intelligence with ITSI
Splunk
 
Building a Security Information and Event Management platform at Travis Per...
Splunk
 
Threat Hunting with Splunk
Splunk
 
Softcat Splunk Discovery Day Manchester, March 2017
Splunk
 
Wie Sie Ransomware aufspüren und was Sie dagegen machen können
Splunk
 
Splunk Overview
Splunk
 
Splunk Enterprise for IT Troubleshooting Hands-On
Splunk
 
Machine Data 101
Splunk
 
Delivering business value from operational insights at ING Bank
Splunk
 
Getting Started with IT Service Intelligence
Splunk
 
Molina Healthcare Customer Presentation
Splunk
 
Building Business Service Intelligence with ITSI
Splunk
 
How to Design, Build and Map IT and Business Services in Splunk
Splunk
 
SplunkLive! Frankfurt 2017 - MediaMarktSaturn
Splunk
 
Splunk Webinar – IT Operations auf den nächsten Level bringen
Splunk
 
Machine Learning + Analytics in Splunk
Splunk
 
Getting Started with Splunk Enterprise
Splunk
 
Splunk Discovery Day Hamburg - Data Driven Insights
Splunk
 
Scale Splunk
Splunk
 
Ad

Similar to Machine Data 101 Hands-on (20)

PPTX
Workshop splunk 6.5-saint-louis-mo
Mohamad Hassan
 
PPTX
Machine Data 101: Turning Data Into Insight
Splunk
 
PPTX
Machine Data 101: Turning Data Into Insight
Splunk
 
PDF
Machine Data 101 Workshop
Splunk
 
PPTX
Machine Data 101
Splunk
 
PDF
Splunk workshop-Machine Data 101
Splunk
 
PDF
Getting Started with Splunk Enterprise
Splunk
 
PDF
Getting Started with Splunk Enterprise
Splunk
 
PDF
Getting Started with Splunk Enterprise
Splunk
 
PPTX
Splunk Discovery Day Düsseldorf 2016 - Splunk für IT Operations
Splunk
 
PPTX
Data Onboarding Breakout Session
Splunk
 
PPTX
What's New in 6.3 + Data On-Boarding
Splunk
 
PPTX
Splunk Discovery: Warsaw 2018 - Getting Data In
Splunk
 
PDF
SplunkLive Auckland - Operational Intelligence
Splunk
 
PDF
SplunkLive Wellington 2015 - Operational Intelligence
Splunk
 
PDF
Delivering New Visibility and Analytics for IT Operations
Gabrielle Knowles
 
PDF
Azure IoT Suite
Samir Arezki ☁
 
PDF
Motadata brochure
RajDodiya4
 
PDF
inmation Presentation
inmation Software GmbH
 
PPT
SQL Server 2008 Positioning
ukdpe
 
Workshop splunk 6.5-saint-louis-mo
Mohamad Hassan
 
Machine Data 101: Turning Data Into Insight
Splunk
 
Machine Data 101: Turning Data Into Insight
Splunk
 
Machine Data 101 Workshop
Splunk
 
Machine Data 101
Splunk
 
Splunk workshop-Machine Data 101
Splunk
 
Getting Started with Splunk Enterprise
Splunk
 
Getting Started with Splunk Enterprise
Splunk
 
Getting Started with Splunk Enterprise
Splunk
 
Splunk Discovery Day Düsseldorf 2016 - Splunk für IT Operations
Splunk
 
Data Onboarding Breakout Session
Splunk
 
What's New in 6.3 + Data On-Boarding
Splunk
 
Splunk Discovery: Warsaw 2018 - Getting Data In
Splunk
 
SplunkLive Auckland - Operational Intelligence
Splunk
 
SplunkLive Wellington 2015 - Operational Intelligence
Splunk
 
Delivering New Visibility and Analytics for IT Operations
Gabrielle Knowles
 
Azure IoT Suite
Samir Arezki ☁
 
Motadata brochure
RajDodiya4
 
inmation Presentation
inmation Software GmbH
 
SQL Server 2008 Positioning
ukdpe
 
Ad

More from Splunk (20)

PDF
Splunk Leadership Forum Wien - 20.05.2025
Splunk
 
PDF
Splunk Security Update | Public Sector Summit Germany 2025
Splunk
 
PDF
Building Resilience with Energy Management for the Public Sector
Splunk
 
PDF
IT-Lagebild: Observability for Resilience (SVA)
Splunk
 
PDF
Nach dem SOC-Aufbau ist vor der Automatisierung (OFD Baden-Württemberg)
Splunk
 
PDF
Monitoring einer Sicheren Inter-Netzwerk Architektur (SINA)
Splunk
 
PDF
Praktische Erfahrungen mit dem Attack Analyser (gematik)
Splunk
 
PDF
Cisco XDR & Splunk SIEM - stronger together (DATAGROUP Cyber Security)
Splunk
 
PDF
Security - Mit Sicherheit zum Erfolg (Telekom)
Splunk
 
PDF
One Cisco - Splunk Public Sector Summit Germany April 2025
Splunk
 
PDF
.conf Go 2023 - Data analysis as a routine
Splunk
 
PDF
.conf Go 2023 - How KPN drives Customer Satisfaction on IPTV
Splunk
 
PDF
.conf Go 2023 - Navegando la normativa SOX (Telefónica)
Splunk
 
PDF
.conf Go 2023 - Raiffeisen Bank International
Splunk
 
PDF
.conf Go 2023 - På liv og død Om sikkerhetsarbeid i Norsk helsenett
Splunk
 
PDF
.conf Go 2023 - Many roads lead to Rome - this was our journey (Julius Bär)
Splunk
 
PDF
.conf Go 2023 - Das passende Rezept für die digitale (Security) Revolution zu...
Splunk
 
PDF
.conf go 2023 - Cyber Resilienz – Herausforderungen und Ansatz für Energiever...
Splunk
 
PDF
.conf go 2023 - De NOC a CSIRT (Cellnex)
Splunk
 
PDF
conf go 2023 - El camino hacia la ciberseguridad (ABANCA)
Splunk
 
Splunk Leadership Forum Wien - 20.05.2025
Splunk
 
Splunk Security Update | Public Sector Summit Germany 2025
Splunk
 
Building Resilience with Energy Management for the Public Sector
Splunk
 
IT-Lagebild: Observability for Resilience (SVA)
Splunk
 
Nach dem SOC-Aufbau ist vor der Automatisierung (OFD Baden-Württemberg)
Splunk
 
Monitoring einer Sicheren Inter-Netzwerk Architektur (SINA)
Splunk
 
Praktische Erfahrungen mit dem Attack Analyser (gematik)
Splunk
 
Cisco XDR & Splunk SIEM - stronger together (DATAGROUP Cyber Security)
Splunk
 
Security - Mit Sicherheit zum Erfolg (Telekom)
Splunk
 
One Cisco - Splunk Public Sector Summit Germany April 2025
Splunk
 
.conf Go 2023 - Data analysis as a routine
Splunk
 
.conf Go 2023 - How KPN drives Customer Satisfaction on IPTV
Splunk
 
.conf Go 2023 - Navegando la normativa SOX (Telefónica)
Splunk
 
.conf Go 2023 - Raiffeisen Bank International
Splunk
 
.conf Go 2023 - På liv og død Om sikkerhetsarbeid i Norsk helsenett
Splunk
 
.conf Go 2023 - Many roads lead to Rome - this was our journey (Julius Bär)
Splunk
 
.conf Go 2023 - Das passende Rezept für die digitale (Security) Revolution zu...
Splunk
 
.conf go 2023 - Cyber Resilienz – Herausforderungen und Ansatz für Energiever...
Splunk
 
.conf go 2023 - De NOC a CSIRT (Cellnex)
Splunk
 
conf go 2023 - El camino hacia la ciberseguridad (ABANCA)
Splunk
 

Recently uploaded (20)

PDF
"Beyond English: Navigating the Challenges of Building a Ukrainian-language R...
Fwdays
 
PDF
The Builder’s Playbook - 2025 State of AI Report.pdf
jeroen339954
 
PDF
Windsurf Meetup Ottawa 2025-07-12 - Planning Mode at Reliza.pdf
Pavel Shukhman
 
PDF
How Startups Are Growing Faster with App Developers in Australia.pdf
India App Developer
 
PDF
Chris Elwell Woburn, MA - Passionate About IT Innovation
Chris Elwell Woburn, MA
 
PPTX
Building Search Using OpenSearch: Limitations and Workarounds
Sease
 
PDF
Reverse Engineering of Security Products: Developing an Advanced Microsoft De...
nwbxhhcyjv
 
PDF
Log-Based Anomaly Detection: Enhancing System Reliability with Machine Learning
Mohammed BEKKOUCHE
 
PPTX
Top iOS App Development Company in the USA for Innovative Apps
SynapseIndia
 
PDF
Presentation - Vibe Coding The Future of Tech
yanuarsinggih1
 
PDF
Fl Studio 24.2.2 Build 4597 Crack for Windows Free Download 2025
faizk77g
 
PDF
SFWelly Summer 25 Release Highlights July 2025
Anna Loughnan Colquhoun
 
PDF
DevBcn - Building 10x Organizations Using Modern Productivity Metrics
Justin Reock
 
PDF
CIFDAQ Market Insights for July 7th 2025
CIFDAQ
 
PPTX
Q2 FY26 Tableau User Group Leader Quarterly Call
lward7
 
PPTX
OpenID AuthZEN - Analyst Briefing July 2025
David Brossard
 
PPTX
"Autonomy of LLM Agents: Current State and Future Prospects", Oles` Petriv
Fwdays
 
PPTX
AUTOMATION AND ROBOTICS IN PHARMA INDUSTRY.pptx
sameeraaabegumm
 
PDF
Newgen Beyond Frankenstein_Build vs Buy_Digital_version.pdf
darshakparmar
 
PDF
CIFDAQ Weekly Market Wrap for 11th July 2025
CIFDAQ
 
"Beyond English: Navigating the Challenges of Building a Ukrainian-language R...
Fwdays
 
The Builder’s Playbook - 2025 State of AI Report.pdf
jeroen339954
 
Windsurf Meetup Ottawa 2025-07-12 - Planning Mode at Reliza.pdf
Pavel Shukhman
 
How Startups Are Growing Faster with App Developers in Australia.pdf
India App Developer
 
Chris Elwell Woburn, MA - Passionate About IT Innovation
Chris Elwell Woburn, MA
 
Building Search Using OpenSearch: Limitations and Workarounds
Sease
 
Reverse Engineering of Security Products: Developing an Advanced Microsoft De...
nwbxhhcyjv
 
Log-Based Anomaly Detection: Enhancing System Reliability with Machine Learning
Mohammed BEKKOUCHE
 
Top iOS App Development Company in the USA for Innovative Apps
SynapseIndia
 
Presentation - Vibe Coding The Future of Tech
yanuarsinggih1
 
Fl Studio 24.2.2 Build 4597 Crack for Windows Free Download 2025
faizk77g
 
SFWelly Summer 25 Release Highlights July 2025
Anna Loughnan Colquhoun
 
DevBcn - Building 10x Organizations Using Modern Productivity Metrics
Justin Reock
 
CIFDAQ Market Insights for July 7th 2025
CIFDAQ
 
Q2 FY26 Tableau User Group Leader Quarterly Call
lward7
 
OpenID AuthZEN - Analyst Briefing July 2025
David Brossard
 
"Autonomy of LLM Agents: Current State and Future Prospects", Oles` Petriv
Fwdays
 
AUTOMATION AND ROBOTICS IN PHARMA INDUSTRY.pptx
sameeraaabegumm
 
Newgen Beyond Frankenstein_Build vs Buy_Digital_version.pdf
darshakparmar
 
CIFDAQ Weekly Market Wrap for 11th July 2025
CIFDAQ
 

Machine Data 101 Hands-on

  • 1. Copyright © 2014 Splunk Inc. Machine Data 101: Turning Data Into Insight Eric Merkel Sr. Sales Engineer
  • 2. Agenda  What is Machine Data? What is Splunk?  Non-Traditional Data Sources  Data Enrichment  Level Up on Search and Reporting Commands  Data Models and Pivot  Advanced Visualizations and the Web Framework 3
  • 3. What Does Machine Data Look Like? Sources Order Processing Twitter Care IVR Middleware Error 4
  • 4. Machine Data Contains Critical Insights Customer ID Order ID Customer’s Tweet Time Waiting On Hold Twitter ID Product ID Company’s Twitter ID Customer IDOrder ID Customer ID Sources Order Processing Twitter Care IVR Middleware Error 5
  • 5. Machine Data Contains Critical Insights Order ID Customer’s Tweet Time Waiting On Hold Product ID Company’s Twitter ID Order ID Customer ID Twitter ID Customer ID Customer ID Sources Order Processing Twitter Care IVR Middleware Error 6
  • 6. Structured RDBMS SQL Search Schema at Write Schema at Read Traditional Splunk Splunk Approach to Machine Data Copyright © 2014 Splunk Inc. 7 ETL Universal Indexing Volume Velocity Variety Unstructured
  • 7. Splunk: The Platform for Machine Data 8 Developer Platform Report and analyze Custom dashboards Monitor and alert Ad hoc search Online Services Web Proxy Data Loss Prevention Storage Desktops Packaged Applications Custom Applications Databases Call Detail Records Smartphones and Devices Firewall Authentication File servers Endpoint Threat Intelligence Asset & CMDB Employee / HR Info Data Stores Applications External Lookups Badging records Email servers VPN Any amount, any location, any source Schema- on-the-fly Universal indexing No back-end RDBMS No need to filter data
  • 8. Platform for Operational Intelligence The Splunk Portfolio Rich Ecosystem of Apps & Add-Ons Splunk Premium Solutions Mainframe Data Relational Databases MobileForwarders Syslog/TCP IoT Devices Network Wire Data Hadoop
  • 10. Workshop Setup 11 Wi-fi Access: Splunk! (pwd: splunk2017) 1. Download free Splunk Enterprise http://www.splunk.com/download 2. Download tutorial data (tutorialdata.zip): http://splunk.box.com/mdw101 3. Download lookup table (http_status.csv): http://splunk.box.com/mdw101 4. Add tutorial data to Splunk
  • 12. Non-Traditional Data Sources  Network Inputs  HTTP Event Collector  Log Event Alert Action  Splunk App for Stream  Scripted Inputs  Database Inputs  Splunk ODBC Driver  Modular Inputs  zLinux Forwarder  MINT  Non-Splunk Datastores 13
  • 13. Traditional Data Sources  Captures events from log files in real time  Runs scripts to gather system metrics, connect to APIs and databases  Listens to syslog and gathers Windows events  Universally indexes any data format so it doesn’t need adapters 14 Windows • Registry • Event logs • File system • sysinternals Linux/Unix • Configurations • Syslog • File system • Ps, iostat, top Virtualization • Hypervisor • Guest OS • Guest Apps Applications • Web logs • Log4J, JMS, JMX • .NET events • Code and scripts Databases • Configurations • Audit/query logs • Tables • Schemas Network • Configurations • syslog • SNMP • netflow
  • 14. Network Inputs  Collect data over any UDP or TCP port  Some devices only send data over a network port  Best Practice: use syslog-ng or rsyslog  Offers persistence  Categorizes data by host 15
  • 15. HTTP Event Collector (HEC)  Collect data over HTTP or HTTPS directly to Splunk  Application Developer focus – few lines of code in app to send data  HEC Features Include:  Token-based, not credential based  Indexer Acknowledgements – guarantees data indexing  Raw and JSON formatted event payloads  SSL, CORS (Cross Origin Resource Sharing), and Network Restrictions 16
  • 16. Log Event Alert Action  Use Splunk alerting to index a custom log event  Splunk searchable index of custom alert events  Configurable Features Include:  Host  Source  Sourcetype  Index  Event text – construct the exact syntax of the log event, including any text, tokens, or other information 17
  • 17. The Splunk App for Stream Wire Data Enhances the Platform for Operational Intelligence Efficient, Cloud-ready Wire Data Collection Simple Deployment Supports Fast Time to Value 18
  • 18. Stream = Better Insights for * Solution Area Contextual Data Wire Data Enriched View Application Management application logs, monitoring data, metrics, events protocol conversations on database performance, DNS lookups, client data, business transaction paths… Measure application response times, deeper insights for root- cause diagnostics, trace tx paths, establish baselines… IT Operations application logs, monitoring data, metrics, events payload data including process times, errors, transaction traces, ICA latency, SQL statements, DNS records… Analyze traffic volume, speed and packets to identify infrastructure performance issues, capacity constraints, changes; establish baselines… 19
  • 19. Stream = Better Insights for * Solution Area Contextual Data Wire Data Enriched View Security app + infra logs, monitoring data, events protocol identification, protocol headers, content and payload information, flow records Build analytics and context for incident response, threat detection, monitoring and compliance Digital Intelligence website activity, clickstream data, metrics browser-level customer interactions Customer Experience – analyze website and application bottlenecks to improve customer experience and online revenues Customer Support (online, call center) – faster root cause analysis and resolution of customer issues with website or apps 20
  • 20. Scripted Inputs 21  Send data to Splunk via a custom script  Splunk indexes anything written to stdout  Splunk handles scheduling  Supports shell, Python scripts, WIN batch, PowerShell  Any other utility that can format and stream data Streaming Mode  Splunk executes script and indexes stdout  Checks for any running instances Write to File Mode  Splunk launches script which produces output file, no need for external scheduler  Splunk monitors output file
  • 21. Use Cases for Scripted Inputs 22  Alternative to file-base or network-based inputs  Stream data from command-line tools, such as vmstat and iostat  Poll a web service, API or database and process the results  Reformat complex or binary data for easier parsing into events and fields  Maintain data sources with slow or resource-intensive startup procedures  Provide special or complex handling for transient or unstable inputs  Scripts that manage passwords and credentials  Wrapper scripts for command line inputs that contain special characters
  • 22. Database Inputs  Create value with structured data  Enrich search results with additional business context  Easily import data for deeper analysis  Integrate multiple DBs concurrently  Simple set-up, non-invasive and secure DB Connect provides reliable, scalable, real-time integration between Splunk and traditional relational databases 23
  • 23. Configure Database Inputs 24  DB Connect App  Real-time, scalable integration with relational DBs  Browse and navigate schemas and tables before data import  Reliable scheduled import  Seamless installation and UI configuration  Supports connection pooling and caching  “Tail” tables or import entire tables  Detect and import new/updated rows using timestamps or unique IDs  Supports many RDBMS flavors  AWS RDS Aurora, AWS RedShift, IBM DB2 for Linux, Informix, MemSQL, MS SQL, MySQL, Oracle, PostgreSQL, SAP SQL Anywhere (aka Sybase SA), Sybase ASE and IQ, Teradata
  • 24. Splunk ODBC Driver 25  Interact with, manipulate and visualize machine data in Splunk Enterprise using business software tools  Leverage analytics from Splunk alongside third party solutions such as Microsoft Excel and Tableau Desktop  Industry-standard connectivity to Splunk Enterprise  Empowers business users with direct and secure access to machine data  Combine machine data with structured data for better operational context
  • 25. Modular Inputs 26  Create your own custom inputs  Scripted input with structure and intelligence  First class citizen in the Splunk management interface  Appears under Settings > Data Inputs  Benefits over simple scripted input  Instance control: launch a single or multiple instances  Input validation  Support multiple platforms  Stream data as text or XML  Secure access to mod input scripts via REST endpoints
  • 26. Example Modular Inputs 27 Twitter  Stream JSON data from a Twitter source to Splunk using Tweepy Amazon S3 Online Storage  Index data from the Amazon S3 online storage web service Java Messaging Service (JMS)  Poll message queues and topics through JMS Messaging API  Talks to multiple providers: MQSeries (Websphere MQ), ActiveMQ, TibcoEMS, HornetQ, RabbitMQ, Native JMS, WebLogic JMS, Sonic MQ Splunk Windows Inputs  Retrieve WIN event logs, registry keys, perfmon counters
  • 28. zLinux Forwarder 29  Easily collect and index data on IBM mainframes  Collect application and platform data  Download as new Forwarder distribution for s390x Linux
  • 29. Extend Operational Intelligence to Mobile Apps 30 Deliver Better Performing, More Reliable Apps Deliver Real-Time Omni-Channel Analytics End-to-End Performance and Capacity Insights
  • 30. Monitor App Usage and Performance • Improve user retention by quickly identifying crashes and performance issues • Establish whether issues are caused by an app or the network(s) • Correlate app, OS and device type to diagnose crash and network performance issues 31
  • 31. Integrated Analytics Platform for Diverse Data Stores Full-featured, Integrated Product Fast Insights for Everyone Works with What You Have Today Explore Visualize Dashboard s ShareAnalyze Hadoop Clusters NoSQL and Other Data Stores Hadoop Client Libraries Streaming Resource Libraries Bi-directional Integration with Hadoop
  • 32. Connect to NoSQL and Other Data Stores • Build custom streaming resource libraries • Search and analyze data from other data stores in Hunk • In partnership with leading NoSQL vendors • Use in conjunction with DB Connect for relational database lookups
  • 33. Virtual Indexes  Enables seamless use of almost the entire Splunk stack on data  Automatically handles MapReduce  Technology is patent pending
  • 35. Agenda  Tags – categorize and add meaning to data  Field Aliases – simplify search and correlation  Calculated Fields – shortcut complex/repetitive computations  Event Types – group common events and share knowledge  Lookups – augment data with additional external fields 37
  • 36.  Adds inline meaning/context/specificity to raw data  Used to normalize metadata or raw data  Simplifies correlation of multiple data sources  Created in Splunk  Transferred from external sources What is Data Enrichment? 38
  • 37.  Add meaning/context/specificity to raw data  Labels describing team, category, platform, geography  Applied to field-value combination  Multiple tags can be applied for each field-value  Case sensitive Tags 39
  • 39.  Search events with tag in any field  Search events with tag in a specific field  Search events with tag using wildcards Find the Web Servers Tags in Action 41 tag=webserver tag::host=webserver tag=web*  Tag the host as webserver  Tag the sourcetype as web 1 2 3 4 5 Back to Slides
  • 40.  Normalize field labels to simplify search and correlation  Apply multiple aliases to a single field  Example: Username | cs_username | User  user  Example: c_ip | client | client_ip  clientip  Processed after field extractions + before lookups  Can apply to lookups  Aliases appear alongside original fields Field Aliases 42
  • 41. Re-Label Field to Intuitive Name Create Field Alias 43 1 2 3
  • 42.  Create field alias of clientip = customer  Search events in last 15 minutes, find customer field  Field alias (customer) and original field (clientip) are both displayed Search using an Intuitive Field Name Field Alias in Action 44 1 3 2 sourcetype=access_combined
  • 43.  Shortcut for performing repetitive/long/complex transformations using eval command  Based on extracted or discovered fields only  Do not apply to lookup or generated fields Calculated Fields 45
  • 44. Compute Kilobytes from Bytes Create Calculated Field 46 1 2 1 2 3
  • 45.  Create kilobytes = bytes/1024  Search events in last 15 minutes for kilobytes and bytes Search Using Kilobytes instead of Bytes Calculated Fields in Action 47 1 2 sourcetype=access_combined Back to Slides
  • 46.  Classify and group common events  Capture and share knowledge  Based on search  Use in combination with fields and tags to define event topography Event Types 48
  • 47.  Best Practice: Use punct field  Default metadata field describing event structure  Built on interesting characters: ",;-#$%&+./:=?@'|*nr"(){}<>[]^! »  Can use wildcards Create Event Types 49 event punct ####<Jun 3, 2014 5:38:22 PM MDT> <Notice> <WebLogicServer> <bea03> <asiAdminServer> <WrapperStartStopAppMain> <>WLS Kernel<> <> <BEA-000360> <Server started in RUNNING mode> ####<_,__::__>_<>_<>_<>_<>_<>_ 172.26.34.223 - - [01/Jul/2005:12:05:27 -0700] "GET /trade/app?action=logout HTTP/1.1" 200 2953 ..._-_-_[:::_-]_"_?=_/."__
  • 48.  Show punct for sourcetype=access_combined  Pick a punct, then wildcard it after the timestamp  Add NOT status=200  Save as “bad” event type + Color:red + Priority:1 (shift reload in browser to show coloring) Classify Events as Known Bad Create Event Type 50 eventtype=bad sourcetype="access_combined" punct="..._-_-_[//_:::]*" NOT status=200 1 2 3 4 Back to Slides
  • 49. Lookups to Enrich Raw Data LDAP AD Watch Lists CRM/ ERP CMDB External Data Sources Insight comes out Data goes inCreate additional fields from the raw data with a lookup to an external data source
  • 50.  Augment raw events with additional fields  Provide context or supporting details  Translate field values to more descriptive data  Example: add text descriptions for error codes, IDs  Example: add contact details to user names or IDs  Example: add descriptions to HTTP status codes  File-based or scripted lookups Lookups 52
  • 51. 53 1. Upload/create table 2. Assign table to lookup object 3. Map lookup to data set Convert a Code into a Description Configure a Static Lookup
  • 52. 1. Get the lookup http_status.csv file from link: http://splunk.box.com/mdw101  Lookup table files > Add new  Name: http_status.csv (must have .csv file extension)  Upload: <path to .csv>  Verify lookup was created successfully 1. Create HTTP Status Table 54 | inputlookup http_status.csv 1 2 3
  • 53.  Lookup definitions > Add new  Name: http_status  Type: File-based  Lookup file: http_status.csv  Invoke the lookup manually 2. Add Lookup Definition 55 1 2 sourcetype=access_combined | lookup http_status status OUTPUT status_description
  • 54.  Automatic lookups > Add new  Name: http_status (cannot have spaces)  Lookup table: http_status  Apply to: sourcetype = access_combined  Lookup input field: status  Lookup output field: status_description  Verify lookup is invoked automatically 3. Configure Automatic Lookup 56 1 2 sourcetype=access_combined Back to Slides
  • 55.  Temporal lookups for time-based lookups  Example: Identify users on your network based on their IP address and the timestamp in DHCP logs  Use search results to populate a lookup table  … | outputlookup <tablename|filename>  Call an external command or script  Python scripts only  Example: DNS lookup for IP  Host  Create a lookup table using a relational database  Review matches against a database column or SQL query Fancy Lookups 57
  • 56.  Creating and Managing Alerts (Job Inspector)  Macros  Workflow Actions More Data Enrichment 58
  • 57. Level Up on Search & Reporting Commands
  • 58. Agenda  Doing more with basic search commands  Advanced search commands  Doing more with basic reporting commands 60
  • 60. Anatomy of a Search 62 Disk
  • 61.  top – limit  rare – same options as top  timechart – parameters  stats – functions (sum, avg, list, values, sparkline)  sort – inline ascending or descending  addcoltotals  addtotals Doing More with Basic Search Commands 63
  • 62.  Commands have parameters or qualifiers  top and rare have similar syntax  Each search command has its own syntax – show inline help Find Most and Least Active Customers Using the top + rare Commands ... | top limit=20 clientip ... | rare limit=20 clientip IPs with the most visits IPs with the least visits
  • 63.  Sort inline descending or ascending 65 ... | stats count by clientip | sort - count ... | stats count by clientip | sort + count Number of requests by customer - descending Number of requests by customer - ascending Sort the Number of Customer Requests Using the sort Command
  • 64.  Show Search Command Reference Docs  Functions for eval + where  Functions for stats + chart and timechart  Invoke a function  Rename inline 66 ... | stats sum(bytes) by clientip | sort - sum(bytes) ... | stats sum(bytes) as totalbytes by clientip | sort - totalbytes Total payload by customer - descending Total payload by customer - descending Determine Total Customer Payload Using functions + rename command
  • 65.  List all values of a field  List only distinct values of a field 67 ... | stats values(action) by clientip ... | stats list(action) by clientip Activity by customer Distinct actions by customer Observe Customer Activity Using the list + values Functions
  • 66.  Show distinct actions and cardinality of each action 68 sourcetype=access_combined | stats count(action) as value by clientip, action | eval pair=action + " (" + value + ")" | stats list(pair) as values by clientip Analyze Customer Activity Combine list + values Functions
  • 67.  Add columns  Sum specific columns 69 ... | stats count by clientip, action 2 cols: clientip + action ... | stats sum(bytes) as totalbytes, avg(bytes) as avgbytes, count as totalevents by clientip | addcoltotals totalbytes, totalevents Sum totalbytes and totalevents colums Building a Table of Customer Activity Add Columns and Sum Columns
  • 68. 70 ... | stats sum(bytes) as totalbytes, sum(other) as totalother by clientip | addtotals fieldname=totalstuff For each row, add totalbytes + totalother A better example: physical memory + virtual memory = total memory Building a Table of Customer Activity Sum Across Rows
  • 69. 71 ... | stats sparkline(count) as trendline by clientip In context of larger event set ... | stats sparkline(count) as trendline sum(bytes) by clientip Inline in tables Trend Individual Customer Activity Sparklines in Action Back to Slides
  • 70. Advanced Search Commands
  transaction – Groups events by a common field value. Convenient, but resource intensive.
  cluster – Clusters similar events together. Can be used on _raw or a field.
  associate – Identifies correlations between fields. Calculates entropy between field values.
  correlate – Calculates the correlation between different fields. Evaluates the relationship of all fields in a result set.
  contingency – Builds a contingency table for two fields. Computes co-occurrence, or the % of events in which two fields appear together.
  anomalies – Computes an unexpectedness score for an event. Computes the similarity of an event (X) to a set of previous events (P).
  anomalousvalue – Finds and summarizes irregular, or uncommon, search results. Considers frequency of occurrence or number of stdevs from the mean.
  • 71.  Sews events together and creates duration + eventcount fields 73 ... | transaction JSESSIONID | table JSESSIONID, action, product_id Group by JSESSIONID View Customer Activity by Session Using the transaction Command
  • 72.  Intelligent grouping (creates cluster_count and cluster_label) Cluster 74 ... | cluster showcount=1 | table _raw, cluster_count, cluster_label
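The remaining commands from the table compose the same way. For instance, a sketch that keeps only events containing statistically unusual field values, where pthresh sets the probability threshold:
  sourcetype=access_combined | anomalousvalue action=filter pthresh=0.02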
  • 73.  Predict over time  Chart Overlay with and without streamstats  Maps with iplocation + geostats  Single value  Metered visuals with gauge Doing More with Basic Reporting Commands 75
  • 74.  Predict future values using lower/upper bounds – single and multiple series 76 ... | timechart count as traffic | predict traffic Predict Website Traffic Using the predict Command
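predict also accepts multiple series at once. A sketch, assuming both action values exist in the events:
  sourcetype=access_combined | timechart span=1h count(eval(action="view")) as views, count(eval(action="purchase")) as purchases | predict views purchases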
  • 75. 77 sourcetype=access_combined (action=view OR action=purchase) | timechart span=10m count(eval(action="view")) as Viewed, count(eval(action="purchase")) as Purchased Compare Browsing vs. Buying Activity Simple Chart Overlay
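The streamstats variant mentioned in the agenda lays a running statistic over the raw trend. A sketch, assuming a 6-bucket (one-hour) moving window is the comparison you want:
  sourcetype=access_combined | timechart span=10m count as traffic | streamstats window=6 avg(traffic) as hourly_avg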
  • 76. 78 ... | iplocation clientip | geostats count by clientip Combine IP lookup with geo mapping Map Customer Activity Geographically Geolocation in Action
  • 77. 79 ... | stats count Display a Simple Count of Events Single Value in Action
  • 78. Display Counts Using Gauges Single Value, Radial and Filler Gauges in Action 80 ... | stats count | gauge count 10000 20000 30000 40000 50000
  • 79. Data Model and Pivot
  • 80. Agenda  What is a data model?  Build a data model  Pivot Interface  Accelerate a data model 82
  • 81. Powerful Analytics Anyone Can Use Enables non-technical users to build complex reports without the search language Provides more meaningful representation of underlying raw machine data Acceleration technology delivers results faster as volume increases 83 Pivot Data Model Analytics Store
  • 82. Define Relationships in Machine Data Data Model • Describes how underlying machine data is represented and accessed • Defines meaningful relationships in the data • Enables single authoritative view of underlying raw data Hierarchical object view of underlying data Add constraints to filter out events
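A defined model is also searchable directly from SPL via the datamodel command. A minimal sketch, assuming a hypothetical data model named Web whose root object is also named Web:
  | datamodel Web Web search | search Web.status=404 | stats count by Web.http_method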
  • 83. Transparent Acceleration • Automatically collected – Handles timing issues, backfill… • Automatically maintained – Uses acceleration window • Stored on the indexers – Peer to the buckets • Fault tolerant collection Time window of data that is accelerated Check to enable acceleration of data model High Performance Analytics Store
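Accelerated models can be queried directly with tstats, which reads the analytics store instead of raw events. Again assuming a Web data model:
  | tstats count from datamodel=Web where Web.status=404 by Web.http_method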
  • 84. Easy-to-Use Analytics • Drag-and-drop interface enables any user to analyze data • Create complex queries and reports without learning search language • Click to visualize any chart type; reports dynamically update when fields change Select fields from data model Time window All chart types available in the chart toolbox Save report to share Pivot
  • 85.  Defines the least common denominator for a data domain  Standard method to parse, categorize, normalize data  Set of field names and tags by domain  Packaged as data models in a Splunk app  Domains: security, web, inventory, JVM, performance, network sessions, and more  Minimal setup to use Pivot interface Common Information Model (CIM) App 87
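Because CIM-compliant sources share tags and field names, one search spans every product that feeds the model. A sketch, assuming authentication data has been mapped to the CIM:
  tag=authentication action=failure | stats count by user, src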
  • 86. Custom Visualizations and the Web Framework Toolkit
  • 87. Agenda  Developer Platform  Web Framework Toolkit (WFT)  REST API and SDKs  Get a Flying Start 89
  • 88. Optimizing the Analytics Process 90 Focus on the data – intuitive tools to enable the analyst No single visualization exists to handle all data sets. Never lose sight of the raw data Splunk Analytics Explore Context Visualize Algorithms
  • 89. Simple, Interactive, and Extensible 91 VISUALIZATION EXPLORATION CUSTOMIZABLE FRAMEWORK POWERFUL ANALYTICS Pivot Data Models Interactive Forms Contextual Drilldown Dashboard Editor Web Framework
  • 90. The Splunk Enterprise Platform  Core Engine – Core Functions: Collection, Indexing, Search Processing Language  Content: Inputs, Apps, Other Content  User and Developer Interfaces: Web Framework, REST API, SDK
  • 91. Powerful Platform for Enterprise Developers Developers Can Customize and Extend REST API Build Splunk Apps Extend and Integrate Splunk Simple XML JavaScript HTML5 Web Framework Java JavaScript Python Ruby C# PHP Data Models Search Extensibility Modular Inputs SDKs
  • 92. A Wealth of Splunk Apps Over 1,300 apps available on the Splunk apps site API SDKs UI Server, Storage, Network Server Virtualization Operating Systems Custom Applications Business Applications Cloud Services App Performance Monitoring Ticketing and Other Web Intelligence Mobile Applications Stream
  • 93.  Interactive, cut/paste examples from popular source repositories: D3, GitHub, jQuery  Splunk 6.x Dashboard Examples App https://apps.splunk.com/app/1603  Custom SimpleXML Extensions App https://apps.splunk.com/app/1772  Splunk Web Framework Toolkit App https://apps.splunk.com/app/1613 Example Advanced Visualizations 95
  • 95. Splunk Documentation 97 • http://docs.splunk.com • Official Product Docs • Wiki and community topics • Updated daily • Can be printed to PDF
  • 96. Splunk Answers 98 • http://answers.splunk.com • Community driven • Splunk supported • Knowledge exchange • Q & A
  • 97. Splunk Education 99 • Recommended for Users – Using Splunk – Searching & Reporting • Recommended for UI/Dashboard Developers – Developing Apps • Instructor-Led Courses – Web – Onsite

Editor's Notes

  • #5: Unlike traditional structured data or multi-dimensional data– for example data stored in a traditional relational database for batch reporting – machine data is non-standard, highly diverse, dynamic and high volume. You will notice that machine data events are also typically time-stamped – it is time-series data.   Take the example of purchasing a product on your tablet or smartphone: the purchase transaction fails, you call the call center and then tweet about your experience. All these events are captured - as they occur - in the machine data generated by the different systems supporting these different interactions.   Each of the underlying systems can generate millions of machine data events daily. Here we see small excerpts from just some of them.
  • #6: When we look more closely at the data we see that it contains valuable information – customer id, order id, time waiting on hold, twitter id … what was tweeted.   What's important is, first, the ability to see across all these disparate data sources, and then to correlate related events across them to deliver meaningful insight.
  • #7: If you can correlate and visualize related events across these disparate sources, you can build a picture of activity, behavior and experience. And what if you can do all of this in real time? You can respond more quickly to events that matter. For example, if an organization captured the customer's Twitter ID in its customer profile, this correlation would be possible. Where that doesn't exist, it could at least group the tweets by demographic. You can extrapolate this example to a wide range of use cases – security and fraud, transaction monitoring and analysis, web analytics, IT operations and so on.
  • #8: The rise of big data has forced IT organizations to transition from a focus on structured, relational data, to accommodate unstructured data, driven by the volume, velocity and variety of today's applications and systems. As the data has changed from structured data to unstructured data, the technology approach needs to change as well. When you don't know what data types you'll need to analyze tomorrow or what questions you need to ask in a week, flexibility becomes a key component of your technology decisions. The ability to index any data type, search across silos and avoid being locked into a rigid schema opens a new world of analytics and business insights to your organization. Schema at Read – Enables you to ask any question of the data Search – Enables rapid, iterative exploration of the data along with advanced analytics Universal Indexing – Enables you to ingest any type of machine data Horizontal scaling over commodity hardware enables big data analytics
  • #9: On the right is how Splunk can ingest all machine data, including data from non-traditional data sources like physical badge data. Talk about how we get data in, no schema, distributed architecture/search. And Splunk can understand machine data and the fields in it – time stamps, IPs, usernames, event codes, etc. Then talk to the bottom around external lookups on asset and identity information to add context/accuracy at search time. Splunk can be made aware of people’s roles and also the criticality of the assets & systems people are accessing. This context is needed to help distinguish between legitimate insider activity and inappropriate/malicious insider activity. For example, the CFO logging into a critical financial application is okay. But if the receptionist does this, that is a red flag. Then talk to the general use cases/capabilities at the top. We will talk about insider threat use cases in a few slides from now. Tie this slide back to the prior requirements in the step1-4 slide.
  • #10: The Splunk platform consists of multiple products and deployment models to fit your needs. Splunk Enterprise – for on-premise deployment Splunk Cloud – Fully managed service with 100% SLA and all the capabilities of Splunk Enterprise…in the Cloud Hunk – for analytics on data in Hadoop Splunk Mint – to get insights into data from Mobile devices The products can pull in data from virtually any source to support multiple use cases. Splunk Apps extend and simplify deployments by providing pre-packaged content designed for specific use cases and data types.
  • #15: Splunk’s mission statement is to make machine data accessible, useful and valuable to everyone. Splunk can take any machine data and automatically index it for fast searching. Because Splunk doesn’t use a database, there are no additional licenses, and most importantly, no pre-defined schema to limit how you use your information. Examples include the configuration files, syslog, Windows events and registry settings, as well as WMI. But the most important thing to note is how easy it is to get data into Splunk and make it useful.
  • #19: The Splunk App for Stream software captures real-time wire data from distributed infrastructures, including private, public and hybrid clouds with on-the-fly deployment and fine-grained filtering capabilities.
  • #24: Splunk DB Connect delivers reliable, scalable, real-time integration between Splunk Enterprise and traditional relational databases. With Splunk DB Connect, structured data from relational databases can be easily integrated into Splunk Enterprise, driving deeper levels of operational intelligence and richer business analytics across the organization. Organizations can drive more meaningful insights for IT operations, security and business users. For example, IT operations teams can track performance, outage and usage by department, location and business entities. Security professionals can correlate machine data with critical assets and watch-lists for incident investigations, real-time correlation and advanced threat detection using the award-winning Splunk Enterprise. Business users can analyze service levels and user experience by customer in real-time to make more informed decisions.
  • #31: To address the needs of developers, operations and product management, you need operational intelligence for your mobile apps. This is what we call mobile intelligence. Mobile intelligence provides real-time insight on how your mobile apps are performing, and can correlate with and enhance operational intelligence. Splunk software enables organizations to search, monitor, analyze and visualize machine-generated data from websites, applications, servers, networks, sensors and mobile devices. The Splunk MINT product line helps organizations monitor mobile app usage and performance, gain deep visibility into mobile app transactions and accelerate development.
Deliver better performing, more reliable apps: When a user has a problem with a mobile app, the issue could be isolated or spread across all app versions, handsets and OS types. With Splunk MINT, you can see issues with app performance or availability in real time. Bugs can be addressed quickly, and app developers can gain a head start in creating and delivering valuable app updates.
End-to-End Application Transaction Performance: When mobile apps fail, there are many potential sources of failure. With Splunk MINT Express, you can analyze overall transaction performance. And using Splunk MINT Enterprise, you can correlate this data with information from back-end apps to gain detailed insight on transaction problems. As a result, operations can reduce MTTR and better anticipate future mobile app back-end requirements.
Deliver real-time omnichannel analytics: Mobile apps give enterprises new ways of conducting digital business. With mobile app information in Splunk Enterprise, you can correlate usage and performance information—a form of omni-channel analytics—to better understand how users are engaging all aspects of your organization.
  • #32: Splunk MINT Express provides a dashboard that offers an at-a-glance view of mobile app health and usage. This includes an overall index called "MobDex", which provides a blended view of application usage, crashes, engagement and abandonment. The insight boxes provide top-level aggregated information, which you can click on to get more specific information, and context.
  • #33: Hunk offers Full-featured Analytics in an Integrated Platform Explore, analyze and visualize data, create dashboards and share reports from one integrated platform. Hunk enables everyone in your organization to unlock the business value of data locked in Hadoop Hunk integrates the processes of data exploration, analysis and visualization into a single, fluid user experience designed to drive rapid insights from your big data in Hadoop. Enable powerful analytics for everyone with Splunk’s Data Models and the Pivot interface, first released in Splunk Enterprise 6. And Hunk works with what you have today Hunk works on Apache Hadoop and most major distributions, including those from Cloudera, Hortonworks, IBM, MapR and Pivotal, with support for both first-generation MapReduce and YARN (Yet Another Resource Negotiator, the technical acronym for 2nd generation MapReduce). Preview results and interactively search across one or more Hadoop clusters, including from different distribution vendors. Use the ODBC driver for saved searches with report acceleration to feed data from Hunk to third-party data visualization tools or business intelligence software. Streaming Resource Libraries enables developers to stream data from NoSQL and other data stores, such as Apache Accumulo, Apache Cassandra, Couchbase, MongoDB and Neo4j, for exploration, analysis and visualization in Hunk.
  • #35: One of the key innovations in this product is Splunk Virtual Index technology. This patent-pending capability enables the seamless use of almost the entire Splunk technology stack, including the Splunk Search Processing Language for interactive exploration, analysis and visualization of data stored anywhere, as if it was stored in a Splunk Index. Splunk Analytics for Hadoop uses this foundational technology and is the first product to come from this innovation. To configure the virtual index, specify the external resource provider the virtual index is serviced by and specify the data paths that belong to this virtual index.
  • #52: The data, for example, may have a userid but you want to search on a name. Splunk's lookup capability can enrich the raw data by adding additional fields at search time. Some common use cases include event and error code description fields. Think "Page not Found" instead of "404". Enriching your data can lead to entirely new insight. In the example shown, Splunk took the userid and looked up the name and role of the user from an HR database. Similarly, it determined the location of the failed log in attempt by correlating the IP address. Even though these fields don't exist in the raw data, Splunk allows you to search or pivot on them at any time. You can also mask data. For example, you may want social security numbers to be replaced with all X's for regular users but not masked for others. Removing data can also be useful, such as filtering PII, before writing it to an index in Splunk.
  • #84: Splunk 6 takes large-scale machine data analytics to the next level by introducing three breakthrough innovations: Pivot – opens up the power of Splunk search to non-technical users with an easy-to-use drag and drop interface to explore, manipulate and visualize data Data Model – defines meaningful relationships in underlying machine data and making the data more useful to broader base of non-technical users Analytics Store – patent pending technology that accelerates data models by delivering extremely high performance data retrieval for analytical operations, up to 1000x faster than Splunk 5 Let’s dig into each of these new features in more detail.
  • #85: Data Models are created using the Data Model Builder and are usually designed and implemented by users who understand the format and semantics of their indexed data, and who are familiar with the Splunk Search Processing Language (SPL). They define meaningful relationships in the data. Unlike data models in the traditional structured world, Splunk Data Models focus on machine data and data mashups between machine data and structured data. Splunk software is founded on the ability to flexibly search and analyze highly diverse machine data employing late-binding or search-time techniques for schematization (“schema-on-the-fly”). And Data Models are no exception. They define relationships in the underlying data, while leaving the raw machine data intact, and map these relationships at search time. They are therefore highly flexible and designed to enable users to rapidly iterate. Security is also a key consideration and data models are fully permissionable in Splunk 6.
  • #86: Data Models are accelerated using the High Performance Analytics Store, new in Splunk 6. The High Performance Analytics Store represents a breakthrough innovation from Splunk that dramatically accelerates analytical operations across massive data sets by up to 1000x over Splunk 5. The Analytics Store contains a separate store of pre-extracted values derived from the underlying Splunk index. This data is organized in columns for rapid retrieval and powers dramatic improvements in the performance of analytical operations. Once created, the Analytics Store is used seamlessly by Data Models and in turn the Pivot interface. For users more comfortable with the Splunk Search Processing Language (SPL), the Analytics Store can also be used directly in the search language. The Splunk Analytics Store is different from traditional columnar databases – it is based on the Splunk lexicon and optimized for data retrieval (versus updates) by the Splunk Data Model or directly from the Splunk Search Processing Language. With the Analytics Store, Splunk Enterprise now uniquely optimizes data retrieval for both rare term searches and analytical operations, all in the same software platform. The new Pivot interface, combined with Data Models and the Analytics Store, makes it dramatically easier for non-technical users and technical users alike to analyze and visualize data in Splunk, and represents an important step towards Splunk's mission of making machine data accessible, usable and valuable to everyone.
  • #87: The Pivot interface enables non-technical and technical users alike to quickly generate sophisticated charts, visualizations and dashboards using simple drag and drop and without learning the Search Processing Language (SPL). Users can access different chart types from the Splunk toolbox to easily visualize their data different ways. Queries using the Pivot interface are powered by underlying “data models” which define the relationships in Machine Data.
  • #93: What does this platform look like? The platform consists of two layers: a core engine and an interface layer. On top of the platform you can run a broad spectrum of content that supports use cases. Use cases range from application management and IT operations, to ES and PCI compliance, to web analytics. The core engine provides the basic services for real-time data input, indexing and search, as well as alerting, large-scale distributed processing and role-based access. The interface layer consists of the basic UI for search, reporting and visualization – it contains developer interfaces, the REST API, SDKs and Web Framework. The SDKs provide convenient access to core engine services in a variety of programming language environments. The Web Framework enables developers to quickly create Splunk Apps by using the modern web programming paradigm, including pre-built components, styles, templates, and reusable samples, as well as supporting the development of custom logic, interactions, components, and UI. Developers can choose to program their Splunk App using Simple XML, JavaScript or Django (or any combination thereof). These programmatic interfaces allow you to either: extend Splunk, integrate Splunk with other applications, or build completely new applications from scratch that require the OI or analytical services that Splunk provides.
  • #94: BUILD SPLUNK APPS The Splunk Web Framework makes building a Splunk app look and feel like building any modern web application.   The Simple Dashboard Editor makes it easy to BUILD interactive dashboards and user workflows as well as add custom styling, behavior and visualizations. Simple XML is ideal for fast, lightweight app customization and building. Simple XML development requires minimal coding knowledge and is well-suited for Splunk power users in IT to get fast visualization and analytics from their machine data. Simple XML also lets the developer "escape" to HTML with one click to do more powerful customization and integration with JavaScript.   Developers looking for more advanced functionality and capabilities can build Splunk apps from the ground up using popular, standards-based web technologies: JavaScript and Django. The Splunk Web Framework lets developers quickly create Splunk apps by using prebuilt components, styles, templates, and reusable samples as well as supporting the development of custom logic, interactions, components, and UI. Developers can choose to program their Splunk app using Simple XML, JavaScript or Django (or any combination thereof).
EXTEND AND INTEGRATE SPLUNK Splunk Enterprise is a robust, fully-integrated platform that enables developers to INTEGRATE data and functionality from Splunk software into applications across the organization using Software Development Kits (SDKs) for Java, JavaScript, C#, Python, PHP and Ruby. These SDKs make it easier to code to the open REST API that sits on top of the Splunk Engine. With almost 200 endpoints, the REST API lets developers do programmatically what any end user can do in the UI and more. The Splunk SDKs include documentation, code samples, resources and tools to make it faster and more efficient to program against the Splunk REST API using constructs and syntax familiar to developers experienced with Java, Python, JavaScript, PHP, Ruby and C#. Developers can easily manage HTTP access, authentication and namespaces in just a few lines of code.
Developers can use the Splunk SDKs to:
- Run real-time searches and retrieve Splunk data from line-of-business systems like Customer Service applications
- Integrate data and visualizations (charts, tables) from Splunk into BI tools and reporting dashboards
- Build mobile applications with real-time KPI dashboards and alerts powered by Splunk
- Log directly to Splunk from remote devices and applications via TCP, UDP and HTTP
- Build customer-facing dashboards in your applications powered by user-specific data in Splunk
- Manage a Splunk instance, including adding and removing users as well as creating data inputs from an application outside of Splunk
- Programmatically extract data from Splunk for long-term data warehousing
Developers can EXTEND the power of Splunk software with programmatic control over search commands, data sources and data enrichment. Splunk Enterprise offers search extensibility through:
- Custom Search Commands: developers can add a custom search script (in Python) to Splunk to create their own search commands. To build a search that runs recursively, developers need to make calls directly to the REST API.
- Scripted Lookups: developers can programmatically script lookups via Python.
- Scripted Alerts: can trigger a shell script or batch file (we provide guidance for Python and Perl).
- Search Macros: make chunks of a search reusable in multiple places, including saved and ad hoc searches.
Splunk also provides developers with other mechanisms to extend the power of the platform. - Data Models: allow developers to abstract away the search language syntax, making Splunk queries (and thus, functionality) more manageable and portable/shareable. - Modular Inputs: allow developers to extend Splunk to programmatically manage custom data input functionality via REST.
  • #95: Here are just some of the new Splunk Apps that have been delivered over the past year. Their goal is to make it easier to use Splunk for specific technologies and use cases – prepackaging inputs, field extractions, searches and visualizations. Highlight a few apps. These apps along with 100’s of others have been developed not only by Splunk but by partners, customers and members of the Splunk community.
  • #98: “After this workshop, if you want more information, all the product documentation is available online. The documentation is divided into several manuals. For reporting and dashboards you will likely be most interested in the User and Developer Manuals.”
  • #99: “For a more interactive approach to getting your questions addressed there is Splunk Answers. It is a web based Splunk community of Splunkers like you. Splunk employees are also regular experts on the site.”
  • #100: "It is not possible to cover everything you need to know about building reports and dashboards in 30-45 minutes. For more structured training with labs, consider Splunk education courses. These are available as instructor-led web-based courses or onsite if there are enough participants per class."