Copyright © 2014 Splunk Inc.
Data Onboarding
Ingestion Without the Indigestion
David Millis
Client Architect
Legal Notice
During the course of this presentation, we may make forward looking statements regarding future events or the
expected performance of the company. We caution you that such statements reflect our current
expectations and estimates based on factors currently known to us and that actual events or results could differ
materially. For important factors that may cause actual results to differ from those contained in our forward-looking
statements, please review our filings with the SEC. The forward-looking statements made in this presentation are
being made as of the time and date of its live presentation. If reviewed after its live presentation, this presentation
may not contain current or accurate information. We do not assume any obligation to update any forward looking
statements we may make. In addition, any information about our roadmap outlines our general product direction
and is subject to change at any time without notice. It is for informational purposes only and shall not be incorporated
into any contract or other commitment. Splunk undertakes no obligation either to develop the features or
functionality described or to include any such feature or functionality in a future release.
Splunk®, Splunk>®, Listen to Your Data®, The Engine for Machine Data®, Hunk™, Splunk Cloud™,
Splunk Storm® and SPL™ are registered trademarks or trademarks of Splunk Inc. in the United States
and/or other countries. All other brand names, product names or trademarks belong to their respective
owners. © 2014 Splunk Inc. All rights reserved.
• Systematic way to bring new data sources into Splunk
• Make sure that new data is instantly usable
& has maximum value for users
• Goes hand-in-hand with the User Onboarding process
(sold separately)
What is the Data Onboarding Process?
Know Your (Data) Pipeline
The Data Pipeline
Any Questions?
• Input Processors: Monitor, FIFO, UDP, TCP, Scripted
• No events yet-- just a stream of bytes
• Break data stream into 64KB blocks
• Annotate stream with metadata keys (host, source,
sourcetype, index, etc.)
• Can happen on UF, HF or indexer
Inputs: Where It All Starts
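As a rough illustration of the input types listed above, an inputs.conf might contain stanzas like the ones below. This is a sketch only: the port, the script path, the sourcetype names and the scratch index are placeholders, not values from the presentation.

```ini
# inputs.conf (sketch; every value below is a placeholder)

# Network input: listen for syslog traffic on UDP 514
[udp://514]
sourcetype = syslog
connection_host = ip
index = scratch

# Scripted input: run a hypothetical polling script every 300 seconds
[script://./bin/fubar_poll.sh]
interval = 300
sourcetype = fubar:poll
index = scratch
```

Note that host, source, sourcetype and index are already attached here, before any events exist.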
• Check character set
• Break lines
• Process headers
• Can happen on HF or indexer
Parsing Queue
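In props.conf terms, the parsing stage is driven mostly by character-set and line-breaking settings. A minimal sketch, assuming a hypothetical sourcetype named fubar:app that writes one event per line:

```ini
# props.conf (sketch for a hypothetical sourcetype)
[fubar:app]
# Character set of the incoming byte stream
CHARSET = UTF-8
# Break the stream wherever one or more newlines occur;
# the text matched by the first capture group is discarded
LINE_BREAKER = ([\r\n]+)
# Truncate any single line longer than 10,000 bytes
TRUNCATE = 10000
```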
• Merge lines for multi-line events
• Identify events (finally!)
• Extract timestamps
• Exclude events based on timestamp (MAX_DAYS_AGO, ..)
• Can happen on HF or indexer
Aggregation/Merging Queue
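The merging and timestamp steps above map onto a handful of props.conf settings. The sketch below assumes a hypothetical sourcetype whose events begin with a timestamp such as "2014-05-01 13:45:10" and may continue over several lines (a stack trace, for example).

```ini
# props.conf (sketch; sourcetype name and timestamp layout are assumptions)
[fubar:app]
# Merge lines into one event until a line starting with a date appears
SHOULD_LINEMERGE = true
BREAK_ONLY_BEFORE = ^\d{4}-\d{2}-\d{2}\s
# The timestamp sits at the very start of the event and is 19 characters long
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%d %H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 19
# Extracted dates more than 30 days in the past are not accepted as valid
MAX_DAYS_AGO = 30
```

Spelling out TIME_PREFIX, TIME_FORMAT and MAX_TIMESTAMP_LOOKAHEAD rather than relying on automatic detection is usually cheaper at index time and avoids surprises when a stray number in the event looks like a date.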
• Do regex replacement (field extraction, punctuation
extraction, event routing, host/source/sourcetype
overrides)
• Annotate events with metadata keys
(host, source, sourcetype, ..)
• Can happen on HF or indexer
Typing Queue
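The regex replacement step is where index-time transforms run. As a hedged example, the pair of files below overrides the host from the event text and sends DEBUG events to the null queue. The stanza names, the sourcetype and the event layout (a "host=<name>" pair and a " DEBUG " token) are all assumptions made for illustration.

```ini
# props.conf
[fubar:app]
TRANSFORMS-fubar_typing = fubar_set_host, fubar_drop_debug

# transforms.conf
# Override the host field from a "host=<name>" pair in the raw event
[fubar_set_host]
REGEX = \bhost=(\S+)
DEST_KEY = MetaData:Host
FORMAT = host::$1

# Route DEBUG events to the null queue so they are never indexed
[fubar_drop_debug]
REGEX = \sDEBUG\s
DEST_KEY = queue
FORMAT = nullQueue
```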
• Output processors: TCP, syslog, HTTP
• Index-and-forward (indexAndForward)
• Sign blocks
• Calculate license volume and throughput metrics
• Index
• Write to disk
• Can happen on HF or indexer
Indexing Queue
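For the output side of this stage, a heavy forwarder that both forwards and keeps a local copy could be configured roughly as follows. The group name and indexer hostnames are placeholders.

```ini
# outputs.conf on a heavy forwarder (sketch; hostnames are placeholders)
[tcpout]
defaultGroup = primary_indexers
# Index locally in addition to forwarding
indexAndForward = true

[tcpout:primary_indexers]
server = idx1.example.com:9997, idx2.example.com:9997
```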
The Data Pipeline
Data Pipeline: UF & Indexer
Data Pipeline: HF & Indexer
Data Pipeline: UF, IF & Indexer
Data Onboarding Process
• Pre-board
• Build the index-time configs
• Build the search-time configs
• Create data models
• Document
• Test
• Get ready to deploy
• Bring it!
• Test & Validate
Process Overview
• Identify the specific sourcetype(s) - onboard each separately
• Check for pre-existing app/TA on splunk.com-- don't reinvent the wheel!
• Gather info
• Where does this data originate/reside? How will Splunk collect it?
• Which users/groups will need access to this data? Access controls?
• Determine the indexing volume and data retention requirements
• Will this data need to drive existing dashboards (ES, PCI, etc.)?
• Who is the SME for this data?
• Map it out
• Get a "big enough" sample of the event data
• Identify and map out fields
• Assign sourcetype and TA names according to CIM conventions
Pre-Board
• The Common Information Model (CIM) defines
relationships in the underlying data, while leaving the raw
machine data intact
• A naming convention for fields, eventtypes & tags
• More advanced reporting and correlation requires that the
data be normalized, categorized, and parsed
• CIM-compliant data sources can drive CIM-based
dashboards (ES, PCI, others)
Tangent: What is the CIM and why should I care?
• Identify necessary configs (inputs, props and transforms)
to properly handle:
• timestamp extraction, timezone, event breaking,
sourcetype/host/source assignments
• Do events contain sensitive data (e.g., PII or PAN)?
Create masking transforms if necessary
• Package all index-time configs into the TA
Build the Index-time configs
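A masking transform of the kind mentioned above can be sketched like this. It assumes, purely for illustration, that the card number appears as "pan=" followed by 16 digits, and it keeps only the last four. The timezone, sourcetype and stanza names are likewise hypothetical.

```ini
# props.conf
[fubar:app]
# Events carry no timezone of their own, so assign one explicitly
TZ = America/Chicago
TRANSFORMS-mask_pan = fubar_mask_pan

# transforms.conf
# Rewrite _raw so only the last four digits of the PAN survive
[fubar_mask_pan]
REGEX = (?m)^(.*)pan=\d{12}(\d{4}.*)$
FORMAT = $1pan=XXXXXXXXXXXX$2
DEST_KEY = _raw
```

Because the leading group is greedy, this rewrites only the last "pan=" pair in an event. If an event can contain several, a SEDCMD setting in props.conf with a global substitution is probably the better fit.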
• Assign sourcetype according to event format; events with
similar format should have the same sourcetype
• When do I need a separate index?
• When the data volume will be very large, or when it will
frequently be searched on its own
• When access to the data needs to be controlled
• When the data requires a specific data retention policy
• Resist the temptation to create lots of indexes
Tangent: Best & Worst Practices
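When a separate index is justified, for instance by a retention requirement, the definition is short. In the sketch below the index name, the 90-day retention and the size cap are illustrative values, not recommendations from the presentation.

```ini
# indexes.conf (sketch; name, retention and size are placeholders)
[fubar]
homePath   = $SPLUNK_DB/fubar/db
coldPath   = $SPLUNK_DB/fubar/colddb
thawedPath = $SPLUNK_DB/fubar/thaweddb
# Freeze (by default, delete) buckets older than 90 days: 90 * 86400 seconds
frozenTimePeriodInSecs = 7776000
# Cap the total size of the index at roughly 100 GB
maxTotalDataSizeMB = 102400
```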
• Always specify a sourcetype and index
• Be as specific as possible: use /var/log/fubar.log,
not /var/log/
• Arrange your monitored filesystems to minimize
unnecessary monitored logfiles
• Use a scratch index while testing new inputs
Best & Worst Practices – [monitor]
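Put together, the monitor advice above might look like this in inputs.conf. The sourcetype and index names are placeholders, and the second stanza shows one way to keep a directory monitor narrow when a single file path is not enough.

```ini
# inputs.conf (sketch; sourcetype and index names are placeholders)

# Preferred: monitor exactly the file you need
[monitor:///var/log/fubar.log]
sourcetype = fubar:app
index = scratch

# If a directory must be monitored, restrict it explicitly
[monitor:///var/log/fubar/]
whitelist = \.log$
blacklist = \.(gz|bz2|zip)$
sourcetype = fubar:app
index = scratch
```

Once the events look right in the scratch index, change the index setting to the production index before deploying.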
• Look out for inadvertent, runaway monitor clauses
• Don’t monitor thousands of files unnecessarily;
that’s the NSA’s job
• From the CLI: splunk list monitor
• From your browser:
https://your_splunkd:8089/services/admin/inputstatus/TailingProcessor:FileStatus
Best & Worst Practices – [monitor]
• Find & fix index-time problems BEFORE polluting your index
• A try-it-before-you-fry-it interface for figuring out
• Event breaking
• Timestamp recognition
• Timezone assignment
• Provides the necessary props.conf parameter settings
Another Tangent: Your Friend, the Data Previewer
Data Onboarding Process
Continued
• Identify "interesting" events which should be tagged with an existing CIM tag
(http://docs.splunk.com/Documentation/CIM/latest/User/Alerts)
• Get a list of all current tags:
| rest splunk_server=local /services/admin/tags
| rename tag_name AS tag, field_name_value AS definition, eai:acl.app AS app
| eval definition_and_app=definition . " (" . app . ")"
| stats values(definition_and_app) AS "definitions (app)" by tag
| sort +tag
• Get a list of all eventtypes (with associated tags):
| rest splunk_server=local /services/admin/eventtypes
| rename title AS eventtype, search AS definition, eai:acl.app AS app
| table eventtype definition app tags
| sort +eventtype
• Examine the current list of CIM tags: for each "interesting" event, identify which tags should be
applied. A particular event may have multiple tags
• Are there new tags which should be created, beyond those in the current CIM tag library? If so, add them
to the CIM library
Build the Search-time Configs:
eventtypes & tags
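Once the interesting events are identified, the eventtype and its tag are each a few lines of configuration. The sketch below assumes a hypothetical sourcetype whose login events carry an extracted action=login field. The tag name follows the CIM Authentication convention.

```ini
# eventtypes.conf (sketch; the search string is an assumption)
[fubar_authentication]
search = sourcetype="fubar:app" action=login

# tags.conf
[eventtype=fubar_authentication]
authentication = enabled
```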
• Extract "interesting" fields
• If already in your CIM library, name or alias appropriately
• If not already in your CIM library, name according to CIM conventions
• Add lookups for missing/desirable fields
• Lookups may be required to supply CIM-compliant fields/field values (for example,
to convert 'sev=42' to 'severity=medium')
• Make the values more readable for humans
• Put everything into the TA package
Build the Search-time Configs:
extractions & lookups
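The sev-to-severity example above could be wired up roughly as follows. The sourcetype, the client_ip field, the lookup name and the CSV file are all hypothetical, and the CSV is assumed to live in the TA's lookups directory.

```ini
# props.conf (search-time settings for a hypothetical sourcetype)
[fubar:app]
# Extract the numeric vendor severity
EXTRACT-sev = \bsev=(?<sev>\d+)
# Alias a vendor field name to a CIM-style field name
FIELDALIAS-src = client_ip AS src
# Add a human-readable severity from a lookup
LOOKUP-severity = fubar_severity sev OUTPUT severity

# transforms.conf
[fubar_severity]
filename = fubar_severity.csv

# lookups/fubar_severity.csv would then contain rows such as:
#   sev,severity
#   42,medium
```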
• Create data models. What will be interesting for end users?
• Document! (Especially the fields, eventtypes & tags)
• Test
• Does this data drive relevant existing dashboards correctly?
• Do the data models work properly / produce correct results?
• Is the TA packaged properly?
• Check with originating user/group; is it OK?
Keep Going
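One part of "packaged properly" is an app.conf that identifies the TA. A minimal sketch, with a made-up TA name, version and description:

```ini
# default/app.conf inside a hypothetical TA-fubar directory
[install]
is_configured = false

[ui]
# A TA normally has no UI of its own
is_visible = false
label = TA for fubar

[launcher]
version = 1.0.0
description = Index-time and search-time configs for fubar:app data

[package]
id = TA-fubar
```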
• Determine additional Splunk infrastructure required; can
existing infrastructure & license support this?
• Will new forwarders be required? If so, initiate CR process(es)
• Will firewall changes be required? If so, initiate CR process(es)
• Will new Splunk roles be required? Create & map to AD roles
• Will new app contexts be required? Create app(s) as necessary
• Will new users be added? Create the accounts
Get Ready to Deploy
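The role questions above typically end up as a few lines in authorize.conf and authentication.conf. In this sketch the role name, the index name, the LDAP strategy name (corp_ad) and the AD group name are all placeholders.

```ini
# authorize.conf (sketch; role and index names are placeholders)
[role_fubar_users]
importRoles = user
srchIndexesAllowed = fubar
srchIndexesDefault = fubar

# authentication.conf
# Map the Splunk role to an AD group under a hypothetical
# LDAP strategy named corp_ad
[roleMap_corp_ad]
fubar_users = Splunk-Fubar-Users
```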
• Deploy new search heads & indexers as needed
• Install new forwarders as needed
• Deploy new app & TA to search heads & indexers
• Deploy new TA to relevant forwarders
Bring it!
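If a deployment server pushes the TA to forwarders (an assumption, since the presentation does not say how the TA is distributed), the server class might look like this. The class name, host pattern and app name are placeholders.

```ini
# serverclass.conf on the deployment server (sketch; names are placeholders)
[serverClass:fubar_forwarders]
whitelist.0 = fubar-web-*

[serverClass:fubar_forwarders:app:TA-fubar]
stateOnClient = enabled
restartSplunkd = true
```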
• All sources reporting?
• Event breaking, timestamp, timezone, host, source,
sourcetype?
• Field extractions, aliases, lookups?
• Eventtypes, tags?
• Data model(s)?
• User access?
• Confirm with original requesting user/group: looks OK?
Test & Validate
Done!
• Bring new data sources in correctly the first time
• Reduce the amount of “bad” data in your indexes, and the
time spent dealing with it
• Make the new data immediately useful to ALL users, not
just the ones who originally requested it
• Allow the data to drive all sorts of dashboards without
extra modifications
Gee, This Seems Like a Lot of Work…
• http://docs.splunk.com/Documentation/Splunk/latest/Deploy/Datapipeline
• http://wiki.splunk.com/Community:HowIndexingWorks
• http://wiki.splunk.com/Where_do_I_configure_my_Splunk_settings
• http://docs.splunk.com/Documentation/CIM/latest/User/Overview
• http://docs.splunk.com/Documentation/CIM/latest/User/Alerts
• http://splunk-base.splunk.com/apps/29008/sos-splunk-on-splunk
Reference
Thank You!
David Millis
dmillis@splunk.com

SplunkLive! Presentation - Data Onboarding with Splunk


Editor's Notes

  • #3: Splunk safe harbor statement.
  • #5: This section should take ~10 minutes
  • #18: This section should take ~10 minutes
  • #27: This section should take ~10 minutes
  • #34: This section should take ~10 minutes