Tuning Elasticsearch Indexing Pipeline for Logs
Radu Gheorghe
Rafał Kuć
Who are we?
Radu Rafał
Logsene
The next hour
Logs
The tools
Logsene
Elasticsearch 2.0 SNAPSHOT, rsyslog 8.9.0, Logstash 1.5 RC2
Let the games begin
Logstash
Multiple inputs
Lots of filters
Several outputs
Lots of plugins
How Logstash works
input (thread per input): file, tcp, redis, ...
filter (multiple workers): grok, geoip, ...
output (multiple workers): elasticsearch, solr, ...
Scaling Logstash
Logstash basic
input {
  syslog {
    port => 13514
  }
}
output {
  elasticsearch {
    protocol => "http"
    manage_template => false
    index => "test-index"
    index_type => "test-type"
  }
}
Logstash basic
4K events per second
~130% CPU utilization
299MB RAM used
Logstash with mutate
output {
  elasticsearch {
    protocol => "http"
    manage_template => false
    index => "test-index"
    index_type => "test-type"
    flush_size => 1000
    workers => 5
  }
}
filter {
  mutate {
    remove_field => [ "severity", "facility", "priority", "@version", "timestamp", "host" ]
  }
}
3 filter threads!
-w 3
Logstash with mutate
5K events per second
~250% CPU utilization
289MB RAM used
Logstash with grok and tcp
filter {
  grok {
    match => [ "message", "<%{NUMBER:priority}>%{SYSLOGTIMESTAMP:date} %{DATA:hostname} %{DATA:tag} %{DATA:what}:%{DATA:number}:" ]
  }
  mutate {
    remove_field => [ "message", "@version", "@timestamp", "host" ]
  }
}
input {
  tcp {
    port => 13514
  }
}
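To make the grok expression above concrete, here is a minimal Python sketch of a roughly equivalent regular expression. It is an approximation, not what Logstash runs internally: `LINE_RE`, `parse_line` and the sample line are made up for illustration, and the number and timestamp parts are simplified stand-ins for grok's NUMBER and SYSLOGTIMESTAMP patterns.

```python
import re

# Rough equivalent of the grok match above. DATA in grok is a lazy ".*?",
# which is mirrored here; NUMBER and SYSLOGTIMESTAMP are simplified.
LINE_RE = re.compile(
    r"<(?P<priority>\d+)>"
    r"(?P<date>[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<hostname>.*?) (?P<tag>.*?) "
    r"(?P<what>.*?):(?P<number>.*?):"
)


def parse_line(line):
    """Return the captured fields as a dict, or None if the line does not match."""
    match = LINE_RE.match(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    # Hypothetical input line, shaped to fit the pattern.
    sample = "<13>Apr 13 10:15:42 web01 myapp login:1234: user logged in"
    print(parse_line(sample))
```

Every field comes back as a string, just as grok does unless a type is requested in the pattern.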
Logstash with grok and tcp
8K events per second
~310% CPU utilization
327MB RAM used
Logstash with JSON lines
input {
  tcp {
    port => 13514
    codec => "json_lines"
  }
}
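For the `json_lines` codec to work, the sender has to emit one JSON object per line. A minimal sketch of that framing, assuming a Python sender (the helper names `to_json_line` and `from_json_lines` are illustrative, not part of any tool used in the talk):

```python
import json


def to_json_line(event):
    """Serialize one event as a single JSON object followed by a newline."""
    # json.dumps escapes newlines inside strings, so the only raw "\n"
    # in the result is the terminator added here.
    return json.dumps(event) + "\n"


def from_json_lines(payload):
    """Split a newline-delimited payload back into events, skipping blank lines."""
    return [json.loads(line) for line in payload.split("\n") if line.strip()]


if __name__ == "__main__":
    events = [
        {"message": "first line\nsecond line", "severity": "info"},
        {"message": "another event", "severity": "err"},
    ]
    payload = "".join(to_json_line(e) for e in events)
    print(payload, end="")
```

A payload built this way could be written to a TCP socket connected to the port configured above; the receiver then needs no grok-style parsing at all.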
Logstash with JSON lines
8K events per second
~260% CPU utilization
322MB RAM used
Rsyslog
Very fast
Very light
How rsyslog works
im*: imfile, imtcp, imjournal, ...
mm*: mmnormalize, mmjsonparse, ...
om*: omelasticsearch, omredis, ...
Using rsyslog
Rsyslog basic
module(load="impstats"
  interval="10"
  resetCounters="on"
  log.file="/tmp/stats")
module(load="imtcp")
module(load="omelasticsearch")
template(name="plain-syslog"
  type="list") {
  constant(value="{")
  constant(value="\"@timestamp\":\"") property(name="timereported" dateFormat="rfc3339")
  constant(value="\",\"host\":\"") property(name="hostname")
  constant(value="\",\"severity\":\"") property(name="syslogseverity-text")
  constant(value="\",\"facility\":\"") property(name="syslogfacility-text")
  constant(value="\",\"syslogtag\":\"") property(name="syslogtag" format="json")
  constant(value="\",\"message\":\"") property(name="msg" format="json")
  constant(value="\"}")
}
input(type="imtcp" port="13514")
action(type="omelasticsearch"
  template="plain-syslog"
  searchIndex="test-index"
  searchType="test-type"
  bulkmode="on"
  action.resumeretrycount="-1"
)
*http://blog.sematext.com/2015/04/13/monitoring-rsyslogs-performance-with-imstats-and-elasticsearch
Rsyslog basic
6K events per second
~20% CPU utilization
50MB RAM used
Rsyslog queue and workers
main_queue(
  queue.size="100000"            # capacity of the main queue
  queue.dequeuebatchsize="5000"  # process messages in batches of 5K
  queue.workerthreads="4"        # 4 threads for the main queue
)
action(name="send-to-es"
  type="omelasticsearch"
  template="plain-syslog"        # use the template defined earlier
  searchIndex="test-index"
  searchType="test-type"
  bulkmode="on"                  # use bulk API
  action.resumeretrycount="-1"   # retry indefinitely if ES is unreachable
)
Rsyslog queue and workers
25K events per second
~100% CPU utilization (1 core)
75MB RAM used (queue dependent)
Rsyslog + mmnormalize
module(load="mmnormalize")
action(type="mmnormalize"
  ruleBase="/opt/rsyslog_rulebase.rb"
  useRawMsg="on"
)
template(name="lumberjack" type="list") {
  property(name="$!all-json")
}
$ cat /opt/rsyslog_rulebase.rb
rule=:<%priority:number%>%date:date-rfc3164% %host:word% %syslogtag:word% %what:char-to:x3a%:%number:char-to:x3a%:
Rsyslog + mmnormalize
16K events per second
~200% CPU utilization
100MB RAM used (queue dependent)
Rsyslog with JSON parsing
module(load="mmjsonparse")
action(type="mmjsonparse")
Rsyslog with JSON parsing
20K events per second
~130% CPU utilization
70MB RAM used (queue dependent)
Disk-assisted queues
main_queue(
  queue.filename="main_queue"    # write to disk if needed
  queue.maxdiskspace="5g"        # when to stop writing to disk
  queue.highwatermark="200000"   # start spilling to disk at this size
  queue.lowwatermark="100000"    # stop spilling when it gets back to this size
  queue.saveonshutdown="on"      # write queue contents to disk on shutdown
  queue.dequeueBatchSize="5000"
  queue.workerthreads="4"
  queue.size="10000000"          # absolute max queue size
)
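The high and low watermarks above act as a hysteresis band: spilling starts at one threshold and stops at a lower one, so the queue does not flip between memory and disk on every message. The sketch below is a toy model of that idea only, not rsyslog's implementation; `spill_states` is a hypothetical helper and the defaults simply reuse the values from the config above.

```python
def spill_states(queue_sizes, high_watermark=200000, low_watermark=100000):
    """For each observed queue size, report whether the toy queue is spilling to disk."""
    spilling = False
    states = []
    for size in queue_sizes:
        if not spilling and size >= high_watermark:
            spilling = True   # reached the high watermark: start writing to disk
        elif spilling and size <= low_watermark:
            spilling = False  # drained to the low watermark: memory only again
        states.append(spilling)
    return states


if __name__ == "__main__":
    observed = [50000, 200000, 150000, 100000, 150000]
    for size, spilling in zip(observed, spill_states(observed)):
        print(size, "disk" if spilling else "memory")
```

Note that a size of 150000 is reported differently depending on whether the queue is filling up or draining, which is the point of having two thresholds.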
Elasticsearch
How Elasticsearch works
JSON (bulk or single doc) goes to Elasticsearch
primary: analysis, inverted index, transaction log
replicate
replica: analysis, inverted index, transaction log
ES horizontal scaling
[Diagram sequence: one node with one shard; two nodes with one shard each; three nodes with one shard each; three nodes with four shards each; three nodes with four shards and four replicas each]
Elasticsearch for tools tests
Nothing is indexed
No JVM tuning
Nothing is stored
_source disabled
_all disabled
-1 refresh
30m sync
translog size: 2g, interval: 30m
Tuning Elasticsearch
refresh_interval: 5s*
doc_values: true
store.throttle.max_bytes_per_sec: 200mb
*http://blog.sematext.com/2013/07/08/elasticsearch-refresh-interval-vs-indexing-performance/
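As a rough illustration of where the three settings above would live in an Elasticsearch 1.x-era setup, the sketch below builds the corresponding request bodies as plain Python dicts. Treat it as an assumption-laden outline: the function name `tuning_bodies` and the field name `clientip` are hypothetical, the full setting paths are not spelled out on the slide, and nothing here talks to a cluster.

```python
import json


def tuning_bodies(field="clientip"):
    """Build request bodies for the three tuning knobs named on the slide."""
    # Per-index setting: refresh every 5 seconds.
    index_settings = {"index": {"refresh_interval": "5s"}}
    # Cluster-wide store throttling, as it existed in the 1.x line (assumed path).
    cluster_settings = {
        "transient": {"indices.store.throttle.max_bytes_per_sec": "200mb"}
    }
    # doc_values is enabled per field in the mapping; the field name is made up.
    mapping_properties = {
        field: {"type": "string", "index": "not_analyzed", "doc_values": True}
    }
    return index_settings, cluster_settings, mapping_properties


if __name__ == "__main__":
    for body in tuning_bodies():
        print(json.dumps(body, indent=2))
```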
Tests: hardware and data
2 x EC2 c3.large instances (2 vCPU, 3.5GB RAM, 2x16GB SSD in RAID0)
vs
Apache logs
Test requests
Filters:
filter by client IP
filter by word in user agent
wildcard filter on domain
Aggregations:
date histogram
top 10 response codes
# of unique IPs
top IPs per response per time
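To give a feel for the cheapest and the most expensive requests in the list above, here is a sketch of what their bodies might look like in the Elasticsearch 1.x query DSL (the `filtered` query used here belongs to that era). The field names `clientip`, `response` and `@timestamp`, the one-hour interval and both function names are assumptions; the talk does not show the actual requests.

```python
import json


def filter_by_client_ip(ip, ip_field="clientip"):
    """Cheapest request: a term filter on the client IP."""
    return {"query": {"filtered": {"filter": {"term": {ip_field: ip}}}}}


def top_ips_per_response_per_time(time_field="@timestamp",
                                  response_field="response",
                                  ip_field="clientip",
                                  interval="1h",
                                  top_n=10):
    """Most expensive request: three nested levels of aggregation."""
    return {
        "size": 0,  # only aggregation results are wanted, no hits
        "aggs": {
            "over_time": {
                "date_histogram": {"field": time_field, "interval": interval},
                "aggs": {
                    "by_response": {
                        "terms": {"field": response_field},
                        "aggs": {
                            "top_ips": {"terms": {"field": ip_field, "size": top_n}}
                        },
                    }
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(filter_by_client_ip("10.0.0.1")))
    print(json.dumps(top_ips_per_response_per_time(), indent=2))
```

The nesting is what makes the last request costly: every time bucket holds a terms bucket per response code, and each of those holds its own top-IPs list.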
Test runs
1. Write throughput
2. Capacity of a single index
3. Capacity with time-based indices on hot/cold setup
Write throughput (one index)
Capacity of one index (3200 EPS)
Most expensive query: 20 seconds @ 40 - 50M
Capacity of one index (400 EPS)
Most expensive query: 15 seconds @ 40 - 50M
Time-based indices: ideal shard size
smaller indices: lighter indexing, easier to isolate hot data from cold data, easier to relocate
bigger indices: less RAM, less management overhead, smaller cluster state
without indexing, equal latency when dividing 32M data into 1/2/4/8/16/32M indices
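Time-based indices usually come down to two small pieces of bookkeeping: naming the index for a given day, and working out which indices have aged out of retention. A minimal sketch, assuming daily granularity, a hypothetical `logs-` prefix and a seven-day retention (none of which are specified in the talk):

```python
from datetime import date, timedelta


def daily_index(day, prefix="logs-"):
    """Name of the index holding the given day's logs, e.g. logs-2015.04.13."""
    return prefix + day.strftime("%Y.%m.%d")


def indices_to_keep(today, retention_days=7, prefix="logs-"):
    """Index names for today and the previous retention_days - 1 days."""
    return [daily_index(today - timedelta(days=n), prefix)
            for n in range(retention_days)]


def indices_to_delete(existing, today, retention_days=7, prefix="logs-"):
    """Existing indices with our prefix that fall outside the retention window."""
    keep = set(indices_to_keep(today, retention_days, prefix))
    return sorted(name for name in existing
                  if name.startswith(prefix) and name not in keep)


if __name__ == "__main__":
    today = date(2015, 4, 13)
    existing = ["logs-2015.04.13", "logs-2015.04.07", "logs-2015.04.06", "other"]
    print(indices_to_delete(existing, today))
```

Dropping a whole index this way is far cheaper than deleting individual documents from one large index, which is one of the arguments for time-based indices.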
Time-based. 2 hot and 2 cold nodes
Before: 3200 After: 4800
Time-based. 2 hot and 2 cold nodes
Before: 15s After: 5s
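One common way to get a hot/cold split in Elasticsearch is shard allocation filtering: nodes carry a custom attribute, and each index is pinned to nodes with a given value through an `index.routing.allocation.require.<attribute>` setting. The sketch below only builds the settings body. The attribute name `tag`, the values `hot` and `cold`, the one-day hot window and both helper names are assumptions; the talk does not show its exact configuration.

```python
def tier_for(age_days, hot_days=1):
    """Indices younger than hot_days stay on hot nodes; older ones go cold."""
    return "hot" if age_days < hot_days else "cold"


def allocation_settings(tier, attribute="tag"):
    """Index settings body that requires nodes whose attribute equals the tier."""
    if tier not in ("hot", "cold"):
        raise ValueError("tier must be 'hot' or 'cold'")
    return {"index.routing.allocation.require." + attribute: tier}


if __name__ == "__main__":
    # Today's index is still being written to; yesterday's is search-only.
    for age in (0, 1, 6):
        print(age, allocation_settings(tier_for(age)))
```

Applying the `cold` settings to yesterday's index would make the cluster relocate its shards, leaving the hot nodes free for indexing.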
That's all folks!
What to remember?
log in JSON
parallelize when possible
use time based indices
use hot / cold nodes policy
We are hiring
Dig Search?
Dig Analytics?
Dig Big Data?
Dig Performance?
Dig Logging?
Dig working with and in open source?
We’re hiring worldwide!
http://sematext.com/about/jobs.html
Thank you!
Radu Gheorghe
@radu0gheorghe
radu.gheorghe@sematext.com
Rafał Kuć
@kucrafal
rafal.kuc@sematext.com
Sematext
@sematext
http://sematext.com


Editor's Notes

  • #3: Rafał starts and passes mic to Radu
  • #4: Rafał slide. Describe the talk briefly. Ask how many people in the audience have used the tools.
  • #5: Radu slide. We did some tests; we'll share configs and benchmarks. Here are the versions: Logstash 1.5 (the final version will be up soon); rsyslog 8.9, the current stable (note: most distros come with 5.x or 7.x); Elasticsearch, a search engine based on Apache Lucene, whose current version is 1.5. The next major is 2.0, with lots of changes, many related to Lucene 5.0. These are not the only tools for logging: many others, both open source and commercial, can receive logs, parse them, buffer them and index them.
  • #6: Rafal slide
  • #7: Rafal slide * Ask how many people know about Logstash
  • #8: Rafal slide
  • #9: Rafal slide
  • #10: Radu. Assume we want to centralize syslog: forward syslog via TCP/UDP on a port to Logstash. On the Logstash side, you can use the TCP input to listen on that port and parse syslog messages. You'd use the ES output to forward to ES; you can use a Java binary protocol, but HTTP is better. Logstash comes with a template for the ES index, but for perf tests we'll use our own. Specify where to index (index and type, like a DB and a table).
  • #11: Radu - 1.3 CPUs
  • #12: Radu – segue to tuning, pass the mic
  • #13: Rafal Flush size – 1000 lowered from default 5000
  • #14: Rafal
  • #15: Rafal
  • #16: Rafał. The syslog input is just TCP + grok. We changed that, and we are not parsing the syslog format exactly: we wanted to parse additional things and to show how to parse unstructured data.
  • #21: The bound was hardware (high CPU usage). The JSON lines codec is not parallelized, while grok is. If you want to do your homework, you can do another run with the JSON filter instead of the codec, which makes parallelization possible.
  • #22: Radu. Many people hate it, maybe because of the docs. I like it because it's light and fast and has surprisingly rich functionality.
  • #23: Like Logstash, it's modular: you can use inputs to get data in, message modifiers to parse data, and outputs to pass it on. The flow of data is a bit different. Inputs may have multiple threads, and they write to a main queue. On the main queue, worker threads can do filtering, format messages using templates (will talk later) and run actions (parsing/output). You can have action queues as well, with their own threads, which makes them asynchronous. You can also have rulesets, which let you separate input-parse-output flows (e.g. one ruleset for local logs, one for remote logs).
  • #24: Typical setup is to have it on each server, push to ES directly, buffer if necessary
  • #25: Load modules: impstats is for monitoring, then TCP and ES. Start the TCP listener. Template: what the JSON that we send to ES will look like. Action: send to ES using the template, specify index/type, use bulks, retry on failure.
  • #28: Bigger memory buffer. Increase bulk size. More worker threads.
  • #29: Not using more CPU because ES is using the rest; Rafał will talk about that in a bit. RAM usage has increased because of the queue size.
  • #30: Clear win
  • #31: But not really apples to apples, because rsyslog has dedicated syslog parsers. Still, it's not only for syslog: it can parse unstructured data via mmnormalize. Refer to a rulebase, which looks much like grok patterns, with two differences. First, patterns like number or date normally aren't regexes but specific parsers: faster but less flexible. (The one above is equivalent to the Logstash grok seen earlier.) Second, it builds a parse tree on startup, which helps with speed if you have many rules.
  • #32: Radu
  • #35: More throughput with less CPU usage
  • #37: Before moving on, one more thing: in production you probably want to use disk-assisted (DA) queues instead of in-memory queues like the ones we had here. A DA queue is an in-memory queue that can spill to disk. You enable it by specifying a file name and giving it a threshold. Spilling is smart: the queue is normally in memory; when it reaches the high watermark it starts writing to disk, but it does so in batches, and it goes back to memory only once it drains to the low watermark. Side benefit: it can save and reload memory queue contents when restarting rsyslog.
  • #38: Rafal
  • #39: Rafał. Index a document: it goes to ES, first to the transaction log, next to the inverted index. It is replicated at the transaction log level.
  • #40: Rafał
  • #41: Rafał
  • #42: Rafał
  • #43: Rafał
  • #44: Rafał
  • #45: Rafał
  • #46: Rafał. Throttling: the default is 20mb, we are using 200mb, so we are going for 10 times more (we are using SSD drives here).
  • #47: Rafał
  • #48: Rafał. Cheaper filters and aggregations are at the top; the more expensive ones are at the bottom.
  • #49: Radu. 1. Index as fast as we can. 2. How much data we can put in a single index at a decent indexing rate before searches take too long. 3. A good practice is to have time-based indices (e.g. keep logs for a week, have one index per day). We want to benchmark that, plus separating indexing load from search load by putting today's index on different nodes than the "old" ones.
  • #50: Rafal Rate slowly goes down, because merges happen and because the index is slowly getting bigger
  • #51: Rafał. At 40-50M: the most expensive query takes 20 seconds on average, the quick filters take less than a second, and some aggregations take up to 5 seconds on average.
  • #52: Rafał. Spikes are because of merges; the big spike is from a merge, after which the queries are actually faster. The most expensive queries take 15 seconds.
  • #53: Radu. We want to benchmark time-based indices because: indexing is better because of merging; searching recent data is better because the index is smaller; deleting entire indices is better. But what granularity? There are use cases for small indices (high indexing rate, short retention, CPU constraint) and for big ones (low indexing rate, long retention, memory constraint). Granularity doesn't affect cold search performance.
  • #54: Rafal Tell about hot and cold setup The drop is because cold nodes were full
  • #55: Rafal
  • #57: Radu