Sam Dillard, Senior Sales Engineer
Optimizing InfluxDB Performance
Agenda
❖ Optimizing
➢ Hardware/Architecture
➢ Write Method
➢ Schema
❖ Q&A
© 2020 InfluxData. All rights reserved. 3
Resource Utilization
• No Specific OS Tuning Required
• IOPS IOPS IOPS
• Keep CPU/memory utilization at roughly 70% or below; you need headroom for:
• Peak periods
• Compactions
• Backfilling data
Agenda
❖ Optimizing
➢ Hardware/Architecture
➢ Write Method
➢ Schema
❖ Q&A
Telegraf
• Lightweight; written in Go
• Plug-in driven
• Optimized for writing to InfluxDB
• Formatting
• Retries
• Modifiable batch sizes and jitter
• Tag sorting
• Preprocessing
• Converting tags to fields, fields to tags
• Regex transformations
• Renaming measurements, tags
• Aggregations (mean, min, max, count, variance, stddev, etc.)
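To make "modifiable batch sizes and jitter" concrete, here is a minimal Python sketch of the idea. It is not Telegraf code: the function names `batches` and `jittered_delay` are hypothetical, and the numbers are arbitrary. Batching cuts the number of write requests, and jitter spreads flushes from many agents so they do not all hit InfluxDB at the same instant.

```python
import random
from typing import Iterator, List, Optional


def batches(lines: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size lines."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(lines), batch_size):
        yield lines[start:start + batch_size]


def jittered_delay(interval_s: float, jitter_s: float,
                   rng: Optional[random.Random] = None) -> float:
    """Return the flush interval plus a random extra delay in [0, jitter_s]."""
    rng = rng or random.Random()
    return interval_s + rng.uniform(0.0, jitter_s)


if __name__ == "__main__":
    # Hypothetical points; each string is one line of line protocol.
    points = [f"cpu,host=server-{i} usage_user={i}.0" for i in range(12)]
    for batch in batches(points, batch_size=5):
        delay = jittered_delay(10.0, 2.0)
        print(f"{len(batch)} lines in this write; next flush in {delay:.2f}s")
```

In Telegraf itself these knobs live in the agent configuration (`metric_batch_size`, `flush_interval`, `flush_jitter`), so you would normally tune them there rather than write this logic yourself.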
Popular Plugins
Out-of-the-box               | Custom
Kubernetes (kubelet)         | HTTP/socket listener
Kube_inventory (apiserver)   | HTTP (formatted endpoints)
Kafka (consumer)             | Prometheus (/metrics)
SNMP                         | Exec
AMQP (mq metadata)           | StatsD
Redis                        |
Nginx                        |
HAproxy                      |
Jolokia2                     |
[Diagram: Telegraf pipeline. Inputs (CPU, Mem, Disk, Docker, Kubernetes, /metrics, Kafka, MySQL, CloudWatch) → Process (transform, decorate, filter) → Aggregate (mean, min/max, count, variance, stddev) → Outputs (File, InfluxDB, Kafka, CloudWatch)]
Parsing
● JSON
● CSV
● Graphite
● CollectD
● Dropwizard
● Form URL-encoded
● Grok
[Diagram: multiple Telegraf agents writing to InfluxDB]
[Diagram: Message Queue → Telegraf. Queues shown: Kafka, RabbitMQ, ActiveMQ, NSQ, AWS Kinesis, Google PubSub, MQTT]
[Diagram: a larger fleet of Telegraf agents writing to a single InfluxDB]
Balanced ingestion helps.
[Figure: ingestion patterns, two labeled "Good" and two labeled "Not so good"]
Agenda
❖ Optimizing
➢ Hardware/Architecture
➢ Write Method
➢ Schema
❖ Q&A
Schema Design Goals
• By reducing...
– Measurement/tag cardinality
– Information-encoding
– Key lengths
• You increase…
– Write performance
– Query performance
– Readability
“It’s a feature, not a bug...but
features require thinking”
- Richard Laskey, Wayfair
Line Protocol && Schema Insight
<measurement,tagset fieldset timestamp>
● A Measurement is a namespace for like metrics (SQL table)
● What to make a Measurement?
○ Logically-alike metrics; categorization
○ E.g., CPU has many metrics associated with it
○ E.g., Transactions
■ "usd", "response_time", "duration_ms", "timeout", whatever else…
● What to make a Tag?
○ Metadata; “things you need to `GROUP BY`”
● What to make a Field?
○ Actual metrics
■ Metrics you will visualize or operate on
○ Things that have high value variance...that you don’t need to group
Line Protocol Goals
1) Don’t encode data into Measurements or Tags; a telltale sign is a generic field key that carries no meaning (value, counter, gauge)
2) Write as many Fields per Line as you can; #1 allows for #2
3) Separate information into primitives; reduce regex grouping
4) Order Tags lexicographically
(Telegraf does all this for you, for the most part)
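For anyone writing line protocol by hand rather than through Telegraf, goals 2 and 4 can be sketched in Python as follows. This is an illustrative helper, not an official client: `to_line` and its parameters are hypothetical names, and the escaping covers only the common cases (commas, spaces, and equals signs in keys and tag values; quotes and backslashes in string fields).

```python
from typing import Dict, Optional, Union

FieldValue = Union[bool, int, float, str]


def _escape(text: str, chars: str) -> str:
    """Backslash-escape each character of `chars` that appears in `text`."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def _format_field(value: FieldValue) -> str:
    """Render one field value using line protocol type conventions."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line(measurement: str, tags: Dict[str, str],
            fields: Dict[str, FieldValue],
            timestamp_ns: Optional[int] = None) -> str:
    """Build one line: tags sorted by key, all fields on the same line."""
    if not fields:
        raise ValueError("line protocol requires at least one field")
    head = _escape(measurement, ", ")
    for key in sorted(tags):  # goal 4: lexicographic tag order
        head += "," + _escape(key, ",= ") + "=" + _escape(tags[key], ",= ")
    body = ",".join(
        _escape(key, ",= ") + "=" + _format_field(value)
        for key, value in fields.items()  # goal 2: many fields per line
    )
    line = f"{head} {body}"
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"
    return line


if __name__ == "__main__":
    print(to_line("cpu",
                  {"region": "us-west", "host": "server-5"},
                  {"usage_user": 20.0, "usage_system": 5.0},
                  1444234982000000000))
```

Sorting on the client side means the database receives tags in the order it wants them, which is the point of goal 4.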
DON'T ENCODE DATA INTO THE MEASUREMENT NAME
Instead of measurement names like:
cpu.server-5.us-west.usage_user value=20.0 1444234982000000000
cpu.server-6.us-west.usage_user value=40.0 1444234982000000000
mem.server-6.us-west.free value=25.0 1444234982000000000
Encode that information as tags:
cpu,host=server-5,region=us-west usage_user=20.0 1444234982000000000
cpu,host=server-6,region=us-west usage_user=40.0 1444234982000000000
mem,host=server-6,region=us-west mem_free=25.0 1444234982000000000
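If existing producers already emit dotted names, one way to migrate is to rewrite lines before they reach InfluxDB. The Python sketch below assumes every name has exactly four dot-separated parts in the order measurement, host, region, metric, and that the only field is called `value`; the function name `split_dotted_measurement` is hypothetical. Host names that themselves contain dots would need a different rule.

```python
def split_dotted_measurement(line: str) -> str:
    """Rewrite '<measurement>.<host>.<region>.<metric> value=<v> <ts>' using tags."""
    name, field_set, timestamp = line.split(" ")
    measurement, host, region, metric = name.split(".")
    key, value = field_set.split("=", 1)
    if key != "value":
        raise ValueError(f"expected a single 'value' field, got {key!r}")
    # The metric name moves from the measurement into the field key.
    return f"{measurement},host={host},region={region} {metric}={value} {timestamp}"


if __name__ == "__main__":
    print(split_dotted_measurement(
        "cpu.server-5.us-west.usage_user value=20.0 1444234982000000000"))
```

After the rewrite, a query can filter or group on `host` and `region` directly instead of matching measurement names with a regex.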
DON’T OVERLOAD TAGS (separate into primitives)
BAD: Several pieces of information packed into one tag:
cpu,server=localhost.us-west.usage_user value=2.0 1444234982000000000
cpu,server=localhost.us-east.usage_system value=3.0 1444234982000000000
GOOD: Separate out into different tags:
cpu,host=localhost,region=us-west usage_user=2.0 1444234982000000000
cpu,host=localhost,region=us-east usage_system=3.0 1444234982000000000
Use Telegraf as a Graphite parser
Graphite like: cpu.usage.eu-west.idle.percentage 100
With a Telegraf configuration like:
[[inputs.http_listener_v2]]
  data_format = "graphite"
  separator = "_"
  templates = [
    "measurement.measurement.region.field*"
  ]
Results in the following transformation:
cpu_usage,region=eu-west idle_percentage=100
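To see how such a template maps path segments to measurement, tags, and field, here is a simplified Python imitation of the matching rule. It is a sketch of the concept only, not Telegraf's parser: `apply_template` and `graphite_to_line` are hypothetical names, and only plain tag names, `measurement`, `field`, and the greedy `measurement*` / `field*` forms are handled.

```python
from typing import Dict, List, Tuple


def apply_template(path: str, template: str,
                   separator: str = "_") -> Tuple[str, Dict[str, str], str]:
    """Assign each dot-separated path segment the role named by the template."""
    parts = path.split(".")
    measurement: List[str] = []
    field: List[str] = []
    tags: Dict[str, str] = {}
    for index, role in enumerate(template.split(".")):
        if index >= len(parts):
            break
        if role.endswith("*"):
            # A trailing '*' consumes every remaining segment.
            target = {"measurement*": measurement, "field*": field}.get(role)
            if target is None:
                raise ValueError(f"unsupported greedy role: {role}")
            target.extend(parts[index:])
            break
        if role == "measurement":
            measurement.append(parts[index])
        elif role == "field":
            field.append(parts[index])
        elif role:  # any other non-empty name becomes a tag key
            tags[role] = parts[index]
    return separator.join(measurement), tags, separator.join(field)


def graphite_to_line(metric: str, template: str) -> str:
    """Turn '<path> <value>' into a line protocol string without a timestamp."""
    path, value = metric.split(" ")[:2]
    measurement, tags, field = apply_template(path, template)
    tag_set = "".join(f",{key}={tags[key]}" for key in sorted(tags))
    return f"{measurement}{tag_set} {field}={value}"


if __name__ == "__main__":
    print(graphite_to_line("cpu.usage.eu-west.idle.percentage 100",
                           "measurement.measurement.region.field*"))
```

The two `measurement` roles are joined with the separator, which is why the result starts with `cpu_usage`, and `field*` swallows the remaining segments to form `idle_percentage`.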
stock_prices,symbol=BP price=25.0 1
stock_prices,symbol=CVX price=35.0 1
stock_prices,symbol=XOM price=45.0 1
stock_prices,symbol=XOM open=25.0,high=45.0,low=20.0,close=35.0 1
stock_prices,symbol=XOM open=20.0,high=40.0,low=20.0,close=40.0 2
Also smaller payloads:
From:
cpu,region=us-west-1,host=hostA,container=containerA usage_user=35.0 <timestamp>
cpu,region=us-west-1,host=hostA,container=containerA usage_system=15.0 <timestamp>
cpu,region=us-west-1,host=hostA,container=containerA usage_guest=0.0 <timestamp>
cpu,region=us-west-1,host=hostA,container=containerA usage_guest_nice=0.0 <timestamp>
cpu,region=us-west-1,host=hostA,container=containerA usage_idle=35.0 <timestamp>
cpu,region=us-west-1,host=hostA,container=containerA usage_iowait=0.2 <timestamp>
cpu,region=us-west-1,host=hostA,container=containerA usage_irq=0.0 <timestamp>
cpu,region=us-west-1,host=hostA,container=containerA usage_nice=1.0 <timestamp>
cpu,region=us-west-1,host=hostA,container=containerA usage_steal=2.0 <timestamp>
cpu,region=us-west-1,host=hostA,container=containerA usage_softirq=2.5 <timestamp>
To:
cpu,region=us-west-1,host=hostA,container=containerA usage_user=35.0,usage_system=15.0,usage_guest=0.0,usage_guest_nice=0.0,usage_idle=35.0,usage_iowait=0.2,usage_irq=0.0,usage_nice=1.0,usage_steal=2.0,usage_softirq=2.5 <timestamp>
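The From/To rewrite above can be automated. The Python sketch below groups lines that share the same measurement, tag set, and timestamp and merges their fields. It assumes simple input: no escaped spaces, numeric field values (so splitting on commas is safe), and a timestamp on every line. The name `merge_single_field_lines` is hypothetical.

```python
from typing import Dict, List, Tuple


def merge_single_field_lines(lines: List[str]) -> List[str]:
    """Merge lines with identical series key and timestamp into one line each."""
    grouped: Dict[Tuple[str, str], Dict[str, str]] = {}
    for line in lines:
        series_key, field_set, timestamp = line.split(" ")
        fields = grouped.setdefault((series_key, timestamp), {})
        for pair in field_set.split(","):
            key, value = pair.split("=", 1)
            fields[key] = value  # a repeated field key keeps the last value
    merged = []
    for (series_key, timestamp), fields in grouped.items():
        field_set = ",".join(f"{key}={value}" for key, value in fields.items())
        merged.append(f"{series_key} {field_set} {timestamp}")
    return merged


if __name__ == "__main__":
    # Hypothetical sample with a literal timestamp of 1.
    before = [
        "cpu,region=us-west-1,host=hostA usage_user=35.0 1",
        "cpu,region=us-west-1,host=hostA usage_system=15.0 1",
        "cpu,region=us-west-1,host=hostA usage_idle=35.0 1",
    ]
    after = merge_single_field_lines(before)
    print(after[0])
    print(len("\n".join(before)), "bytes before;", len("\n".join(after)), "bytes after")
```

The saving comes from sending the measurement, tag set, and timestamp once instead of once per field.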
Agenda
❖ Optimizing
➢ Hardware/Architecture
➢ Write Method
➢ Schema
➢ Queries
➢ Configuration
❖ Q&A
Query Performance
• Streaming functions > batch functions
• Batch funcs
• percentile(), spread(), stddev(), median(), mode(), holt_winters()
• Stream funcs
• mean(),bottom(),first(),last(),max(),top(),count(),etc.
• Distributed functions (clusters only) > local functions
• Distributed
• first(),last(),max(),min(),count(),mean(),sum()
• Local
• percentile(),derivative(),spread(),top(),bottom(),elapsed(),etc.
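One way to picture why streaming functions tend to be cheaper than batch functions: a streaming aggregate can be updated one point at a time with constant memory, while a batch aggregate has to hold all the points before it can answer. The Python sketch below illustrates that general idea only; it is not how InfluxDB's query engine is implemented, and the names are hypothetical.

```python
import statistics
from typing import Iterable


class StreamingMean:
    """Keeps only a running count and sum, regardless of how many points arrive."""

    def __init__(self) -> None:
        self.count = 0
        self.total = 0.0

    def add(self, value: float) -> None:
        self.count += 1
        self.total += value

    def result(self) -> float:
        if self.count == 0:
            raise ValueError("no values seen")
        return self.total / self.count


def batch_median(values: Iterable[float]) -> float:
    """Median needs the full set of values in memory before it can be computed."""
    buffered = list(values)  # memory grows with the number of points
    return statistics.median(buffered)


if __name__ == "__main__":
    mean = StreamingMean()
    for point in (1.0, 2.0, 6.0):
        mean.add(point)
    print("mean:", mean.result())
    print("median:", batch_median(iter((5.0, 1.0, 3.0))))
```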
Query Performance
• Boundaries!
• Time-bounding and series-bounding with `WHERE` clause
• `SELECT *` generally not a best practice
• Agg functions instead of raw queries
• `SELECT mean(<field>)` > `SELECT <field>`
• Reduce `GROUP BY time` intervals
• Subqueries
• When appropriate, process data from an already processed subset of data
• SELECT min("max") FROM (SELECT max("usage_user") FROM cpu WHERE time > now() - 5d GROUP BY time(5m))
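To make the subquery's meaning explicit: the inner query reduces raw points to one maximum per 5-minute window, and the outer query takes the minimum of those maxima. The Python sketch below reproduces that two-stage logic on in-memory data. Timestamps are assumed to be in seconds, and `min_of_window_max` is a hypothetical name; it mirrors the semantics, not InfluxDB's execution.

```python
from typing import Dict, Iterable, Tuple


def min_of_window_max(points: Iterable[Tuple[int, float]],
                      window_s: int = 300) -> float:
    """Return the smallest per-window maximum over (timestamp_s, value) points."""
    window_max: Dict[int, float] = {}
    for timestamp_s, value in points:
        bucket = timestamp_s - timestamp_s % window_s  # start of the window
        if bucket not in window_max or value > window_max[bucket]:
            window_max[bucket] = value  # inner query: max per window
    if not window_max:
        raise ValueError("no points supplied")
    return min(window_max.values())  # outer query: min over those maxima


if __name__ == "__main__":
    # Hypothetical usage_user samples spanning two 5-minute windows.
    samples = [(0, 10.0), (60, 50.0), (300, 20.0), (400, 30.0)]
    print(min_of_window_max(samples))
```

The outer stage only ever sees one value per window, which is why working from an already processed subset is cheaper than scanning the raw points again.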
sam@influxdata.com @SDillard12
THANKS!

Sam Dillard [InfluxData] | Performance Optimization in InfluxDB | InfluxDays Virtual Experience London 2020
