Raphael Taylor-Davies
Observability of InfluxDB IOx: tracing, metrics and system tables
© 2021 InfluxData. All rights reserved.
Me
• Software engineer at InfluxData for ~6 months
  • Lifecycle
  • In-memory catalog
  • APIs
• Previously worked as Platform/Infrastructure Engineer
• Working with Rust for ~2 years
This Talk
• What is exposed
• Why is it useful
• How is it implemented
What
What: Management API/CLI
$ influxdb_iox database list
["company_sensors"]
$ influxdb_iox database get company_sensors
{
  "name": "company_sensors",
  "lifecycleRules": {
    "bufferSizeSoft": "42949672960",
    "bufferSizeHard": "45097156608",
    …
  }
  …
}
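The buffer limits are easier to read in GiB. The following is a minimal Python sketch, assuming only the two `lifecycleRules` fields visible on the slide (the elided fields are left out rather than guessed). The values arrive as strings, which is consistent with the protobuf JSON mapping encoding 64-bit integers as decimal strings. The helper name `to_gib` is made up for illustration.

```python
import json

# Trimmed copy of the `database get` output above; elided fields are omitted.
response = json.loads("""
{
  "name": "company_sensors",
  "lifecycleRules": {
    "bufferSizeSoft": "42949672960",
    "bufferSizeHard": "45097156608"
  }
}
""")

GIB = 1024 ** 3


def to_gib(value: str) -> float:
    """Convert a byte count serialized as a decimal string to GiB."""
    return int(value) / GIB


rules = response["lifecycleRules"]
soft_gib = to_gib(rules["bufferSizeSoft"])
hard_gib = to_gib(rules["bufferSizeHard"])
print(f"soft limit: {soft_gib:g} GiB, hard limit: {hard_gib:g} GiB")
```

With the values on the slide, the soft limit works out to 40 GiB and the hard limit to 42 GiB.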
What: Management API/CLI
$ influxdb_iox operation list
[{
  "id": 114328,
  "wall_time": "844ms",
  "job": {
    "CompactChunks": {
      "partition": {
        "db_name": "844910ece80be8bc_3c0bd4c89186ca89",
        "table_name": "http_api_requests_total",
        "partition_key": "2021-09-06 13:00:00"
      },
      "chunks": [371,372]
    }
  },
  "status": "Success",
  ...
}, ...
]
What: System Tables
$ influxdb_iox sql
> use company_sensors;
company_sensors> select partition_key, storage, memory_bytes, object_store_bytes, row_count from
system.chunks where table_name='http_request' limit 5;
+---------------------+-----------------+--------------+--------------------+-----------+
| partition_key       | storage         | memory_bytes | object_store_bytes | row_count |
+---------------------+-----------------+--------------+--------------------+-----------+
| 2021-08-23 15:00:00 | ObjectStoreOnly | 245693       | 52881438           | 5101475   |
| 2021-08-23 15:00:00 | ObjectStoreOnly | 196707       | 49961151           | 4781242   |
| 2021-08-23 11:00:00 | ObjectStoreOnly | 237035       | 51444594           | 4907350   |
| 2021-08-23 11:00:00 | ObjectStoreOnly | 241111       | 52472345           | 5006811   |
| 2021-08-23 11:00:00 | ObjectStoreOnly | 236995       | 53087907           | 5065582   |
+---------------------+-----------------+--------------+--------------------+-----------+
What: System Tables
$ influxdb_iox sql
> use company_sensors;
company_sensors> select table_name, sum(object_store_bytes) from system.chunks where table_name
like 'http_api%' group by table_name;
+-------------------------------------------------+-------------------------+
| table_name                                      | SUM(object_store_bytes) |
+-------------------------------------------------+-------------------------+
| http_api_query_service_requests_total           | 462729148               |
| http_api_query_service_requests_active          | 32543301                |
| http_api_request_duration_seconds               | 63534303368             |
| http_api_query_service_request_duration_seconds | 300583101               |
| http_api_requests_total                         | 7969265730              |
+-------------------------------------------------+-------------------------+
What: Metrics
$ http :8080/metrics
# HELP jemalloc_memstats_bytes jemalloc memstats
# TYPE jemalloc_memstats_bytes gauge
jemalloc_memstats_bytes{stat="active"} 55668879360
jemalloc_memstats_bytes{stat="alloc"} 52229651864
...
# HELP http_requests_total accumulated total requests
# TYPE http_requests_total counter
http_requests_total{path="/health",status="ok"} 337369
http_requests_total{path="/metrics",status="ok"} 33742
# HELP http_request_duration_seconds distribution of request latencies
# TYPE http_request_duration_seconds histogram
http_request_duration_seconds_bucket{path="/health",status="ok",le="0.005"} 337369
http_request_duration_seconds_bucket{path="/health",status="ok",le="+Inf"} 337369
http_request_duration_seconds_sum{path="/health",status="ok"} 0.16328641400000019
http_request_duration_seconds_count{path="/health",status="ok"} 337369
...
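The `_sum` and `_count` series of a Prometheus histogram give the mean observation directly. The sketch below parses two sample lines copied from the output above and divides one by the other. The parser is deliberately simplified (it ignores escaped quotes in label values) and the function name `parse` is made up for illustration; it is not how IOx or Prometheus processes the data.

```python
import re

# Sample lines copied from the /metrics output above.
METRICS = '''\
# TYPE http_request_duration_seconds histogram
http_request_duration_seconds_sum{path="/health",status="ok"} 0.16328641400000019
http_request_duration_seconds_count{path="/health",status="ok"} 337369
'''

# name, optional {labels}, then the sample value
LINE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$'
)
LABEL = re.compile(r'(\w+)="([^"]*)"')


def parse(text):
    """Parse samples into {(name, labels): value}, skipping comment lines."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = LINE.match(line)
        if match is None:
            continue
        labels = tuple(sorted(LABEL.findall(match.group("labels") or "")))
        samples[(match.group("name"), labels)] = float(match.group("value"))
    return samples


samples = parse(METRICS)
labels = (("path", "/health"), ("status", "ok"))
total = samples[("http_request_duration_seconds_sum", labels)]
count = samples[("http_request_duration_seconds_count", labels)]
mean_seconds = total / count
print(f"mean /health latency: {mean_seconds * 1e6:.3f} microseconds")
```

For the `/health` path this comes to roughly half a microsecond per request, which fits with every observation landing in the `le="0.005"` bucket.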
What: Logs
$ kubectl logs iox-0 iox
level=info msg="Chunk is getting sorted" chunk_type="MUB" chunk_ID=424 target="query::provider"
location="query/src/provider.rs:822" time=1630935055665806764
level=info msg="persisting partition as exceeds row threshold" db_name=company_sensors
partition="Partition('844910ece80be8bc_3c0bd4c89186ca89':'http_query_response_bytes':'2021-09-06
13:00:00') locked for read" persistable_row_count=5104599 target="lifecycle::policy"
location="lifecycle/src/policy.rs:369" time=1630935055673123030
level=info msg="Chunk is not yet sorted and will get sorted in build_sort_plan" chunk_type="MUB"
chunk_ID=428 target="query::provider" location="query/src/provider.rs:808"
time=1630935055688550768
...
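These lines are in logfmt (`key=value` pairs with optional double quotes), so they are straightforward to process mechanically. The sketch below is a minimal example that parses the first log line with Python's `shlex`. Treating `time` as nanoseconds since the Unix epoch is an assumption based on its magnitude, and `parse_logfmt` is a hypothetical helper, not part of IOx.

```python
import shlex
from datetime import datetime, timezone

# First log line from the output above, joined onto one line.
LINE = (
    'level=info msg="Chunk is getting sorted" chunk_type="MUB" chunk_ID=424 '
    'target="query::provider" location="query/src/provider.rs:822" '
    'time=1630935055665806764'
)


def parse_logfmt(line):
    """Split a logfmt line into a dict; shlex handles the double-quoted values."""
    fields = {}
    for token in shlex.split(line):
        key, _, value = token.partition("=")
        fields[key] = value
    return fields


fields = parse_logfmt(LINE)
# Assumption: `time` is nanoseconds since the Unix epoch.
seconds = int(fields["time"]) // 1_000_000_000
timestamp = datetime.fromtimestamp(seconds, tz=timezone.utc)
print(fields["msg"], "at", timestamp.isoformat())
```

Under that assumption the first line is stamped 2021-09-06 13:30:55 UTC, which falls inside the `2021-09-06 13:00:00` partition named in the second log line.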
What: Distributed Tracing
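[Figure: a request flows from Ingress to the Auth Service, Service A and Service B, with context propagation between services. Spans shown along a time axis: Ingress: "/foo"; Auth: "/auth"; Ingress: isAuth(); SQL: "..."; Ingress: foo(); Service A: "/foo"; Service A: bar(); Service B: "/bar"]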
What: Distributed Tracing
Why
Why: Scenario 1: Out Of Memory
[Figure: memory usage over time, with series for jemalloc usage and chunk usage]
Why: Scenario 2: Zombie Compaction
[Figure: number of chunks over time, with series for unpersisted chunks and persisted chunks; annotations: "Writes Stopped", "~50 mins", "~5 mins"]
Why: Scenario 2: Zombie Compaction
$ kubectl logs iox-0 iox
level=info msg="persisting partition as exceeds row threshold"…
level=debug msg="no persistable windows or previous outstanding persist"…
level=info msg="persisting partition as exceeds row threshold"…
level=debug msg="no persistable windows or previous outstanding persist"…
level=info msg="persisting partition as exceeds row threshold"…
level=debug msg="no persistable windows or previous outstanding persist"…
…
TaskTracker { id: TaskId(48017), state: TrackerState { start_instant: Instant { tv_sec: 2580549,
tv_nsec: 345754677 }, cancel_token: CancellationToken { is_cancelled: false }, cpu_nanos:
140765855467, wall_nanos: 2786629479924, created_futures: 1, pending_futures: 0,
pending_registrations: 0, ok_futures: 1, err_futures: 0, cancelled_futures: 0, metadata:
PersistChunks { partition: PartitionAddr { db_name: "844910ece80be8bc_3c0bd4c89186ca89",
table_name: "storage_usage_bucket_cardinality", partition_key: "2021-08-26 09:00:00" }, chunks:
[2371, 2482, 2814, 2956, 2987, 3032, 3177, 3343, 3354, 3355, 3356, 3357, 3358] } }
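The raw counters in the `TaskTracker` output are hard to judge at a glance. The sketch below converts them to minutes, reading `cpu_nanos` and `wall_nanos` as nanosecond counts as their names suggest (an assumption, since the slide does not define them).

```python
# Counters copied from the TaskTracker debug output above.
cpu_nanos = 140_765_855_467
wall_nanos = 2_786_629_479_924

NANOS_PER_SECOND = 1_000_000_000

wall_minutes = wall_nanos / NANOS_PER_SECOND / 60
cpu_minutes = cpu_nanos / NANOS_PER_SECOND / 60
cpu_fraction = cpu_nanos / wall_nanos

print(
    f"wall: {wall_minutes:.1f} min, cpu: {cpu_minutes:.1f} min, "
    f"busy: {cpu_fraction:.1%}"
)
```

Read this way, the task had been tracked for about 46 minutes of wall-clock time while using a little over 2 minutes of CPU, which is in the same range as the "~50 mins" annotation on the chart.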
Why: Scenario 3: Overlapping Timestamps
$ select table_name,
         partition_key,
         row_count,
         (cast(min_value as bigint) - cast(previous_max_value as bigint)) / 1000000000 as delta_seconds
  from (select *,
               LAG(max_value)
                 OVER (partition by table_name, partition_key
                       order by table_name, partition_key, chunk_id) as previous_max_value
        from system.chunk_columns
        where column_name = 'time'
          and storage = 'ReadBufferAndObjectStore'
          and partition_key like '2021-09-07%');
Why: Scenario 3: Overlapping Timestamps
+----------------------------------+---------------------+-----------+---------------+
| table_name                       | partition_key       | row_count | delta_seconds |
+----------------------------------+---------------------+-----------+---------------+
| storage_usage_bucket_cardinality | 2021-09-07 06:00:00 | 118079869 |               |
| storage_usage_bucket_cardinality | 2021-09-07 07:00:00 | 8682939   |               |
| storage_usage_bucket_cardinality | 2021-09-07 07:00:00 | 6119043   | 10            |
| storage_usage_bucket_cardinality | 2021-09-07 07:00:00 | 10653376  | 10            |
| storage_usage_bucket_cardinality | 2021-09-07 07:00:00 | 5566725   | 10            |
| storage_usage_bucket_cardinality | 2021-09-07 07:00:00 | 8037661   | 10            |
| storage_usage_bucket_cardinality | 2021-09-07 07:00:00 | 6876205   | 10            |
| storage_usage_bucket_cardinality | 2021-09-07 07:00:00 | 5743187   | 10            |
| storage_usage_bucket_cardinality | 2021-09-07 07:00:00 | 23612973  | 10            |
| storage_usage_bucket_cardinality | 2021-09-07 07:00:00 | 48122758  | 10            |
| storage_usage_bucket_cardinality | 2021-09-07 07:00:00 | 66412211  | 10            |
| storage_usage_bucket_cardinality | 2021-09-07 07:00:00 | 27081195  | 10            |
| storage_usage_bucket_cardinality | 2021-09-07 08:00:00 | 5803128   |               |
| storage_usage_bucket_cardinality | 2021-09-07 08:00:00 | 5803119   | 10            |
…
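To make the window function concrete, the sketch below reproduces the `LAG(max_value)` logic in plain Python: rows are grouped by table and partition, ordered by chunk id, and each chunk's minimum time is compared with the previous chunk's maximum. The rows are hypothetical values invented for illustration, not data from `system.chunk_columns`, and `delta_seconds` is a made-up helper name.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows: (table_name, partition_key, chunk_id, min_value, max_value),
# where min/max are the `time` column bounds in nanoseconds. Not real IOx data.
ROWS = [
    ("t", "2021-09-07 07:00:00", 1, 0, 50_000_000_000),
    ("t", "2021-09-07 07:00:00", 2, 60_000_000_000, 110_000_000_000),
    ("t", "2021-09-07 07:00:00", 3, 100_000_000_000, 170_000_000_000),
    ("t", "2021-09-07 08:00:00", 4, 0, 40_000_000_000),
]


def delta_seconds(rows):
    """Yield (table, partition, chunk_id, delta): this chunk's min time minus the
    previous chunk's max time, or None for the first chunk of a partition."""
    keyed = sorted(rows, key=itemgetter(0, 1, 2))
    for (table, partition), group in groupby(keyed, key=itemgetter(0, 1)):
        previous_max = None  # LAG() yields NULL on the first row of a partition
        for _, _, chunk_id, min_value, max_value in group:
            if previous_max is None:
                delta = None
            else:
                delta = (min_value - previous_max) // 1_000_000_000
            yield table, partition, chunk_id, delta
            previous_max = max_value


results = list(delta_seconds(ROWS))
for row in results:
    print(row)
```

A positive delta is a gap between consecutive chunks, while a negative delta (chunk 3 in the made-up data) means the chunk's time range starts before the previous chunk's range ends.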
Why: Scenario 4: Query Performance
● Identification
● Diagnosis
Why: Scenario 5: Ingest Latency
● Time to Bytes Readable (TTBR)
● Trace Propagation
How
How: Management API/CLI
● prost (protobuf)
● tonic (gRPC)
● pbjson (protobuf <-> JSON)
● clap + structopt (CLI)
How: System Tables
● DataFusion TableProvider
How: Metrics
● OpenTelemetry
● Custom Crate
  ○ Metric abstraction
  ○ Prometheus
How: Logging
● tokio-tracing
How: Distributed Tracing
● tokio-tracing
● Custom Crate
  ○ tower middleware
  ○ context propagation
  ○ opentelemetry-jaeger
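Context propagation means each service forwards the trace identity on its outgoing calls so that spans from different services can be joined into one trace. The sketch below illustrates the idea with the W3C Trace Context `traceparent` header format. This is an assumption made for illustration: the slides do not say which header format IOx uses, and the function names are hypothetical rather than taken from the custom crate.

```python
import secrets


def new_traceparent():
    """Start a new trace: version 00, random trace id and span id, sampled flag."""
    return f"00-{secrets.token_hex(16)}-{secrets.token_hex(8)}-01"


def child_traceparent(incoming):
    """Build the header for an outgoing call: same trace id, fresh span id."""
    version, trace_id, _parent_span_id, flags = incoming.split("-")
    return f"{version}-{trace_id}-{secrets.token_hex(8)}-{flags}"


def handle_request(headers):
    """Hypothetical middleware step: continue the caller's trace or start one,
    and return the headers to attach to downstream requests."""
    incoming = headers.get("traceparent") or new_traceparent()
    return {"traceparent": child_traceparent(incoming)}


# Ingress starts a trace; Service A continues it when calling Service B.
ingress_out = handle_request({})
service_a_out = handle_request(ingress_out)
print(ingress_out["traceparent"])
print(service_a_out["traceparent"])
```

Both printed headers share the same trace id but carry different span ids, which is what lets a tracing backend stitch the per-service spans into one timeline.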
Thank You
