InfluxDB Internals

The document discusses the internals of InfluxDB, focusing on its design as a time series database. It highlights the unique requirements of time series data, including high throughput ingestion, fast queries, and special storage needs. Key advancements in InfluxDB, such as TSI for cardinality management and IFQL for improved query execution, are also outlined.

InfluxDB Internals
Platform Engineering Team
@ryanbetts / ryan@influxdata.com

How great are databases?
• I like making things with smart, clever, kind people.

• I’ve been working on high-throughput, realtime data for the last 10 years.

• What’s so special about time series

• Time series database designs

• InfluxDB internals

              RDBMS   NoSQL             TSDB
Correctness   ACID    BASE              BASE
Schema        DDL     DDL / documents   on-write
Writing data  DML     POST/PUT          line protocol
Reading data  SQL     GET + filter      filter, window, group, join

TSDB unique combination
• Ingest: thousands to millions of points per second

• Store: fast accumulating, append-mostly data, lots of repetition, often with time-to-live

• Query: analytic queries with fast filtering, windowing

• Scale: availability, storage, query

Facebook Gorilla
• TTL eviction

• Columnar compression

• Write availability > query correctness

• Metric-based schema

• Separate query processing from access-path
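
To make the "columnar compression" bullet concrete: the Gorilla paper describes delta-of-delta encoding for timestamps, which turns regularly spaced timestamps into runs of zeros. The following is a minimal sketch of that idea only. The function names are hypothetical, and the variable-length bit packing that makes the zeros cheap to store is left out.

```python
def delta_of_delta_encode(timestamps):
    """Encode integer timestamps as [first, delta-of-delta, ...]."""
    if not timestamps:
        return []
    out = [timestamps[0]]
    prev_delta = 0
    for prev, cur in zip(timestamps, timestamps[1:]):
        delta = cur - prev
        # Store how much the spacing changed, not the spacing itself.
        out.append(delta - prev_delta)
        prev_delta = delta
    return out


def delta_of_delta_decode(encoded):
    """Invert delta_of_delta_encode."""
    if not encoded:
        return []
    timestamps = [encoded[0]]
    delta = 0
    for dod in encoded[1:]:
        delta += dod
        timestamps.append(timestamps[-1] + delta)
    return timestamps


ts = [1000, 1010, 1020, 1030, 1041]
encoded = delta_of_delta_encode(ts)
decoded = delta_of_delta_decode(encoded)
```

With a steady 10-unit interval, every value after the first delta is zero until the spacing changes, which is what makes the column compress well.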

Druid
• Roll-up at ingest

• Columnar storage & time-based segments

• Indexes on dimension for fast filtering

• Separation of real time and historical data nodes
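
Roll-up at ingest can be pictured as aggregating raw events into one row per time bucket and dimension combination before anything is stored. This is a minimal sketch under assumed inputs, not Druid's implementation: `roll_up`, the event tuple shape, and the 60-second granularity are all hypothetical choices for illustration.

```python
from collections import defaultdict


def roll_up(events, granularity_s=60):
    """Aggregate (timestamp_s, dims, value) events into per-bucket count and sum."""
    buckets = defaultdict(lambda: {"count": 0, "sum": 0.0})
    for ts, dims, value in events:
        # Truncate the timestamp to the start of its bucket.
        bucket_start = ts - ts % granularity_s
        key = (bucket_start, tuple(sorted(dims.items())))
        buckets[key]["count"] += 1
        buckets[key]["sum"] += value
    return dict(buckets)


events = [
    (120, {"host": "a"}, 1.0),
    (150, {"host": "a"}, 2.0),
    (185, {"host": "a"}, 4.0),
    (130, {"host": "b"}, 8.0),
]
rolled = roll_up(events)
```

Four raw events become three stored rows here; the trade-off is that individual events can no longer be recovered after the roll-up.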

Bullet Journals
• Fast event recording

• Ordered by time

• Indexed by dimensions

• Weekly / Monthly roll-up

InfluxDB
1. Write Path

2. Storage

3. Query Path

4. Clustering

InfluxDB: Adding data (1)
curl -i -XPOST 'http://localhost:8086/write?db=mydb' --data-binary 'cpu_load_short,host=server01,region=us-west value=0.64 1434055562000000000'
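
The body of the write above is one line-protocol point: a measurement name, comma-separated tags, a space, the fields, a space, and a nanosecond timestamp. The sketch below splits that example into its parts. `parse_point` is a hypothetical helper that assumes float-only fields and no escaped spaces or commas, both of which the real protocol allows.

```python
def parse_point(line):
    """Split one simple line-protocol point into its components."""
    key_part, field_part, ts_part = line.split(" ")
    # The first comma-separated item is the measurement; the rest are tags.
    measurement, *tag_pairs = key_part.split(",")
    tags = dict(pair.split("=", 1) for pair in tag_pairs)
    fields = {}
    for pair in field_part.split(","):
        name, raw = pair.split("=", 1)
        fields[name] = float(raw)  # assumption: every field is a float
    return {
        "measurement": measurement,
        "tags": tags,
        "fields": fields,
        "timestamp": int(ts_part),
    }


point = parse_point(
    "cpu_load_short,host=server01,region=us-west value=0.64 1434055562000000000"
)
```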

InfluxDB: Adding data (2)
• fsync() batch to WAL

• Add to in-memory cache

• Snapshot cache to TSM

• Add to index
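
The write-path steps above can be sketched as a toy engine. This is an illustration under loud assumptions, not InfluxDB's code: `ToyEngine`, the JSON file formats, and the point-count snapshot threshold are all hypothetical. Real TSM files are a compressed binary format, and WAL cleanup after a snapshot is omitted.

```python
import json
import os
import tempfile


class ToyEngine:
    def __init__(self, directory, snapshot_threshold=4):
        self.directory = directory
        self.snapshot_threshold = snapshot_threshold
        self.wal = open(os.path.join(directory, "toy.wal"), "a", encoding="utf-8")
        self.cache = {}     # series key -> list of (timestamp, value)
        self.index = set()  # series keys seen so far
        self.snapshots = 0

    def write_batch(self, points):
        # Append the whole batch to the WAL and fsync before doing anything else.
        self.wal.write(json.dumps(points) + "\n")
        self.wal.flush()
        os.fsync(self.wal.fileno())
        # Add the points to the in-memory cache and register their series.
        for series, ts, value in points:
            self.cache.setdefault(series, []).append((ts, value))
            self.index.add(series)
        # Snapshot the cache to an immutable file once it is large enough.
        if sum(len(v) for v in self.cache.values()) >= self.snapshot_threshold:
            self._snapshot()

    def _snapshot(self):
        self.snapshots += 1
        path = os.path.join(self.directory, f"{self.snapshots:04d}.snapshot.json")
        with open(path, "w", encoding="utf-8") as f:
            json.dump({k: sorted(v) for k, v in self.cache.items()}, f)
        self.cache.clear()

    def close(self):
        self.wal.close()


with tempfile.TemporaryDirectory() as d:
    engine = ToyEngine(d, snapshot_threshold=3)
    engine.write_batch([("cpu,host=a", 1, 0.5), ("cpu,host=a", 2, 0.6)])
    cached_after_first = sum(len(v) for v in engine.cache.values())
    engine.write_batch([("cpu,host=b", 3, 0.7)])
    files = sorted(os.listdir(d))
    engine.close()
```

The point of the ordering is durability first: a batch is on disk in the WAL before it becomes visible in the cache, so a crash before the snapshot loses nothing that was acknowledged.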

InfluxDB: on-disk (filesystem)
CREATE RETENTION POLICY <retention_policy_name> ON <database_name> DURATION <duration> REPLICATION <n> [SHARD DURATION <duration>] [DEFAULT]

• Database directory: /db

• Retention policy directory: /db/rp

• Shard group (time bounded, logical)

• Shard directory: /db/rp/Id#

• TSM0001.tsm (data file)

• TSM0002.tsm (data file)
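
Because shard groups are time bounded, a point's timestamp determines which group it lands in. The sketch below shows one way to compute that window and the matching shard directory. It assumes windows are aligned to multiples of the shard duration counted from the Unix epoch, which is an assumption made for illustration; `shard_group_bounds` and `shard_path` are hypothetical helpers.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def shard_group_bounds(ts, shard_duration):
    """Return the [start, end) window of the shard group containing ts."""
    elapsed = ts - EPOCH
    # Integer division of two timedeltas gives the number of whole windows.
    start = EPOCH + (elapsed // shard_duration) * shard_duration
    return start, start + shard_duration


def shard_path(db, rp, shard_id):
    """Directory layout from the slide: /db/rp/Id#."""
    return f"/{db}/{rp}/{shard_id}"


# The timestamp from the write example, 1434055562 seconds since the epoch.
ts = datetime.fromtimestamp(1434055562, tz=timezone.utc)
start, end = shard_group_bounds(ts, timedelta(days=1))
path = shard_path("mydb", "myrp", 2)
```

Time-bounded groups also make retention cheap: once a group's end time is older than the retention policy's duration, its directories can be dropped as a unit.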

InfluxDB: Adding data (DB)
• fsync() batch to WAL

• Add to in-memory cache

• Snapshot to TSM

• Add to index

InfluxDB: Adding data (index)
• Measurement name -> field keys

• Measurement name -> series

• Measurement name -> tag keys -> tag value -> series

• Series -> shards

• (Also sketches of series and measurements for fast cardinality estimation)
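
The mappings listed above amount to an inverted index. A minimal in-memory sketch with nested dictionaries and sets follows. `ToyIndex` and its method names are hypothetical, and the cardinality sketches from the last bullet are not modeled.

```python
from collections import defaultdict


class ToyIndex:
    def __init__(self):
        self.fields = defaultdict(set)   # measurement -> field keys
        self.series = defaultdict(set)   # measurement -> series keys
        # measurement -> tag key -> tag value -> series keys
        self.tags = defaultdict(lambda: defaultdict(lambda: defaultdict(set)))
        self.shards = defaultdict(set)   # series key -> shard ids

    def add(self, measurement, tags, field_keys, shard_id):
        series_key = measurement + "".join(
            f",{k}={v}" for k, v in sorted(tags.items())
        )
        self.fields[measurement].update(field_keys)
        self.series[measurement].add(series_key)
        for k, v in tags.items():
            self.tags[measurement][k][v].add(series_key)
        self.shards[series_key].add(shard_id)
        return series_key

    def series_matching(self, measurement, **tag_filters):
        """Intersect the series sets of every tag filter."""
        result = set(self.series.get(measurement, ()))
        for k, v in tag_filters.items():
            result &= self.tags.get(measurement, {}).get(k, {}).get(v, set())
        return result


idx = ToyIndex()
idx.add("cpu", {"host": "serverA", "region": "us-west"}, ["user", "system"], 1)
idx.add("cpu", {"host": "serverB", "region": "us-west"}, ["user"], 1)
only_a = idx.series_matching("cpu", host="serverA")
west = idx.series_matching("cpu", region="us-west")
```

A tag filter such as `host = 'serverA'` then costs a few dictionary lookups and a set intersection, independent of how many points each series holds.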

InfluxDB: TSI
• Roaring bitmaps to shortcut series creation on insert

• Iterators for index mappings

• Index is per-shard; series ID file is per-database

• Partitioned for lock-splitting
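
Two of the bullets above, the membership shortcut on insert and partitioning for lock-splitting, can be illustrated together. In this sketch a plain Python set stands in for a roaring bitmap, and the CRC32-based partitioning is an assumption; `PartitionedSeriesSet` is a hypothetical name, not a TSI type.

```python
import threading
import zlib


class PartitionedSeriesSet:
    def __init__(self, partitions=8):
        # One lock per partition, so writers to different partitions do not contend.
        self._parts = [(threading.Lock(), set()) for _ in range(partitions)]

    def _part(self, series_key):
        h = zlib.crc32(series_key.encode("utf-8"))
        return self._parts[h % len(self._parts)]

    def add_if_absent(self, series_key):
        """Return True only if the series had to be created."""
        lock, members = self._part(series_key)
        with lock:
            if series_key in members:
                return False  # shortcut: series already exists
            members.add(series_key)
            return True

    def __len__(self):
        total = 0
        for lock, members in self._parts:
            with lock:
                total += len(members)
        return total


keys = [f"cpu,host=server{i:02d}" for i in range(100)]
series = PartitionedSeriesSet()
created = []


def worker():
    created.append(sum(series.add_if_absent(k) for k in keys))


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Four writers insert the same 100 keys, yet each series is created exactly once, and a writer only blocks others that hash to the same partition.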

InfluxDB: InfluxQL Queries
1. Parse the time range and the expressions for filtering data
2. Look up the shards to access using the list of measurements and the time frame
3. Create the iterators for each shard
4. Merge the shard iterator outputs
select user, system from cpu
where time > now() - 1h and host = 'serverA'
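
Steps 2 to 4 above can be sketched with generators and a k-way merge. This is a minimal illustration rather than the InfluxQL engine: shards are assumed to be `(start, end, points)` tuples with points already sorted by time, the parsed filter (step 1) arrives as the `since` and `host` arguments, and all names are hypothetical.

```python
import heapq


def shard_iterator(points, since, host):
    """Yield (time, user, system) rows from one shard that pass the filter."""
    for ts, tags, fields in points:
        if ts > since and tags.get("host") == host:
            yield ts, fields["user"], fields["system"]


def query(shards, since, host):
    # Step 2: keep only shards whose time range can overlap the filter.
    candidates = [points for (start, end, points) in shards if end > since]
    # Step 3: create one iterator per shard.
    iterators = [shard_iterator(points, since, host) for points in candidates]
    # Step 4: merge the time-sorted iterator outputs.
    return list(heapq.merge(*iterators))


shards = [
    (0, 100, [
        (10, {"host": "serverA"}, {"user": 1.0, "system": 0.5}),
        (30, {"host": "serverA"}, {"user": 3.0, "system": 0.7}),
    ]),
    (0, 100, [
        (20, {"host": "serverA"}, {"user": 2.0, "system": 0.6}),
        (25, {"host": "serverB"}, {"user": 9.0, "system": 9.0}),
    ]),
    (-100, 0, [
        (-50, {"host": "serverA"}, {"user": 0.1, "system": 0.1}),
    ]),
]
rows = query(shards, since=5, host="serverA")
```

Because each iterator is lazy and already ordered by time, the merge never has to hold a whole shard's output in memory.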

InfluxQL: Query with IFQL
1. Stand-alone `ifqld` coordinator nodes

2. Streaming storage iterators that support rate-limits

3. Separation of query planning and query distribution

4. Extensible, functional language

5. Unification of InfluxQL and TICKScript

A brief sidebar on append-mostly databases
No one tells you about:
• Wrong data

• Old (back-filled) data

InfluxDB Clustering
• Strongly consistent meta-cluster (based on Raft)

• User-configured replication factor

• Replication- and shard-aware query planner

• Hinted-handoff queues on each data node

• (WIP) Anti-entropy consistency repair
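
The hinted-handoff bullet describes queuing writes for replicas that are temporarily unreachable and replaying them later. A toy sketch of that pattern follows. `DataNode`, `Coordinator`, and the in-memory deque are hypothetical stand-ins; here the node that receives the write plays the coordinator role and holds the hint queues, and durability of the queue is not modeled.

```python
from collections import deque


class DataNode:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.points = []

    def write(self, point):
        if not self.up:
            raise ConnectionError(f"{self.name} is unreachable")
        self.points.append(point)


class Coordinator:
    def __init__(self, replicas):
        self.replicas = replicas
        self.hints = {node.name: deque() for node in replicas}

    def write(self, point):
        """Write to every replica; queue a hint for each one that is down."""
        acked = 0
        for node in self.replicas:
            try:
                node.write(point)
                acked += 1
            except ConnectionError:
                self.hints[node.name].append(point)
        return acked

    def replay_hints(self):
        """Drain queued writes, in order, to replicas that are back up."""
        for node in self.replicas:
            queue = self.hints[node.name]
            while queue and node.up:
                node.write(queue[0])
                queue.popleft()


a, b = DataNode("a"), DataNode("b")
coord = Coordinator([a, b])
b.up = False
acked = coord.write("p1")
pending = len(coord.hints["b"])
b.up = True
coord.replay_hints()
```

This favors write availability: the write succeeds with one acknowledgement, and the lagging replica converges once it returns. Anything a hint queue cannot cover is what anti-entropy repair is meant to address.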

Conclusions
• Time series data has unique storage and query requirements that impact database design.

• Evolution of InfluxDB:

1. TSI: remove the in-memory size limit on cardinality.
2. IFQL: faster feature velocity; safer execution.
3. Anti-entropy repair: easier, more robust scale-out.
