Fast and efficient operational time series storage:
The missing link in dynamic software analysis
Symposium on Software Performance
Munich, 05.11.2015
Florian Lautenschlager, Andreas Kumlehn, Josef Adersberger,
Michael Philippsen
Design for Diagnosability
This research was in part funded by
Bavarian Ministry of Economic Affairs
and Media, Energy and Technology.
What is operational data?
■ Typical operational data: runtime metrics such as CPU load and memory consumption,
logs, exceptions, etc.
■ Operational data is best represented as time series.
■ Continuously harvested along a multitude of dimensions.
■ Expected wide range of the values along each of the dimensions.
■ Frequencies of time spans tend to vary a lot.
“…interactive response times often make a qualitative
difference in data exploration, monitoring, online customer
support, rapid prototyping, debugging of data pipelines,
and other tasks.” [ Dremel: Interactive Analysis of Web-Scale Datasets, Sergey Melnik et al. ]
A typical toolchain for dynamic software analysis: collection
framework, time series storage, time series analysis framework
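[Figure: toolchain with data flowing from WRITE to READ]
■ Collection frameworks (write): Metrics, Kieker, collectD, Logstash, EKG Collector
■ Time series storage: Graphite, InfluxDB, OpenTSDB, Chronix
■ Time series analysis frameworks (read): EKG Client, Kibana, Twitter - R, ETSY, EGADS
■ One further path in the figure is labelled "Direct".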
Research Question:
Is it possible to exploit the characteristic features of operational
data to create a time series database that requires less space
and provides faster queries?
Chronix
Fast queries
Efficient storage
Extendable with analysis functions
Store every kind of operational data as time series
Scalable and portable
Yes. Chronix's architecture enables both efficient storage of time
series and query times in the millisecond range.
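[Figure: the Chronix storage pipeline in four steps]
(1) Semantic Compression
(2) Attributes and Chunks: the time series is split into chunks, e.g. 1 million points into 100 chunks of 10,000 points; each record holds data:<chunk> plus attributes
(3) Basic Compression: each record holds data:compressed<chunk> plus attributes
(4) Multi-Dimensional Storage: the records go into the record storage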
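The chunking and compression steps can be sketched roughly as follows. This is a minimal illustration, not Chronix's implementation: the Record class, the to_records and read_chunk helpers, and the JSON + zlib encoding are all assumptions made for the example. Only the numbers (1 million points, chunks of 10,000) and the attribute values come from the slides.

```python
import json
import zlib
from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical stand-in for a Chronix record: one compressed chunk plus attributes."""
    data: bytes   # compressed chunk
    start: int    # first time stamp in the chunk
    end: int      # last time stamp in the chunk
    attributes: dict = field(default_factory=dict)


def to_records(points, attributes, chunk_size=10_000):
    """Split (timestamp, value) pairs into fixed-size chunks and compress each one."""
    records = []
    for i in range(0, len(points), chunk_size):
        chunk = points[i:i + chunk_size]
        # JSON + zlib is an assumption for this sketch, not Chronix's actual encoding.
        payload = zlib.compress(json.dumps(chunk).encode("utf-8"))
        records.append(Record(payload, chunk[0][0], chunk[-1][0], dict(attributes)))
    return records


def read_chunk(record):
    """Decompress a record back into its list of [timestamp, value] pairs."""
    return json.loads(zlib.decompress(record.data).decode("utf-8"))


# 1 million points sampled once per second become 100 records of 10,000 points each.
points = [(1427457011238 + i * 1000, float(i % 100)) for i in range(1_000_000)]
records = to_records(points, {"host": "prodI5", "metric": "heapMemory.Usage.Used"})
print(len(records), len(read_chunk(records[0])))  # 100 10000
```

Keeping start and end outside the compressed payload means a range query can skip chunks without decompressing them.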
The key data type of Chronix is called a record.
It stores a compressed chunk of the time series and its
attributes.
record{
data:compressed{<chunk>}
//technical fields
id: 3dce1de0−...−93fb2e806d19
version: 1501692859622883300
start: 1427457011238
end: 1427471159292
//optional attributes
host: prodI5
process: scheduler
group: jmx
metric: heapMemory.Usage.Used
max: 896.571
}
Data:compressed{<chunk of time series data>}
■ Time Series: time stamp, numeric value
■ Traces: calls, exceptions, …
■ Logs: access, method runtimes
■ Complex data: models, test coverage,
anything else…
Optional attributes
■ Arbitrary attributes for the time series
■ Attributes are indexed
■ Make the chunk searchable
■ Can contain pre-calculated values
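How indexed attributes make chunks searchable can be illustrated with a small in-memory sketch. Everything here is hypothetical (the record dictionaries, the inverted index, the search function); Chronix's own index is not described at this level on the slides. The sketch only shows the idea that a query selects records by attributes and time span, and that a pre-calculated value such as max can be read without opening the chunk.

```python
from collections import defaultdict

# Hypothetical records: attributes and technical fields only, the chunk itself is omitted.
records = [
    {"id": "r1", "start": 0, "end": 999, "host": "prodI5", "group": "jmx", "max": 896.571},
    {"id": "r2", "start": 1000, "end": 1999, "host": "prodI5", "group": "jmx", "max": 512.0},
    {"id": "r3", "start": 0, "end": 999, "host": "prodI6", "group": "jmx", "max": 130.25},
]

# Inverted index: (attribute, value) -> ids of the records carrying it.
index = defaultdict(set)
for rec in records:
    for key in ("host", "group"):
        index[(key, rec[key])].add(rec["id"])


def search(start, end, **attrs):
    """Return records matching all given attributes whose time span overlaps [start, end].

    At least one attribute must be given.
    """
    ids = set.intersection(*(index[(k, v)] for k, v in attrs.items()))
    return [r for r in records if r["id"] in ids and r["start"] <= end and r["end"] >= start]


hits = search(0, 1500, host="prodI5", group="jmx")
# The pre-calculated per-chunk maximum answers this without touching any chunk data.
print([r["id"] for r in hits], max(r["max"] for r in hits))  # ['r1', 'r2'] 896.571
```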
Chronix also provides aggregations and higher-level time series
analyses in its query language that other TSDBs do not.
Aggregations (ag)
■ Maximum
■ Minimum
■ Average
■ Standard Deviation
■ Percentile
Analyses (detect)
■ A trend analysis based on a linear
regression model.
■ An outlier analysis using the IQR.
■ A frequency analysis validating the
occurrence within a defined time range.
q=host:* AND -group:(jmx OR .net) & fq={!ANALYZE detect=frequency=10:6}
q=host:prod? AND group:(jmx OR .net) & fq={!ANALYZE ag=dev}
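Two of the analyses listed above can be sketched in a few lines. This is an illustration of the general techniques, not Chronix's code: the function names are made up, the factor k=1.5 is the conventional choice for IQR fences and is assumed here, and the trend check simply tests the sign of a least-squares slope.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


def has_upward_trend(timestamps, values):
    """Fit a least-squares line through the points and report whether its slope is positive."""
    mean_t = statistics.fmean(timestamps)
    mean_v = statistics.fmean(values)
    covariance = sum((t - mean_t) * (v - mean_v) for t, v in zip(timestamps, values))
    variance = sum((t - mean_t) ** 2 for t in timestamps)
    return covariance / variance > 0


print(iqr_outliers([10, 11, 12, 11, 10, 12, 11, 95]))       # [95]
print(has_upward_trend([0, 1, 2, 3], [1.0, 2.0, 2.5, 4.0]))  # True
```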
Benchmarks represent typical use cases in time series analysis.
The queries are collected from real-world analyses.
■ We collected, arranged, and counted queries from real analyses.
■ Operational time series data from three real-world projects (14,195 time series, 512 million points).
■ Project 1: Web application for searching car information (8 web servers, 20 search servers)
■ Project 2: Retail application for orders, billing, and customer relations (2 servers, 1 central database)
■ Project 3: Sales application of a car manufacturer (2 servers, 1 central database)
Time Range (Days)   #Queries
1                   30
7                   30
14                  10
91                  2
We repeat the 72 queries 20 times to stabilize results.
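The benchmark procedure can be pictured as a small timing harness. This is a sketch under assumptions: run_query is a hypothetical placeholder for a real range query against a TSDB, and the slides do not say how the repeated measurements are summarized, so the median used here is a choice made for the example.

```python
import statistics
import time

# Query mix from the table above: time range in days -> number of queries.
QUERY_MIX = {1: 30, 7: 30, 14: 10, 91: 2}
REPETITIONS = 20


def run_query(days):
    """Hypothetical placeholder for issuing one range query against a TSDB."""
    return sum(range(days * 100))  # stand-in work so the sketch runs on its own


def benchmark():
    """Time every query in the mix REPETITIONS times; return median seconds per time range."""
    medians = {}
    for days, count in QUERY_MIX.items():
        samples = []
        for _ in range(count * REPETITIONS):
            begin = time.perf_counter()
            run_query(days)
            samples.append(time.perf_counter() - begin)
        medians[days] = statistics.median(samples)  # median is an assumed summary
    return medians


total_queries = sum(QUERY_MIX.values())
print(total_queries, total_queries * REPETITIONS)  # 72 1440
results = benchmark()
```

The query mix adds up to the 72 queries named on the slide, so 20 repetitions give 1,440 executions per database.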
Chronix outperforms related TSDBs in write throughput, storage
efficiency, and access times.
Chronix is open-source. Check http://www.chronix.io/ or @ChronixDB
Chronix is currently more of a proof of concept than production-ready.
Work is ongoing!
Contact: florian.lautenschlager@qaware.de
