Efficient and Fast Time Series Storage - The missing link in dynamic software analysis

Fast and efficient operational time series storage:
The missing link in dynamic software analysis
Symposium on Software Performance
Munich, 05.11.2015
Florian Lautenschlager, Andreas Kumlehn, Josef Adersberger,
Michael Philippsen
Design for Diagnosability
This research was funded in part by the Bavarian Ministry of
Economic Affairs and Media, Energy and Technology.

What is operational data?
■ Typical operational data are runtime metrics (e.g. CPU load,
memory consumption), logs, and exceptions.
■ Operational data is best represented as time series.
■ It is harvested continuously along a multitude of dimensions.
■ Values are expected to span a wide range along each dimension.
■ Frequencies and time spans tend to vary a lot.

“…interactive response times often make a qualitative
difference in data exploration, monitoring, online customer
support, rapid prototyping, debugging of data pipelines,
and other tasks.”
[Dremel: Interactive Analysis of Web-Scale Datasets, Sergey Melnik et al.]

A typical toolchain for dynamic software analysis: collection
framework, time series storage, time series analysis framework
■ Collection frameworks (WRITE): Metrics, Kieker, collectD, Logstash, EKG Collector
■ Time series storage: Graphite, InfluxDB, OpenTSDB, Chronix
■ Time series analysis frameworks (READ): EKG Client, Kibana, Twitter - R, ETSY, EGADS

Research Question:
Is it possible to exploit the characteristic features of operational
data to create a time series database that requires less space
and provides faster queries?
Chronix
Fast queries
Efficient storage
Extendable with analysis functions
Store every kind of operational data as time series
Scalable and portable

Yes. Chronix’ architecture enables both efficient storage of time
series and millisecond range queries.
Architecture pipeline:
(1) Semantic Compression
(2) Attributes and Chunks: Record = data:<chunk> + attributes
(3) Basic Compression: Record = data:compressed<chunk> + attributes
(4) Multi-Dimensional Storage: Record Storage
Example: 1 million points = 100 chunks * 10,000 points

The key data type of Chronix is called a record.
It stores a compressed chunk of the time series and its
attributes.
record{
data:compressed{<chunk>}
//technical fields
id: 3dce1de0−...−93fb2e806d19
version: 1501692859622883300
start: 1427457011238
end: 1427471159292
//optional attributes
host: prodI5
process: scheduler
group: jmx
metric: heapMemory.Usage.Used
max: 896.571
}
Data:compressed{<chunk of time series data>}
■ Time Series: time stamp, numeric value
■ Traces: calls, exceptions, …
■ Logs: access, method runtimes
■ Complex data: models, test coverage,
anything else…
Optional attributes
■ Arbitrary attributes for the time series
■ Attributes are indexed
■ Make the chunk searchable
■ Can contain pre-calculated values
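
A minimal sketch of the record idea described above, written in Python for illustration only. The `Record` class, the `to_records` and `read_chunk` helpers, the JSON serialization, and the use of zlib are all assumptions; the slides do not name the actual serialization format or compression codec used by Chronix.

```python
import json
import zlib
from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical stand-in for a record: one compressed chunk plus attributes."""
    data: bytes   # compressed chunk of the time series
    start: int    # first timestamp in the chunk (ms)
    end: int      # last timestamp in the chunk (ms)
    attributes: dict = field(default_factory=dict)


def to_records(points, chunk_size, attributes):
    """Split (timestamp, value) pairs into chunks and compress each chunk."""
    records = []
    for i in range(0, len(points), chunk_size):
        chunk = points[i:i + chunk_size]
        values = [v for _, v in chunk]
        # Serialization and codec are assumptions, not taken from the slides.
        payload = zlib.compress(json.dumps(chunk).encode("utf-8"))
        # A pre-calculated value (here: max) is stored next to the chunk.
        attrs = dict(attributes, max=max(values))
        records.append(Record(payload, chunk[0][0], chunk[-1][0], attrs))
    return records


def read_chunk(record):
    """Decompress a record back into its list of [timestamp, value] pairs."""
    return json.loads(zlib.decompress(record.data).decode("utf-8"))


# 1 million points at 10,000 points per chunk gives 100 records.
points = [(1427457011238 + i * 1000, float(i % 500)) for i in range(1_000_000)]
records = to_records(points, 10_000,
                     {"host": "prodI5", "metric": "heapMemory.Usage.Used"})
```

Because `start`, `end`, and the attributes sit outside the compressed payload, a query can select matching records first and decompress only those chunks.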

Chronix also provides aggregations and higher-level time series
analyses in its query language that other TSDBs do not.
Aggregations (ag)
■ Maximum
■ Minimum
■ Average
■ Standard Deviation
■ Percentile
Analyses (detect)
■ A trend analysis based on a linear
regression model.
■ An outlier analysis using the IQR.
■ A frequency analysis validating the
occurrence within a defined time range.
q=host:* AND -group:(jmx OR .net) & fq={!ANALYZE detect=frequency=10:6}
q=host:prod? AND group:(jmx OR .net) & fq={!ANALYZE ag=dev}
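
To make two of the listed analyses concrete, here is a small Python sketch of an IQR-based outlier check and a linear-regression trend. The function names, the fence factor `k=1.5`, and the quartile method are assumptions chosen for illustration; the slides do not specify how Chronix implements these analyses.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def trend_slope(timestamps, values):
    """Least-squares slope of values over timestamps; positive means upward trend."""
    n = len(values)
    mean_t = sum(timestamps) / n
    mean_v = sum(values) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in zip(timestamps, values))
    var = sum((t - mean_t) ** 2 for t in timestamps)
    return cov / var
```

For example, `iqr_outliers([10, 11, 12, 11, 10, 12, 11, 100])` flags only the value 100, and a series that grows by 2 per time step has a `trend_slope` of 2.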

Benchmarks represent typical use cases in time series analysis.
The queries are collected from real-world analyses.
■ We have collected, arranged, and counted queries of real analyses.
■ Operational time series data from three real-world projects (14,195 time series, 512 million points).
■ Project 1: Web application for searching car information (8 web servers, 20 search servers)
■ Project 2: Retail application for orders, billing, and customer relations (2 servers, 1 central database)
■ Project 3: Sales application of a car manufacturer (2 servers, 1 central database)
Time Range (Days)   #Queries
1                   30
7                   30
14                  10
91                  2
We repeat the 72 queries 20 times to stabilize results.
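
The query mix can be checked with a few lines of Python, together with a hypothetical timing helper. The `measure` function and the choice of the median as the stabilized value are assumptions; the slides only state that the queries are repeated 20 times.

```python
import statistics
import time

# Query mix from the table: time range in days -> number of queries.
query_mix = {1: 30, 7: 30, 14: 10, 91: 2}
repetitions = 20

total_queries = sum(query_mix.values())          # 72 queries
total_executions = total_queries * repetitions   # 1440 query executions


def measure(run_query, repetitions):
    """Run a query callable repeatedly and return the median wall-clock time (s)."""
    timings = []
    for _ in range(repetitions):
        started = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - started)
    return statistics.median(timings)
```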

Chronix outperforms related TSDBs in write throughput, storage
efficiency, and access times.

Chronix is open source. Check http://www.chronix.io/ or @ChronixDB

Chronix is currently more of a proof of concept than production-ready.
Work is ongoing!
Contact: florian.lautenschlager@qaware.de
