Agenda
● Time Series 101
● Machine/Deep Learning for Time Series Data
○ Anomaly Detection
○ Forecasting
● InfluxDB 3 Processing Engine (ML in the Database)
● Hands-on Coding
● Q&A
3.
Time Series Data is everywhere
Cloud, Application, Server Monitoring/Observability
4.
Time Series Data is everywhere
Finance, Manufacturing, Healthcare
BioMetrics
(Sensor Data)
5.
Time Series Data
Warehouse IoT Devices
Ingest - A high volume of data streaming in at nanosecond precision
Compression - The ability to store this large data set without breaking the bank
Cardinality - The need to store wide rows, timestamped data with multiple values
Querying by Time - Query and process rows by time, values, and tags
Real-time Analytics - Fast queries for real-time analytics
Track + Monitor: system (model, app), infrastructure (DevOps, MLOps)
Metrics & Events from several sources (software, hardware, cloud, network, etc.)
Robotics and Green Energy
6.
Evolution of Databases
RELATIONAL
• Orders
• Customers
• Records
DOCUMENT
• High throughput
• Large document
SEARCH
• Distributed search
• Logs
• Geo
TIME SERIES
• Events, metrics, time-stamped
• For IoT, analytics, cloud native
Time series is the fastest-growing data category by far
[Chart: Time series vs. All others; source: DB Engines]
😥 Packing Co is having recurring issues with one of their packaging machines.
🤖 Unexpectedly, one of the machines will enter a failing state, which requires a manual reset by an engineer.
📊 The Plant Manager has advised that, when running normally, all machine sensors follow similar output patterns. If a machine is at fault, these fluctuate abnormally.
🤔 How can we help them?
📦 Packing Co — Anomaly Detection
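One way to act on the Plant Manager's hint is to compare how much each machine's sensor fluctuates against its peers. The sketch below is only an illustration under stated assumptions: the machine names, readings, window size and the 3x ratio are all hypothetical, and the peer-median rule is a simple stand-in, not the method used in the workshop.

```python
from statistics import median, pstdev


def fluctuation_scores(readings, window=5):
    """Return {machine: standard deviation of its last `window` readings}."""
    return {name: pstdev(values[-window:]) for name, values in readings.items()}


def flag_abnormal(readings, window=5, ratio=3.0):
    """Flag machines whose recent fluctuation exceeds `ratio` x the peer median."""
    scores = fluctuation_scores(readings, window)
    baseline = median(scores.values())
    floor = 1e-9  # guard against a zero baseline when all peers are flat
    limit = ratio * max(baseline, floor)
    return sorted(name for name, score in scores.items() if score > limit)


# Hypothetical sensor readings: machines 1 and 2 are stable, machine 3 is not.
readings = {
    "machine_1": [20.1, 20.3, 19.9, 20.2, 20.0, 20.1],
    "machine_2": [20.0, 20.2, 20.1, 19.8, 20.1, 20.0],
    "machine_3": [20.2, 27.5, 12.1, 29.0, 11.4, 26.3],
}
print(flag_abnormal(readings))  # → ['machine_3']
```

Using the peer median as the baseline keeps one faulty machine from inflating the threshold, which matches the "all sensors follow similar patterns" assumption. It would not help if most machines failed at once.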
What do we do when our result becomes unpredictable by conventional means?
Realistically…
14.
An embedded Python VM in InfluxDB 3 Core & Enterprise
The Processing Engine
15.
Goal
Build and improve upon the functionality of:
Continuous Queries
Kapacitor
Flux Tasks
Telegraf
This functionality is brought directly inside the
database for efficiency and ease of use.
Built for easy development and access to
Python’s ecosystem of libraries and tools.
Great for data collection, transformation,
processing, monitoring, automation and more.
InfluxDB 3 Core | The Processing Engine
An Overview
16.
WAL Flush: Sends write data to a plugin once a second (configurable).
Scheduled Task: Executes on a user-defined schedule; great for data collection and deadman monitoring.
On Request: Binds a plugin to an HTTP endpoint. Request content is sent to the plugin, which can then parse and process it, send data into the database or to third-party services, and respond to the caller.
InfluxDB 3 Core | The Processing Engine
Execution Types
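The three execution types can be pictured as three Python entry points in a plugin file, one per trigger type. This is a minimal sketch: the function names and parameters are assumed to follow the InfluxDB 3 plugin documentation and should be checked against the version in use. `FakeLocal` is a hypothetical stand-in for the object the database passes in, so the example runs outside InfluxDB.

```python
def process_writes(influxdb3_local, table_batches, args=None):
    """WAL Flush trigger: called with batches of newly written rows."""
    for batch in table_batches:
        rows = batch["rows"]
        influxdb3_local.info(f"{batch['table_name']}: {len(rows)} rows")


def process_scheduled_call(influxdb3_local, call_time, args=None):
    """Scheduled trigger: called on the configured schedule."""
    influxdb3_local.info(f"scheduled run at {call_time}")


def process_request(influxdb3_local, query_parameters, request_headers,
                    request_body, args=None):
    """On Request trigger: called for each HTTP request to the bound endpoint."""
    influxdb3_local.info(f"request with params {query_parameters}")
    # Assumption: the returned dict is sent back to the caller as the response.
    return {"status": "ok", "received_bytes": len(request_body or b"")}


class FakeLocal:
    """Hypothetical test double that only records log messages."""

    def __init__(self):
        self.logs = []

    def info(self, message):
        self.logs.append(message)


local = FakeLocal()
batches = [{"table_name": "cpu", "rows": [{"usage": 0.4}, {"usage": 0.9}]}]
process_writes(local, batches)
print(local.logs)  # → ['cpu: 2 rows']
```

Keeping the logic in plain functions like this makes a plugin easy to exercise locally before it is attached to a trigger.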
17.
InfluxDB 3 Core | The Processing Engine
Example Usage and Implementation
The Approach
1. User creates a plugin, which is a Python script stored in a defined location.
2. User adds and enables a trigger, which decides when to call a specific plugin.
3. The trigger fires whenever the execution method is needed (WAL flush, on request, etc.)
[Diagram: Adjusted Write Path for WAL Flush Trigger. Labels: Incoming Write, Write Buffer, Every 1 Second, Queryable Buffer, Every 10 Minutes, Object Storage; Processing Engine with WAL Flush Trigger and Plugins #1, #2, #3]
18.
InfluxDB 3 Core | The Processing Engine
On-Request Trigger
[Diagram, GET Request. Labels: Incoming GET Request, InfluxDB Plugin API, Processing Engine with On-Request Trigger and Plugins #1, #2, #3]
[Diagram, POST Request. Labels: Incoming POST Request, InfluxDB Plugin API, Processing Engine with On-Request Trigger and Plugins #1, #2, #3]
19.
InfluxDB 3 Core | The Processing Engine
Scheduled Trigger
Scheduled Trigger Usage
[Diagram. Labels: Object Storage, Query, Write, Every 10 Seconds, External Service; Processing Engine with Scheduled Trigger and Plugins #1, #2, #3]
New Use Cases
Use the Scheduled plugin for running specific processes on a dedicated time cycle.
Great for data collection, downsampling, status checks, trend analyses, report generation, and much more.
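Two of the scheduled use cases named on this slide, downsampling and status checks, reduce to small pure functions that a scheduled plugin could call on each run. The sketch below assumes rows arrive as dicts with a Unix timestamp in seconds under `time`. That row shape, the field name `temp` and the sensor names are hypothetical, chosen only to keep the example runnable without a database.

```python
from collections import defaultdict
from statistics import mean


def downsample(rows, bucket_seconds, field):
    """Average `field` per fixed-size time bucket (bucket start as the new time)."""
    buckets = defaultdict(list)
    for row in rows:
        start = row["time"] - row["time"] % bucket_seconds
        buckets[start].append(row[field])
    return [{"time": start, field: mean(values)}
            for start, values in sorted(buckets.items())]


def deadman(last_seen, now, timeout_seconds):
    """Return sources that have not reported within `timeout_seconds`."""
    return sorted(source for source, seen in last_seen.items()
                  if now - seen > timeout_seconds)


rows = [
    {"time": 0, "temp": 10.0},
    {"time": 4, "temp": 12.0},
    {"time": 10, "temp": 20.0},
    {"time": 19, "temp": 22.0},
]
print(downsample(rows, bucket_seconds=10, field="temp"))
# → [{'time': 0, 'temp': 11.0}, {'time': 10, 'temp': 21.0}]

print(deadman({"sensor_a": 95, "sensor_b": 40}, now=100, timeout_seconds=30))
# → ['sensor_b']
```

In a real plugin the rows would come from a query and the results would be written back or sent to an external service. Those calls are left out here because their exact form depends on the plugin API version.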
20.
Purpose
To enable persistent memory across different calls to the Processing Engine.
Two Types
Trigger-Specific – Cache space tied to the
specific trigger within the Processing Engine.
Shared – Cache that is shared across all
triggers within the Processing Engine.
InfluxDB 3 Core | The Processing Engine
In-Memory Cache
[Diagram: Processing Engine with a Shared Cache and Individual Trigger Caches for the WAL Flush Trigger, On-Request Trigger, and Scheduled Trigger]
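The difference between the two cache types can be sketched with a small in-memory model. `SketchCache` is a hypothetical stand-in written for this example, not the engine's implementation. Its `put`/`get` methods and the `use_global` flag are assumed to mirror the real cache API and should be verified against the InfluxDB 3 documentation.

```python
class SketchCache:
    """Hypothetical model: one private store per trigger plus one shared store."""

    def __init__(self, shared_store):
        self._shared = shared_store  # same dict object for every trigger
        self._own = {}               # visible to this trigger only

    def _store(self, use_global):
        return self._shared if use_global else self._own

    def put(self, key, value, use_global=False):
        self._store(use_global)[key] = value

    def get(self, key, default=None, use_global=False):
        return self._store(use_global).get(key, default)


def count_call(cache):
    """Persist a per-trigger call counter across invocations."""
    calls = cache.get("calls", 0) + 1
    cache.put("calls", calls)
    return calls


shared = {}
wal_cache = SketchCache(shared)
scheduled_cache = SketchCache(shared)

count_call(wal_cache)
print(count_call(wal_cache))        # → 2 (state survived the first call)
print(count_call(scheduled_cache))  # → 1 (trigger-specific, separate counter)

# A value placed in the shared cache is visible from every trigger.
wal_cache.put("threshold", 3.0, use_global=True)
print(scheduled_cache.get("threshold", use_global=True))  # → 3.0
print(scheduled_cache.get("threshold"))                   # → None
```

The trigger-specific store suits per-plugin state such as counters or last-seen timestamps, while the shared store suits values several triggers need, such as a model threshold.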
21.
Hands-on Code
Plugin for Forecasting & Anomaly Detection
Material:
https://github.com/InfluxCommunity/WeatherForecasting
22.
Analysing Time Series Data Engineering
1. Aggregates (Mean, Median)
● Rolling Averages: Smooth data to capture
long-term trends (e.g., 7-day average).
2. Handling Missing Values
● Imputation: Fill missing data using methods like
interpolation or forward/backward fill.
● Dropping: Remove rare missing data if it doesn’t
affect analysis.
3. Time Series Database Schema
● Timestamp as Primary Key: Use timestamps to
uniquely identify data points.
● Granularity: Choose the right time intervals (e.g.,
daily, hourly).
4. Cardinality
● High Cardinality: Many unique values (e.g.,
customer IDs).
● Low Cardinality: Few unique values (e.g., status
flags).
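The rolling average and the two imputation methods listed above can be sketched in plain Python. This is an illustrative version with hypothetical sample values. In practice a library such as pandas would usually be used instead.

```python
def rolling_mean(values, window):
    """Trailing rolling average; the first window-1 positions are None."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            chunk = values[i + 1 - window : i + 1]
            out.append(sum(chunk) / window)
    return out


def forward_fill(values):
    """Replace each None with the most recent known value."""
    out, last = [], None
    for value in values:
        if value is not None:
            last = value
        out.append(last)
    return out


def interpolate(values):
    """Linearly fill interior None gaps; leading/trailing gaps stay None."""
    out = list(values)
    known = [i for i, value in enumerate(out) if value is not None]
    for left, right in zip(known, known[1:]):
        step = (out[right] - out[left]) / (right - left)
        for i in range(left + 1, right):
            out[i] = out[left] + step * (i - left)
    return out


raw = [10.0, None, None, 16.0, 18.0]
print(forward_fill(raw))                  # → [10.0, 10.0, 10.0, 16.0, 18.0]
print(interpolate(raw))                   # → [10.0, 12.0, 14.0, 16.0, 18.0]
print(rolling_mean(interpolate(raw), 3))  # → [None, None, 12.0, 14.0, 16.0]
```

Forward fill repeats the last reading, which suits step-like signals such as status flags. Interpolation assumes the value moved smoothly across the gap, which suits slowly changing signals such as temperature.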
23.
Forecasting Time Series Data
Statistical & Classical ML Methods
1. Naive Method
● Linear Regression: Predicts future values using a straight-line trend.
● Random Walk: Assumes each change is random, so the naive forecast is simply the last observed value (a common baseline for stock prices).
2. Exponential Smoothing
● Gives more weight to recent data and less to older data for better predictions.
3. Lagging Features
● Uses past values to predict the future, capturing time-based patterns (e.g., sales or
temperature).
4. ARIMA
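● AR (Autoregressive): Predicts future values from a linear combination of past values.
● I (Integrated): Differences the series to remove trend and make it stationary.
● MA (Moving Average): Models the current value using past forecast errors.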
5. SARIMA
● Extends ARIMA by adding seasonal patterns to handle repeating trends over time.
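The simpler methods in this list fit in a few lines each. The sketch below covers the naive (last value) forecast, a straight-line trend, simple exponential smoothing and lag features. The sample series and `alpha` value are hypothetical. ARIMA and SARIMA are left to dedicated libraries.

```python
def naive_forecast(series):
    """Random-walk baseline: the next value equals the last observed value."""
    return series[-1]


def linear_trend_forecast(series, steps_ahead=1):
    """Fit a least-squares line through (index, value) and extrapolate it."""
    n = len(series)
    x_mean = (n - 1) / 2
    y_mean = sum(series) / n
    sxy = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(series))
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept + slope * (n - 1 + steps_ahead)


def exponential_smoothing(series, alpha):
    """Simple exponential smoothing; returns the one-step-ahead forecast."""
    level = series[0]
    for value in series[1:]:
        # Recent observations get weight alpha, the running level gets 1 - alpha.
        level = alpha * value + (1 - alpha) * level
    return level


def lag_features(series, lags):
    """Build (features, target) pairs where features are the previous `lags` values."""
    return [(series[t - lags : t], series[t]) for t in range(lags, len(series))]


series = [10.0, 12.0, 14.0, 16.0]
print(naive_forecast(series))              # → 16.0
print(linear_trend_forecast(series))       # → 18.0
print(exponential_smoothing(series, 0.5))  # → 14.25
print(lag_features(series, 2))
# → [([10.0, 12.0], 14.0), ([12.0, 14.0], 16.0)]
```

On this steadily rising series the trend line extrapolates to 18.0, while exponential smoothing lags behind at 14.25 because it has no trend term. The `lag_features` pairs are the input format a regression model would train on.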
27
Additional Time Series ML Libraries
Anomaly Detection Toolkit (ADTK)
Unsupervised and rule-based time series anomaly detection. The ADTK package allows you to easily build an effective detection model from a variety of rule-based anomaly detection methods.
FB Prophet
“Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects.”
Neural Prophet
A neural-network-based time series model, inspired by Facebook Prophet and AR-Net (autoregressive neural network), built on PyTorch.
Chronos
“A new powerful open source library to perform time series forecasting etc using LLM by Amazon Research”
28.
Conclusion
Learning Resources
● WeatherForecasting Workshop: https://github.com/InfluxCommunity/WeatherForecasting
● Prophet Forecasting: https://github.com/Anaisdg/influxdb3_plugins/tree/add-fbprophet-plugins/influxdata/Anaisdg/fbprophet
● River ML (online time series machine learning library): https://github.com/online-ml/river
● Deep Learning Time Series Benchmark: https://huggingface.co/spaces/Salesforce/GIFT-Eval
● Book (Machine Learning Engineering): https://mlip-cmu.github.io/book/index.html
● Video tutorials on YouTube, InfluxDB University (free training): https://influxdbu.com
● Community: https://influxcommunity.slack.com & https://community.influxdata.com
❖ InfluxDB 3 Core: https://github.com/influxdata/influxdb
❖ www.influxdata.com/cloud | via cloud marketplace (AWS, Azure, GCP)
Try for free!