TimeSeries Machine Learning - PyData London 2025

Suyash Joshi
Sr. Developer Advocate, InfluxData
sjoshi@influxdata.com
PyData London 2025
Forecasting Weather using
TimeSeries Machine Learning

Agenda
● Time Series 101
● Machine/Deep Learning for Time Series Data
○ Anomaly Detection
○ Forecasting
● InfluxDB 3 Processing Engine (ML in the Database)
● Hands-on Coding
● Q&A

Time Series Data is everywhere
Cloud, Application, Server Monitoring/Observability

Time Series Data is everywhere
Finance, Manufacturing, Healthcare
BioMetrics
(Sensor Data)

Time Series Data
● Warehouse IoT Devices
● Track + Monitor system (model, app), infrastructure (DevOps, MLOps)
● Metrics & Events from several sources (software, hardware, cloud, network, etc.)
● Robotics and Green Energy
Ingest - A high amount of data streaming in at nanosecond precision
Compression - The ability to store this large data set without breaking the bank
Cardinality - The need to store wide rows, timestamped data with multiple values
Querying by Time - Query and process rows by time, values, and tags
Real time Analytics - Fast queries for real-time analytics

Evolution of Databases
RELATIONAL
• Orders
• Customers
• Records
DOCUMENT
• High throughput
• Large document
SEARCH
• Distributed search
• Logs
• Geo
TIME SERIES
• Events, metrics, time-stamped
• For IoT, analytics, cloud native
Time series is the fastest growing data category by far
[Chart: time series vs. all others; source: DB Engines]

😥 Packing Co is having recurring issues
with one of their packaging machines.
🤖 Unexpectedly, 1 of the machines will enter
a failing state which requires a manual
reset by an engineer.
📊 The Plant Manager has advised that, when
running normally, all machine sensors
follow similar output patterns. If a
machine is at fault, these will fluctuate
abnormally.
🤔 How can we help them?
📦 Packing Co — Anomaly Detection

This could easily be
solved with
thresholding
In an ideal world…
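A minimal sketch of the thresholding idea from this slide, assuming a single sensor with a known normal range. The readings, the function name `threshold_anomalies`, and the limits are hypothetical and only illustrate the approach.

```python
from typing import List, Tuple


def threshold_anomalies(
    readings: List[float], low: float, high: float
) -> List[Tuple[int, float]]:
    """Return (index, value) pairs for readings outside the range [low, high]."""
    return [(i, v) for i, v in enumerate(readings) if v < low or v > high]


# Hypothetical vibration readings from one packaging machine.
vibration = [0.51, 0.49, 0.50, 0.52, 0.95, 0.50, 0.12]

# Assumed normal operating range; in practice this would come from the plant.
anomalies = threshold_anomalies(vibration, low=0.3, high=0.7)
print(anomalies)
```

This only works while the normal range stays fixed, which is why the slide frames it as the ideal case.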

What do we do when
our result becomes
unpredictable by
conventional means?
Realistically…
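One common step beyond a fixed threshold is to compare each reading against its own recent history. The sketch below uses a rolling z-score; it is an illustration only, not the method used in the talk, and the window size, limit, and sample data are assumptions.

```python
from collections import deque
from statistics import mean, stdev
from typing import List


def rolling_zscore_anomalies(
    values: List[float], window: int = 5, z_limit: float = 3.0
) -> List[int]:
    """Flag indices whose value is more than z_limit standard deviations
    away from the mean of the previous `window` values."""
    history = deque(maxlen=window)
    flagged = []
    for i, v in enumerate(values):
        if len(history) == window:
            mu = mean(history)
            sigma = stdev(history)
            # Skip the check when the window is perfectly flat (sigma == 0).
            if sigma > 0 and abs(v - mu) / sigma > z_limit:
                flagged.append(i)
        history.append(v)
    return flagged


# Hypothetical sensor output that drifts slowly upward, with one spike.
sensor = [10.0, 10.2, 10.1, 10.3, 10.2, 10.4, 10.3, 14.0, 10.5, 10.4]
flagged = rolling_zscore_anomalies(sensor, window=5, z_limit=3.0)
print(flagged)
```

Because the baseline is recomputed from recent data, slow drift is tolerated while sudden deviations are still flagged.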

An embedded Python VM in InfluxDB 3
Core & Enterprise
The Processing Engine

Goal
Build and improve upon the functionality of:
Continuous Queries
Kapacitor
Flux Tasks
Telegraf
This functionality is brought directly inside the
database for efficiency and ease of use.
Built for easy development and access to
Python’s ecosystem of libraries and tools.
Great for data collection, transformation,
processing, monitoring, automation and more.
InfluxDB 3 Core | The Processing Engine
An Overview

Execution Types
● WAL Flush: Sends write data to a plugin once a second (can be configured)
● Scheduled Task: Executes on a user-defined schedule; great for data collection and deadman monitoring
● On Request: Binds a plugin to an HTTP endpoint where request content is sent to the plugin, which can then parse, process, and send the data into the database or to third-party services and respond to the caller
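A rough sketch of what a WAL Flush plugin might look like. The entry-point name `process_writes`, the `table_batches` layout (a list of dicts with `table_name` and `rows`), and the `info` logging call reflect my understanding of the InfluxDB 3 plugin documentation and should be checked against it. The `temp` field, the `max_temp` argument, and the `FakeLocal` stand-in are hypothetical and exist only so the example runs outside the database.

```python
def process_writes(influxdb3_local, table_batches, args=None):
    """Called with batches of freshly written rows (assumed signature)."""
    # Hypothetical trigger argument; trigger arguments are treated as strings here.
    limit = float((args or {}).get("max_temp", 80.0))
    for batch in table_batches:
        table = batch["table_name"]
        for row in batch["rows"]:
            temp = row.get("temp")  # hypothetical field name
            if temp is not None and temp > limit:
                influxdb3_local.info(f"{table}: temp {temp} exceeds {limit}")


class FakeLocal:
    """Stand-in for the object the database passes in; for local testing only."""

    def __init__(self):
        self.messages = []

    def info(self, message):
        self.messages.append(message)


local = FakeLocal()
batches = [{"table_name": "machines", "rows": [{"temp": 71.5}, {"temp": 93.2}]}]
process_writes(local, batches, {"max_temp": "80"})
print(local.messages)
```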

Example Usage and Implementation
The Approach
1. User creates a plugin, which is a Python script stored in a defined location.
2. User adds and enables a trigger, which decides when to call a specific plugin.
3. The trigger fires whenever the execution method is needed (WAL flush, on request, etc.)
Adjusted Write Path for WAL Flush Trigger
[Diagram labels: Incoming Write, Write Buffer, Every 1 Second, Queryable Buffer, Every 10 Minutes, Object Storage; Processing Engine: WAL Flush Trigger calling Plugin #1, Plugin #2, Plugin #3]

On-Request Trigger
[Diagram, GET Request: an incoming GET request reaches the InfluxDB Plugin API; the Processing Engine's On-Request Trigger calls Plugin #1, Plugin #2, Plugin #3]
[Diagram, POST Request: an incoming POST request reaches the InfluxDB Plugin API; the Processing Engine's On-Request Trigger calls Plugin #1, Plugin #2, Plugin #3]
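A hedged sketch of an On-Request plugin that accepts a POST body and responds to the caller. The entry-point name `process_request` and its parameters follow my reading of the InfluxDB 3 plugin documentation, and returning a dict as the response body is likewise an assumption to verify there. The `readings` payload, the `limit` query parameter, and `FakeLocal` are hypothetical.

```python
import json


def process_request(influxdb3_local, query_parameters, request_headers,
                    request_body, args=None):
    """Handle an HTTP request routed to this plugin (assumed signature)."""
    payload = json.loads(request_body) if request_body else {}
    readings = payload.get("readings", [])          # hypothetical payload field
    limit = float(query_parameters.get("limit", 0.7))  # hypothetical parameter
    flagged = [r for r in readings if r > limit]
    influxdb3_local.info(f"checked {len(readings)} readings, flagged {len(flagged)}")
    # Assumed: the returned dict is sent back to the caller as the response.
    return {"checked": len(readings), "flagged": flagged}


class FakeLocal:
    """Stand-in for the database-provided object; for local testing only."""

    def __init__(self):
        self.messages = []

    def info(self, message):
        self.messages.append(message)


local = FakeLocal()
body = json.dumps({"readings": [0.5, 0.9, 0.6]})
response = process_request(local, {"limit": "0.7"}, {}, body)
print(response)
```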

Scheduled Trigger
Scheduled Trigger Usage
New Use Cases
Use the Scheduled plugin for running specific processes on a dedicated time cycle.
Great for data collection, downsampling, status checks, trend analyses, report generation, and much more.
[Diagram labels: Processing Engine: Scheduled Trigger (Every 10 Seconds) calling Plugin #1, Plugin #2, Plugin #3; Query and Write against Object Storage; External Service]
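As an illustration of a scheduled status check, here is a sketch of a deadman-style plugin that warns when a table has received no rows recently. The entry-point name `process_scheduled_call`, the `query` method returning a list of dicts, and the `warn` logging call are assumptions based on my reading of the InfluxDB 3 plugin documentation. The table name, the SQL, and `FakeLocal` are hypothetical.

```python
def process_scheduled_call(influxdb3_local, call_time, args=None):
    """Run on a schedule and warn if no recent data arrived (assumed signature)."""
    table = (args or {}).get("table", "weather")  # hypothetical trigger argument
    rows = influxdb3_local.query(
        f"SELECT COUNT(*) AS n FROM {table} "
        "WHERE time > now() - INTERVAL '1 minute'"
    )
    count = rows[0]["n"] if rows else 0
    if count == 0:
        influxdb3_local.warn(f"deadman: no rows in '{table}' during the last minute")


class FakeLocal:
    """Stand-in that returns canned query results; for local testing only."""

    def __init__(self, canned_rows):
        self.canned_rows = canned_rows
        self.warnings = []

    def query(self, sql):
        return self.canned_rows

    def warn(self, message):
        self.warnings.append(message)


silent = FakeLocal([{"n": 0}])
healthy = FakeLocal([{"n": 5}])
process_scheduled_call(silent, call_time=None)
process_scheduled_call(healthy, call_time=None)
print(silent.warnings, healthy.warnings)
```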

In-Memory Cache
Purpose
To enable persistent memory across different calls to the Processing Engine.
Two Types
● Trigger-Specific – Cache space tied to the specific trigger within the Processing Engine.
● Shared – Cache that is shared across all triggers within the Processing Engine.
[Diagram labels: Processing Engine: Shared Cache; Individual Trigger Caches for the WAL Flush Trigger, On-Request Trigger, and Scheduled Trigger]
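A sketch of how a plugin could keep state between calls using the two cache types. The `cache.get` and `cache.put` calls and the `use_global` flag follow my understanding of the InfluxDB 3 plugin documentation and should be verified there. `FakeCache` and `FakeLocal` are hypothetical stand-ins that mimic that interface with plain dictionaries.

```python
class FakeCache:
    """Dictionary-backed stand-in for the Processing Engine cache (assumption)."""

    def __init__(self):
        self._trigger = {}  # trigger-specific space
        self._shared = {}   # shared across all triggers

    def put(self, key, value, ttl=None, use_global=False):
        (self._shared if use_global else self._trigger)[key] = value

    def get(self, key, default=None, use_global=False):
        return (self._shared if use_global else self._trigger).get(key, default)


class FakeLocal:
    def __init__(self):
        self.cache = FakeCache()


def process_writes(influxdb3_local, table_batches, args=None):
    """Keep a running row count in the trigger cache and the last table
    name in the shared cache (assumed plugin signature)."""
    cache = influxdb3_local.cache
    seen = cache.get("rows_seen", default=0)
    seen += sum(len(batch["rows"]) for batch in table_batches)
    cache.put("rows_seen", seen)
    if table_batches:
        cache.put("last_table", table_batches[-1]["table_name"], use_global=True)


local = FakeLocal()
process_writes(local, [{"table_name": "weather", "rows": [{"t": 1}, {"t": 2}]}])
process_writes(local, [{"table_name": "machines", "rows": [{"t": 3}]}])
rows_seen = local.cache.get("rows_seen")
last_table = local.cache.get("last_table", use_global=True)
print(rows_seen, last_table)
```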

Hands-on Code
Plugin for Forecasting & Anomaly
detection
Material:
https://github.com/InfluxCommunity/WeatherForecasting

Analysing Time Series Data Engineering
1. Aggregates (Mean, Median)
● Rolling Averages: Smooth data to capture
long-term trends (e.g., 7-day average).
2. Handling Missing Values
● Imputation: Fill missing data using methods like
interpolation or forward/backward fill.
● Dropping: Remove rare missing data if it doesn’t
affect analysis.
3. Time Series Database Schema
● Timestamp as Primary Key: Use timestamps to
uniquely identify data points.
● Granularity: Choose the right time intervals (e.g.,
daily, hourly).
4. Cardinality
● High Cardinality: Many unique values (e.g.,
customer IDs).
● Low Cardinality: Few unique values (e.g., status
flags).
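The rolling average and imputation ideas above can be sketched in plain Python as follows. The temperature values are hypothetical, and in practice a library such as pandas would usually handle this; the helper names here are illustrative only.

```python
from typing import List, Optional


def forward_fill(values: List[Optional[float]]) -> List[Optional[float]]:
    """Replace each missing value with the most recent known value."""
    out, last = [], None
    for v in values:
        if v is not None:
            last = v
        out.append(last)
    return out


def interpolate(values: List[Optional[float]]) -> List[Optional[float]]:
    """Linearly interpolate interior gaps; leading/trailing gaps stay None."""
    out = list(values)
    known = [i for i, v in enumerate(out) if v is not None]
    for a, b in zip(known, known[1:]):
        step = (out[b] - out[a]) / (b - a)
        for i in range(a + 1, b):
            out[i] = out[a] + step * (i - a)
    return out


def rolling_mean(values: List[float], window: int) -> List[Optional[float]]:
    """Trailing mean; positions with fewer than `window` points yield None."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window : i + 1]) / window)
    return out


# Hypothetical daily temperatures with gaps.
temps = [12.0, None, 14.0, 15.0, None, None, 18.0]
filled = forward_fill(temps)
interpolated = interpolate(temps)
smoothed = rolling_mean(interpolated, window=3)
print(filled, interpolated, smoothed, sep="\n")
```

Forward fill repeats the last observation, while interpolation assumes a straight line between known points; which is appropriate depends on how the signal behaves between samples.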

Forecasting Time Series Data
Statistical & Classical ML Methods
1. Naive Method
● Linear Regression: Predicts future values using a straight-line trend.
● Random Walk: Assumes future values change randomly, like stock prices.
2. Exponential Smoothing
● Gives more weight to recent data and less to older data for better predictions.
3. Lagging Features
● Uses past values to predict the future, capturing time-based patterns (e.g., sales or
temperature).
4. ARIMA
● AR (Autoregressive): Predicts future values based on past values.
● I (Integrated): Differences the series to remove trends and make it stationary.
● MA (Moving Average): Models the current value using past forecast errors.
5. SARIMA
● Extends ARIMA by adding seasonal patterns to handle repeating trends over time.
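To make a few of these methods concrete, here is a small pure-Python sketch of a last-value (random walk style) forecast, simple exponential smoothing, and lag feature construction. The data, the smoothing factor `alpha`, and the function names are hypothetical; ARIMA and SARIMA are normally fit with a statistics library rather than by hand.

```python
from typing import List, Tuple


def naive_forecast(series: List[float]) -> float:
    """Random-walk style forecast: the next value equals the last observed one."""
    return series[-1]


def simple_exp_smoothing(series: List[float], alpha: float) -> float:
    """One-step-ahead forecast; recent points get weight alpha, older ones decay."""
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return level


def lag_features(series: List[float], n_lags: int) -> List[Tuple[List[float], float]]:
    """Build (previous n_lags values, target) pairs for a supervised model."""
    return [
        (series[t - n_lags : t], series[t])
        for t in range(n_lags, len(series))
    ]


# Hypothetical temperature readings.
temps = [10.0, 12.0, 11.0, 13.0]
naive = naive_forecast(temps)
smoothed = simple_exp_smoothing(temps, alpha=0.5)
lagged = lag_features(temps, n_lags=2)
print(naive, smoothed, lagged)
```

The lagged pairs can be fed to any regression model, which is how classical ML methods are typically applied to time series.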

reference: https://www.tecton.ai/blog/how-to-jump-from-batch-to-real-time-machine-learning

Additional TimeSeries ML Libraries
● Anomaly Detection Toolkit (ADTK): Unsupervised and rule-based time series anomaly detection. The ADTK package allows you to easily build an effective detection model from a variety of rule-based anomaly detection methods.
● FB Prophet: “Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects.”
● Neural Prophet: A neural-network-based time series model, inspired by Facebook Prophet and AR-Net (autoregressive neural network), built on PyTorch.
● Chronos: “A new powerful open source library to perform time series forecasting etc using LLM by Amazon Research”

Conclusion
Learning Resources
● Weather Forecasting Workshop: https://github.com/InfluxCommunity/WeatherForecasting
● Prophet Forecasting:
https://github.com/Anaisdg/influxdb3_plugins/tree/add-fbprophet-plugins/influxdata/Anaisdg/fbprophet
● River ML (Online timeseries machine learning library): https://github.com/online-ml/river
● Deep Learning Time Series Benchmark: https://huggingface.co/spaces/Salesforce/GIFT-Eval
● Book (Machine Learning Engineering): https://mlip-cmu.github.io/book/index.html
● Video tutorials on YouTube, InfluxDB University (free training): https://influxdbu.com
● Community: https://influxcommunity.slack.com & https://community.influxdata.com
❖ InfluxDB 3 Core: https://github.com/influxdata/influxdb
❖ www.influxdata.com/cloud | via cloud marketplace (AWS, Azure, GCP)
Try for free!
