Suyash Joshi
Sr. Developer Advocate, InfluxData
sjoshi@influxdata.com
PyData London 2025
Forecasting Weather using
TimeSeries Machine Learning
Agenda
● Time Series 101
● Machine/Deep Learning for Time Series Data
○ Anomaly Detection
○ Forecasting
● InfluxDB 3 Processing Engine (ML in the Database)
● Hands-on Coding
● Q&A
Time Series Data is everywhere
Cloud, Application, Server Monitoring/Observability
Time Series Data is everywhere
Finance, Manufacturing, Healthcare
BioMetrics
(Sensor Data)
Time Series Data
Warehouse IoT
Devices
Ingest - A high volume of data streaming in at
nanosecond precision
Compression - The ability to store this large data set
without breaking the bank
Cardinality - The need to store wide rows,
timestamped data with multiple values
Querying by Time - Query and process rows by
time, values, and tags
Real time Analytics - Fast queries for real-time
analytics
Track + Monitor system (model, app), infrastructure (DevOps, MLOps)
Metrics & Events from several sources (software, hardware, cloud, network, etc.)
Robotics and Green Energy
Evolution of Databases
RELATIONAL
• Orders
• Customers
• Records
DOCUMENT
• High throughput
• Large document
SEARCH
• Distributed search
• Logs
• Geo
TIME SERIES
• Events, metrics, time-stamped
• For IoT, analytics, cloud native
Time series is the fastest-growing data category by far
[Chart: Time series vs. all others; source: DB-Engines]
InfluxDB 3 Products
InfluxData Platform
Metrics & Events
😥 Packing Co is having recurring issues
with one of their packaging machines.
🤖 Unexpectedly, one of the machines will enter
a failing state which requires a manual
reset by an engineer.
📊 The Plant Manager has advised that, when
running normally, all machine sensors will
follow similar output patterns. If a
machine is at fault, these will fluctuate
abnormally.
🤔 How can we help them?
📦 Packing Co — Anomaly Detection
This could easily be
solved with
thresholding
In an ideal world …
What do we do when
our result becomes
unpredictable by
conventional means?
Realistically…
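To make the contrast between the two slides concrete, here is a minimal sketch (not the workshop's code) that compares a fixed threshold with a rolling z-score on made-up sensor readings. The data, the `window` of 5, and the `z_limit` of 3.0 are all assumptions chosen for illustration.

```python
from statistics import mean, stdev


def threshold_anomalies(values, upper):
    """Flag every index whose value exceeds a fixed upper limit."""
    return [i for i, v in enumerate(values) if v > upper]


def rolling_zscore_anomalies(values, window=5, z_limit=3.0):
    """Flag points that deviate strongly from the preceding `window` points."""
    flagged = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma > 0 and abs(values[i] - mu) / sigma > z_limit:
            flagged.append(i)
    return flagged


# Hypothetical readings: a slow upward drift with one sudden spike at index 8.
readings = [20.0, 20.5, 21.0, 21.4, 22.1, 22.4, 23.0, 23.6, 30.0, 24.5, 25.1, 25.4]

fixed = threshold_anomalies(readings, upper=25.0)
adaptive = rolling_zscore_anomalies(readings)
print(fixed)     # [8, 10, 11]: the drift eventually trips the fixed limit too
print(adaptive)  # [8]: only the spike stands out from its recent history
```

The fixed limit starts raising false alarms once normal operation drifts past it, while the rolling check adapts to the recent level. Real sensor data usually needs something more robust than this, which is where the ML methods later in the deck come in.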
An embedded Python VM in InfluxDB 3
Core & Enterprise
The Processing Engine
Goal
Build and improve upon the functionality of:
Continuous Queries
Kapacitor
Flux Tasks
Telegraf
This functionality is brought directly inside the
database for efficiency and ease of use.
Built for easy development and access to
Python’s ecosystem of libraries and tools.
Great for data collection, transformation,
processing, monitoring, automation and more.
InfluxDB 3 Core | The Processing Engine
An Overview
WAL Flush - Sends write data to a plugin once a second (can be configured)
Scheduled Task - Executes on a user-defined schedule; great for data collection and deadman monitoring
On Request - Binds a plugin to an HTTP endpoint where request content is sent to the plugin, which
can then parse, process, and send the data into the database or to third-party services
and respond to the caller
InfluxDB 3 Core | The Processing Engine
Execution Types
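As a rough sketch of what a WAL Flush plugin can look like: the entry-point name `process_writes(influxdb3_local, table_batches, args=None)` and the batch shape (`table_name`, `rows`) follow the InfluxDB 3 plugin documentation as I understand it, so check them against the current docs. `FakeLocal`, the `weather` table, the `temp` field, and the `max_temp` argument are hypothetical and exist only so the example runs outside the database.

```python
class FakeLocal:
    """Hypothetical stand-in for the `influxdb3_local` object the engine passes in."""

    def __init__(self):
        self.messages = []

    def info(self, *parts):
        # The real object writes to the database log; here we just collect messages.
        self.messages.append(" ".join(str(p) for p in parts))


def process_writes(influxdb3_local, table_batches, args=None):
    """WAL Flush entry point: receives the rows written since the last flush."""
    limit = float((args or {}).get("max_temp", 40.0))  # hypothetical trigger argument
    for batch in table_batches:
        table = batch["table_name"]
        for row in batch["rows"]:
            temp = row.get("temp")
            if temp is not None and temp > limit:
                influxdb3_local.info(f"{table}: temp {temp} exceeds {limit}")


# Local dry run with made-up data shaped like a flushed batch.
local = FakeLocal()
process_writes(local, [{"table_name": "weather", "rows": [{"temp": 21.5}, {"temp": 43.0}]}])
print(local.messages)  # ['weather: temp 43.0 exceeds 40.0']
```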
InfluxDB 3 Core | The Processing Engine
Example Usage and Implementation
Adjusted Write Path for WAL Flush Trigger
[Diagram: Incoming Write → Write Buffer → Queryable Buffer (every 1 second) → Object Storage (every 10 minutes)]
The Approach
1. User creates a plugin, which is a Python script stored in a defined location.
2. User adds and enables a trigger, which decides when to call a specific plugin.
3. The trigger fires whenever the execution method is needed (WAL flush, on request, etc.)
[Diagram: Processing Engine, where the WAL Flush Trigger calls Plugins #1-#3]
InfluxDB 3 Core | The Processing Engine
On-Request Trigger
[Diagram, GET Request: Incoming GET Request → InfluxDB Plugin API → Processing Engine (On-Request Trigger) → Plugins #1-#3]
[Diagram, POST Request: Incoming POST Request → InfluxDB Plugin API → Processing Engine (On-Request Trigger) → Plugins #1-#3]
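The GET and POST paths can be handled by one On-Request plugin. This is a sketch under assumptions: the signature `process_request(influxdb3_local, query_parameters, request_headers, request_body, args=None)` and returning a dict as the response body match the InfluxDB 3 plugin documentation as far as I know, but verify both before relying on them. `StubLocal` and the `city` parameter are hypothetical.

```python
import json


class StubLocal:
    """Hypothetical stand-in for `influxdb3_local`, used only for a local dry run."""

    def __init__(self):
        self.log = []

    def info(self, message):
        self.log.append(message)


def process_request(influxdb3_local, query_parameters, request_headers, request_body, args=None):
    """On Request entry point: parse the HTTP request and build a response."""
    if request_body:
        payload = json.loads(request_body)  # POST: read the JSON body
    else:
        payload = dict(query_parameters)    # GET: only the query string is available
    city = payload.get("city", "unknown")
    influxdb3_local.info(f"forecast requested for {city}")
    return {"status": "ok", "city": city}


get_reply = process_request(StubLocal(), {"city": "London"}, {}, None)
post_reply = process_request(
    StubLocal(), {}, {"content-type": "application/json"}, '{"city": "Leeds"}'
)
print(get_reply, post_reply)
```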
InfluxDB 3 Core | The Processing Engine
Scheduled Trigger
Scheduled Trigger Usage
[Diagram: Scheduled Trigger (every 10 seconds) → Plugins #1-#3, with Query and Write paths to Object Storage and a link to an External Service]
New Use Cases
Use the Scheduled plugin for running specific processes on a dedicated time cycle.
Great for data collection, downsampling, status checks, trend analyses, report generation,
and much more.
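A status check is one of the simpler scheduled use cases. The sketch below is a deadman check: it counts recent rows and warns when none arrived. The entry point `process_scheduled_call(influxdb3_local, call_time, args=None)` and the `query`, `info`, and `warn` methods are taken from the InfluxDB 3 plugin documentation as I recall it, so treat them as assumptions to verify. `StubLocal`, the `weather` table, and the 10-minute window are hypothetical.

```python
class StubLocal:
    """Hypothetical stand-in for `influxdb3_local` that returns canned query rows."""

    def __init__(self, rows):
        self.rows = rows
        self.log = []

    def query(self, sql):
        return self.rows  # the real method runs the SQL against the database

    def info(self, message):
        self.log.append(("info", message))

    def warn(self, message):
        self.log.append(("warn", message))


def process_scheduled_call(influxdb3_local, call_time, args=None):
    """Scheduled entry point: warn when a table has received no recent rows."""
    table = (args or {}).get("table", "weather")  # hypothetical trigger argument
    rows = influxdb3_local.query(
        f"SELECT COUNT(*) AS n FROM {table} WHERE time > now() - INTERVAL '10 minutes'"
    )
    count = rows[0]["n"] if rows else 0
    if count == 0:
        influxdb3_local.warn(f"deadman: no rows in {table} during the last 10 minutes")
    else:
        influxdb3_local.info(f"{table}: {count} rows in the last 10 minutes")


quiet = StubLocal([{"n": 0}])
busy = StubLocal([{"n": 42}])
process_scheduled_call(quiet, call_time=None)
process_scheduled_call(busy, call_time=None, args={"table": "weather"})
print(quiet.log, busy.log)
```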
Purpose
To enable persistent memory across different
calls to the Processing Engine.
Two Types
Trigger-Specific – Cache space tied to the
specific trigger within the Processing Engine.
Shared – Cache that is shared across all
triggers within the Processing Engine.
InfluxDB 3 Core | The Processing Engine
In-Memory Cache
[Diagram: Processing Engine with one Shared Cache across the WAL Flush, On-Request, and Scheduled triggers, plus an Individual Trigger Cache for each]
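The two cache scopes can be illustrated with a small stand-in. In the real engine the cache is reached through `influxdb3_local.cache` with `put(key, value, ttl=None, use_global=False)` and `get(key, default=None, use_global=False)`; that is my reading of the InfluxDB 3 documentation and should be checked against it. `StubCache` and `StubLocal` below are hypothetical imitations of that behaviour (they ignore `ttl`), and the key names are made up.

```python
class StubCache:
    """Hypothetical imitation of the engine cache: one private dict, one shared dict."""

    def __init__(self, shared):
        self._own = {}
        self._shared = shared

    def put(self, key, value, ttl=None, use_global=False):
        (self._shared if use_global else self._own)[key] = value

    def get(self, key, default=None, use_global=False):
        return (self._shared if use_global else self._own).get(key, default)


class StubLocal:
    def __init__(self, shared):
        self.cache = StubCache(shared)


def process_writes(influxdb3_local, table_batches, args=None):
    """Keep a per-trigger row count and a row count shared by all triggers."""
    cache = influxdb3_local.cache
    rows = sum(len(batch["rows"]) for batch in table_batches)
    cache.put("rows_seen", cache.get("rows_seen", default=0) + rows)
    total = cache.get("rows_seen_all", default=0, use_global=True) + rows
    cache.put("rows_seen_all", total, use_global=True)


shared_store = {}
trigger_a, trigger_b = StubLocal(shared_store), StubLocal(shared_store)
process_writes(trigger_a, [{"table_name": "weather", "rows": [{}, {}]}])
process_writes(trigger_a, [{"table_name": "weather", "rows": [{}]}])
process_writes(trigger_b, [{"table_name": "weather", "rows": [{}, {}, {}, {}]}])

print(trigger_a.cache.get("rows_seen"))                       # 3
print(trigger_b.cache.get("rows_seen"))                       # 4
print(trigger_a.cache.get("rows_seen_all", use_global=True))  # 7
```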
Hands-on Code
Plugin for Forecasting & Anomaly Detection
Material:
https://github.com/InfluxCommunity/WeatherForecasting
Analysing Time Series Data Engineering
1. Aggregates (Mean, Median)
● Rolling Averages: Smooth data to capture
long-term trends (e.g., 7-day average).
2. Handling Missing Values
● Imputation: Fill missing data using methods like
interpolation or forward/backward fill.
● Dropping: Remove rare missing data if it doesn’t
affect analysis.
3. Time Series Database Schema
● Timestamp as Primary Key: Use timestamps to
uniquely identify data points.
● Granularity: Choose the right time intervals (e.g.,
daily, hourly).
4. Cardinality
● High Cardinality: Many unique values (e.g.,
customer IDs).
● Low Cardinality: Few unique values (e.g., status
flags).
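The imputation and rolling-average ideas above can be sketched in plain Python. This is an illustration on made-up temperatures, not the workshop's code; in practice pandas covers the same ground with `Series.ffill()`, `Series.interpolate()`, and `Series.rolling(n).mean()`.

```python
def forward_fill(values):
    """Replace each None with the most recent known value (leading gaps stay None)."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def linear_interpolate(values):
    """Fill interior runs of None on a straight line between the known neighbours."""
    out = list(values)
    known = [i for i, v in enumerate(out) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (out[right] - out[left]) / (right - left)
        for i in range(left + 1, right):
            out[i] = out[left] + step * (i - left)
    return out


def rolling_mean(values, window):
    """Mean of each full window; the first window - 1 positions are dropped."""
    return [
        sum(values[i - window + 1:i + 1]) / window
        for i in range(window - 1, len(values))
    ]


# Hypothetical daily temperatures with three missing readings.
temps = [10.0, None, 14.0, 15.0, None, None, 21.0]

ffilled = forward_fill(temps)
interpolated = linear_interpolate(temps)
smoothed = rolling_mean(interpolated, window=3)
print(ffilled)       # [10.0, 10.0, 14.0, 15.0, 15.0, 15.0, 21.0]
print(interpolated)  # [10.0, 12.0, 14.0, 15.0, 17.0, 19.0, 21.0]
print(smoothed)
```

Forward fill holds the last reading flat across a gap, while interpolation assumes a steady change between the surrounding readings; which one is appropriate depends on how the sensor behaves.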
Forecasting Time Series Data
Statistical & Classical ML Methods
1. Naive Methods
● Linear Regression: Predicts future values by extrapolating a straight-line trend.
● Random Walk: Assumes each value is the previous value plus random noise (as with stock prices), so the best forecast is the last observed value.
2. Exponential Smoothing
● Gives more weight to recent data and less to older data for better predictions.
3. Lagging Features
● Uses past values to predict the future, capturing time-based patterns (e.g., sales or
temperature).
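4. ARIMA
● AR (Autoregressive): Predicts future values from a weighted sum of past values.
● I (Integrated): Differences the series to remove trend and make it stationary.
● MA (Moving Average): Models the current value from past forecast errors.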
5. SARIMA
● Extends ARIMA by adding seasonal patterns to handle repeating trends over time.
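Three of the simpler methods in this list fit in a few lines each. The sketch below is illustrative only: the data is made up, `alpha=0.5` is an arbitrary smoothing factor, and `fit_ar1` is a one-lag least-squares fit standing in for the lag-feature and AR ideas, not a full ARIMA implementation.

```python
def naive_forecast(series, horizon):
    """Random-walk forecast: repeat the last observed value."""
    return [series[-1]] * horizon


def exponential_smoothing(series, alpha):
    """Simple exponential smoothing; the final level is the one-step-ahead forecast."""
    level = series[0]
    for y in series[1:]:
        # Recent observations get weight alpha; older history decays geometrically.
        level = alpha * y + (1 - alpha) * level
    return level


def fit_ar1(series):
    """Least-squares fit of y[t] = c + phi * y[t-1], using one lagged feature."""
    x, y = series[:-1], series[1:]
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    variance = sum((a - mean_x) ** 2 for a in x)
    phi = covariance / variance
    return mean_y - phi * mean_x, phi


# Hypothetical daily temperatures.
temps = [12.0, 13.0, 15.0, 14.0, 16.0, 18.0]

print(naive_forecast(temps, horizon=3))          # [18.0, 18.0, 18.0]
print(exponential_smoothing(temps, alpha=0.5))   # 16.46875

c, phi = fit_ar1(temps)
print(c + phi * temps[-1])  # one-step-ahead AR(1) forecast
```

For ARIMA and SARIMA proper, a library such as statsmodels is the usual choice rather than hand-rolled code.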
Forecasting Time Series Data
reference: https://www.tecton.ai/blog/how-to-jump-from-batch-to-real-time-machine-learning
Additional TimeSeries ML Libraries
Anomaly Detection Toolkit (ADTK)
Unsupervised and rule-based time series anomaly detection. The ADTK package allows you
to easily build an effective detection model from a variety of rule-based anomaly
detection methods.
FB Prophet
“Prophet is a procedure for forecasting time series data based on an additive model
where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday
effects.”
Neural Prophet
A neural-network-based time series model, inspired by Facebook Prophet and AR-Net
(autoregressive neural network), built on PyTorch.
Chronos
“A new powerful open source library to perform time series forecasting etc using LLM by
Amazon Research”
Conclusion
Learning Resources
● Weather Forecasting Workshop: https://github.com/InfluxCommunity/WeatherForecasting
● Prophet Forecasting:
https://github.com/Anaisdg/influxdb3_plugins/tree/add-fbprophet-plugins/influxdata/Anaisdg/fbprophet
● River ML (Online timeseries machine learning library): https://github.com/online-ml/river
● Deep Learning Time Series Benchmark: https://huggingface.co/spaces/Salesforce/GIFT-Eval
● Book (Machine Learning Engineering): https://mlip-cmu.github.io/book/index.html
● Video tutorials on YouTube, InfluxDB University (free training): https://influxdbu.com
● Community: https://influxcommunity.slack.com & https://community.influxdata.com
❖ InfluxDB 3 Core: https://github.com/influxdata/influxdb
❖ www.influxdata.com/cloud | via cloud marketplace (AWS, Azure, GCP)
Try for free!
Thank you!
