Time series analysis and prediction in the deep learning era
Alberto Arrigoni, PhD
February 2019
Time series: analysis and prediction
What will the future hold?
Past, Now, Future
Time series applications + context
Time series prediction: e.g.
demand/sales forecasting...
Use prediction for anomaly
detection: e.g. manufacturing
settings...
Counterfactual prediction:
e.g. marketing campaigns...
Show ads
Counterfactual
Time series prediction methods
(non-comprehensive list)
- Classical autoregressive models
- Bayesian AR models
- General machine learning approaches
- Deep learning
Time series prediction and sales forecasting: issues
E.g. retail businesses
(1) Number of time series (~ thousands) [the SCALE problem]
(2) Time series are often highly erratic, intermittent or bursty (...and on highly different scales)
Figure labels: Product A, Product B, ...; sales volumes of ~ 10 items and 2 items
Time series prediction and sales forecasting: issues
E.g. online retailer selling clothes
(3) Time series belong to a hierarchy of products/categories
Figure labels: Clothes (total sales); T-shirts total sales; Nike t-shirts; Adidas t-shirts; scales of ~ 100 and ~ 1000; Now
(4) For new products historical data is missing (the cold-start problem)
Classical autoregressive models
Workflow
- Variance stabilization by log or Box-Cox transformation
- De-trending by differencing
- Estimate model order (AIC, BIC)
- Fit model parameters (maximum likelihood)
- Test residuals for randomness
Model terms: autoregressive component + moving average component
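A minimal sketch of two of the workflow steps (de-trending by differencing, then fitting an autoregressive model), in plain Python. It assumes the simplest case, an AR(1) model fitted by least squares, which coincides with conditional maximum likelihood under Gaussian noise. The function names are illustrative and not taken from any package mentioned in the slides.

```python
def difference(series, lag=1):
    """De-trend by differencing: y'_t = y_t - y_{t-lag}."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]


def fit_ar1(series):
    """Least-squares estimate of (c, phi) in y_t = c + phi * y_{t-1} + e_t."""
    x, y = series[:-1], series[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var = sum((a - mean_x) ** 2 for a in x)
    phi = cov / var
    return mean_y - phi * mean_x, phi


def forecast_ar1(last_value, c, phi, steps):
    """Iterate the AR(1) recursion forward from the last observed value."""
    out = []
    for _ in range(steps):
        last_value = c + phi * last_value
        out.append(last_value)
    return out
```

In practice the model order would be chosen with AIC/BIC and the residuals tested for randomness, as the workflow lists. Both steps are left out of this sketch.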
Classical autoregressive models
THE PROS:
- Good explainability
- Solid theoretical background
- Very explicit model
- A lot of control as it is a manual process
THE CONS:
- Data is seldom stationary: trend, seasonality and cycles need to be modeled as well
- Computationally intensive (one model for each time series)
- No information sharing across time series (apart from Hyndman’s hts approach) *
- Historical data are essential for forecasting (no cold-start predictions)
* https://robjhyndman.com/publications/hierarchical/
Tech stack and packages
- Rob Hyndman’s online text: https://otexts.com/fpp2/
- The infamous auto.arima function (R forecast package), ets, tbats, garch, stl...
- Python’s Pyramid (pyramid-arima, since renamed pmdarima)
General machine learning approach for ts prediction
- Can use any number of methods (linear, trees, neural networks...)
- Turn the time series prediction problem into a supervised learning problem
- Easily extendable to support multiple input variables
- Covariates can be easily handled and transformed through feature engineering
Inputs: autoregressive component (past values of Y_t) + covariates
E.g. feature engineering:
- Aggregate histograms over time scales
- Transform into Fourier space
- Add low/high pass filters as variables
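Turning a series into a supervised learning problem can be sketched as a sliding window over past values. This is a minimal illustration in plain Python. The helper name `make_supervised` and the per-time-step layout of `covariates` are assumptions made for the example.

```python
def make_supervised(series, n_lags, covariates=None):
    """Build (features, target) pairs from a single series.

    Each feature row holds the previous n_lags values (the autoregressive
    component), optionally followed by the covariates known at time t.
    """
    rows, targets = [], []
    for t in range(n_lags, len(series)):
        features = list(series[t - n_lags:t])
        if covariates is not None:
            features.extend(covariates[t])
        rows.append(features)
        targets.append(series[t])
    return rows, targets
```

The resulting rows can be fed to any regressor (linear, trees, neural networks...). Adding a categorical identifier per series to each row is one way to share a single model across many series.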
THE PROS:
- Can model non-linear relationships
- Can model the “hierarchical structure” of the
time series through categorical variables
- Support for covariates (predictors) + feature
engineering
- One model is shared among multiple time
series
- Cold-start predictions are possible by
iteratively feeding the predictions back to the
feature space
THE CONS:
- Feature engineering takes time
- Long-term relationships between data points
need to be explicitly modeled
(autoregressive features)
General machine learning approach for ts prediction
Tech stack and packages
- Sklearn, PySpark for feature
engineering, data reduction
Bayesian AR models (Facebook Prophet)
Prophet is a Bayesian GAM (Generalized Additive Model)
$y(t) = g(t) + s(t) + h(t) + \epsilon_t$
where y(t) = sales, g(t) = linear trend with changepoints, s(t) = seasonal component, h(t) = holiday-specific component
1) Detect changepoints in the time series
2) Fit linear trend parameters (k and delta) *
(Piecewise) linear trends: $g(t) = (k + a(t)^\top \delta)\, t$ **
where k is the growth rate and $a(t)^\top \delta$ is the growth rate adjustment
* Implemented using Stan
** An additional ‘offset’ term has been omitted from the formula
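A minimal sketch of the piecewise linear trend with growth rate k and per-changepoint rate adjustments delta. Unlike the slide formula, it includes the offset term (m, plus a correction of -s_j * delta_j at each changepoint s_j) so that the trend stays continuous, following the formulation in Taylor et al. The function and argument names are illustrative, not Prophet's API.

```python
def piecewise_linear_trend(t, k, m, changepoints, deltas):
    """Evaluate g(t) = (k + sum of active deltas) * t + (m + sum of active offsets)."""
    rate, offset = k, m
    for s, delta in zip(changepoints, deltas):
        if t >= s:
            rate += delta
            # Offset correction -s * delta keeps g continuous at the changepoint.
            offset -= s * delta
    return rate * t + offset
```

For example, with k = 1, m = 0 and a single changepoint at t = 10 with delta = 1, the slope doubles after t = 10 while the curve passes through g(10) = 10 without a jump.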
Bayesian AR models (Facebook Prophet)
E.g. P = 365 for yearly seasonality (with t measured in days)
Need to estimate 2N parameters (a_n and b_n), using MCMC or MAP optimization (Prophet’s default)
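The seasonal component with period P and 2N coefficients is a truncated Fourier series, s(t) = sum over n = 1..N of a_n cos(2 pi n t / P) + b_n sin(2 pi n t / P). A minimal sketch in plain Python follows. The function names are illustrative, and the coefficients are passed in rather than estimated.

```python
import math


def fourier_features(t, period, n_terms):
    """Return the 2N basis values [cos_1, sin_1, ..., cos_N, sin_N] at time t."""
    feats = []
    for n in range(1, n_terms + 1):
        angle = 2.0 * math.pi * n * t / period
        feats.extend([math.cos(angle), math.sin(angle)])
    return feats


def seasonal_component(t, period, a, b):
    """Evaluate s(t) given coefficient lists a (cosine) and b (sine) of length N."""
    feats = fourier_features(t, period, len(a))
    cos_terms, sin_terms = feats[0::2], feats[1::2]
    return sum(an * c for an, c in zip(a, cos_terms)) + sum(bn * s for bn, s in zip(b, sin_terms))
```

Because the basis is fixed, fitting a_n and b_n is a linear regression on these features, which is why the number of terms N directly sets the number of parameters.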
THE PROS:
- Uncertainty estimation
- Bayesian changepoint detection
- User-in-the-loop paradigm (Prophet)
- Black-box variational inference is
revolutionizing Bayesian inference
THE CONS:
- Bayesian inference takes time (the “scale”
issue)
- One model for each time series
- No information sharing among series
(unless you specify a hierarchical bayesian
model with shared parameters, but still...)
- Historical data are needed for prediction!
- Performance is often on par ** with autoregressive models
Tech stack and packages
- Python/R clients for Prophet *
- R package for Bayesian structural time series models: bsts
Bayesian AR models
* Taylor et al., Forecasting at scale
** This may open endless discussions. Bottom line: depends on your data :)
Interlude: uncertainty estimation with deep learning
- Uncertainty estimation is a prerogative of Bayesian methods.
- Black box variational inference (ADVI) has sparked renewed interest in Bayesian neural networks, but we are not there yet in terms of performance
- A DeepMind paper from NIPS 2017 introduces a simple yet effective way to estimate predictive uncertainty using Deep Ensembles
For a TensorFlow implementation of this paper: https://arrigonialberto86.github.io/funtime/deep_ensembles.html
“Engineering Uncertainty Estimation in Neural Networks for Time Series Prediction at Uber”: https://eng.uber.com/neural-networks-uncertainty-estimation/
Interlude: Deep Ensembles
- Train a deep learning model using a custom final layer which parametrizes a Gaussian distribution
- Sample x from the Gaussian distribution using fitted parameters
- Calculate loss to backpropagate the error (using Gaussian likelihood)
Network output
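The Gaussian likelihood loss can be written down in a few lines. This is a minimal sketch in plain Python of the negative log-likelihood of a target y under a predicted N(mu, sigma^2), not the TensorFlow code from the linked implementation.

```python
import math


def gaussian_nll(y, mu, sigma):
    """Negative log-likelihood of y under a Gaussian with mean mu and std sigma."""
    return 0.5 * math.log(2.0 * math.pi * sigma ** 2) + (y - mu) ** 2 / (2.0 * sigma ** 2)


def mean_gaussian_nll(targets, mus, sigmas):
    """Average the per-point loss over a batch; this is the quantity to minimize."""
    losses = [gaussian_nll(y, m, s) for y, m, s in zip(targets, mus, sigmas)]
    return sum(losses) / len(losses)
```

Minimizing this loss lets the network trade off the two outputs: a larger sigma lowers the penalty for a bad mean prediction but raises the log term, so sigma is pushed towards the actual local noise level.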
What the network is learning: different
regions of the x space have different
variances
Generate a synthetic
dataset with different
variances
Interlude: Deep Ensembles
PREDICTION ON
TRAINING DATASET
SYNTHETIC TRAINING
DATASET
Use the network from previous
slide to predict on the training
set to see if it actually detects
variance reduction
Interlude: Deep Ensembles
The authors suggest training different NNs on the same data (the whole training set) with random initialization
Ensemble networks (improve generalization power)
Uniformly weighted mixture model
Predictions for regions outside of the training dataset show increasing variance (due to ensembling)
In addition to ‘distribution’ modeling and ensembling, the authors suggest using the fast gradient sign method * to produce adversarial training examples (not shown here)
* Goodfellow et al., 2014
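A minimal sketch of how a uniformly weighted mixture of Gaussian predictions can be collapsed into a single mean and standard deviation. It assumes each ensemble member returns a (mu, sigma) pair for the same input and uses the standard moment-matching formula for an equal-weight mixture. The function name is illustrative.

```python
def combine_ensemble(mus, sigmas):
    """Moment-match an equal-weight mixture of Gaussians N(mu_m, sigma_m^2)."""
    m = len(mus)
    mu_star = sum(mus) / m
    second_moment = sum(s ** 2 + mu ** 2 for mu, s in zip(mus, sigmas)) / m
    # Guard against tiny negative values caused by floating-point rounding.
    var_star = max(second_moment - mu_star ** 2, 0.0)
    return mu_star, var_star ** 0.5
```

The combined variance is the average member variance plus the spread of the member means, so inputs on which the networks disagree (typically far from the training data) get wider predictive intervals.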
Interlude: Deep Ensembles
Custom GaussianLayer
Let’s just do some extra work and define a
custom layer
For a TensorFlow implementation of this paper: https://arrigonialberto86.github.io/funtime/deep_ensembles.html
Interlude: Deep Ensembles
Custom layer returns both
mu and sigma
Build 2 weight matrices + 2 bias terms
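The idea of a final layer with two sets of weights and two biases, returning both mu and sigma, can be sketched without any framework. This is a plain-Python forward pass for a single output unit, not the Keras layer from the linked post. The softplus transform and the small constant that keep sigma positive are assumptions of this sketch.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x)), always non-negative."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def gaussian_layer(x, w_mu, b_mu, w_sigma, b_sigma):
    """Map a feature vector x to (mu, sigma) using two weight vectors and two biases."""
    mu = sum(w * xi for w, xi in zip(w_mu, x)) + b_mu
    raw_sigma = sum(w * xi for w, xi in zip(w_sigma, x)) + b_sigma
    sigma = softplus(raw_sigma) + 1e-6  # strictly positive standard deviation
    return mu, sigma
```

In a real network x would be the activations of the last hidden layer and all four parameter groups would be trained jointly with the Gaussian likelihood loss.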
DeepAR (Amazon)
Instead of fitting separate models for each time series we create a global model from related time
series to handle widely-varying scales through rescaling and velocity-based sampling.
Figure labels: different scales; probabilities; ~1000 time series; past / future; covariates
Flunkert et al., 2017
DeepAR (Amazon)
h_{t-1}, h_t, h_{t+1}
- Use an LSTM to model temporal interactions in the time series
- As seen with the Deep Ensemble architecture, we can predict parameters of distributions at each time point (theta vector)
- Time series need to be scaled for the network to learn time-varying dynamics
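A minimal sketch of the per-series scaling step. It assumes the mean-based scale factor described in Flunkert et al. (one plus the average of the conditioning history), which keeps the factor well defined for series that are mostly zero. The function names are illustrative.

```python
def scale_factor(history):
    """Per-series scale: 1 + mean of the observed history."""
    return 1.0 + sum(history) / len(history)


def rescale(history):
    """Divide a series by its scale factor; return the scaled series and the factor."""
    nu = scale_factor(history)
    return [z / nu for z in history], nu


def unscale_gaussian(mu_scaled, sigma_scaled, nu):
    """Map Gaussian parameters predicted on the scaled series back to the original scale."""
    return mu_scaled * nu, sigma_scaled * nu
```

The network only ever sees the scaled values, so a product selling ~1000 units and one selling ~2 units present inputs of similar magnitude. The factor is applied again on the way out.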
DeepAR (Amazon)
* Likelihood/loss is customizable: Gaussian/negative binomial for count data + overdispersion
Training Prediction
*
For a commentary + code review: https://arrigonialberto86.github.io/funtime/deepar.html
DeepAR (Amazon)
The mandatory ‘AirPassengers’ prediction example (results shown on training set)
Granted, this is not the use case Amazon had in mind...
DeepAR (Amazon)
- Long-term relationships are handled by
design using LSTMs
- One model is fitted for all the time series
- The hierarchical ts structure and
inter-dependencies are captured by
using covariates (even holidays,
recurrent events etc...)
- The model can be used for cold-start
predictions (using categorical covariates
with ‘descriptive’ product information)
- Hassle-free uncertainty estimation
DeepAR and the AWS ecosystem
AWS SageMaker
Deep State Space (NIPS 2018)*
A state space model (SSM) is just like a Hidden Markov Model, except the hidden states are continuous
Observation (z_t) update
Latent state (l_t) update
In normal settings we would need to fit these parameters for each time series
Figure labels: z_{t-1}, z_t, z_{t+1}
* Rangapuram et al, 2018, Deep State Space Models for Time Series Forecasting
Deep State Space (NIPS 2018)
Training: compute the negative log-likelihood, derive the time-varying SS parameters using backpropagation
Prediction: use Kalman filtering to estimate l_t, then recursively apply the transition equation and the observation model to generate prediction samples
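The prediction recipe (filter to estimate the latent state, then roll the transition and observation equations forward to draw samples) can be sketched for the simplest state space model. This assumes a scalar local-level model with fixed noise variances q and r in the textbook two-noise form. In the Deep State Space approach these parameters would instead be time-varying outputs of the network, and the paper's exact parametrization differs.

```python
import random


def kalman_filter_local_level(observations, q, r, mean0=0.0, var0=1e6):
    """Filter l_t = l_{t-1} + w_t (var q), z_t = l_t + v_t (var r)."""
    mean, var = mean0, var0
    means = []
    for z in observations:
        var = var + q                    # predict step: uncertainty grows
        gain = var / (var + r)           # Kalman gain
        mean = mean + gain * (z - mean)  # update with the new observation
        var = (1.0 - gain) * var
        means.append(mean)
    return means, var


def sample_path(mean, var, q, r, horizon, rng):
    """Draw one future trajectory by recursing the transition and observation models."""
    level = rng.gauss(mean, var ** 0.5)
    path = []
    for _ in range(horizon):
        level += rng.gauss(0.0, q ** 0.5)
        path.append(level + rng.gauss(0.0, r ** 0.5))
    return path
```

Calling `sample_path` repeatedly with `random.Random(seed)` gives a set of trajectories from which forecast quantiles can be read off.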
- Long-term relationships are handled by
design using LSTMs
- One model is fitted for all the time
series
- The hierarchical ts structure and
inter-dependencies are captured by
ad-hoc design and components of the SS
model (even holidays, recurrent events
etc...)
- The model can be used for cold-start
predictions (using categorical covariates
with ‘descriptive’ product information)
Deep State Space (NIPS 2018)
Going forward: Deep factors with GPs *
* Maddix et al., “Deep Factors with Gaussian Processes for Forecasting”, NIPS 2018
The combination of probabilistic graphical models with deep neural networks has been an active
research area recently
Global DNN backbone and local Gaussian Process (GP). The main idea is to represent each
time series as a combination of a global time series and a corresponding local model.
Figure labels: RNN + covariates; global factors g_t; observations z_{it}
Backpropagation to find RNN parameters to produce global factors (g_t) + GP hyperparameters
M4 forecasting competition winner algo (Uber, 2018)
The winning idea is often the simplest!
Hybrid Exponential Smoothing-Recurrent Neural Networks (ES-RNN) method. It
mixes hand-coded parts like ES formulas with a black-box recurrent neural network
(RNN) forecasting engine.
y_{t-1}, y_t, y_{t+1}
Deseasonalized and normalized vector of covariates + previous state
RNN results are now part of a parametric model
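A heavily simplified sketch of the hybrid idea: a hand-coded exponential smoothing level normalizes the input window, a black-box model predicts in normalized space, and the level is multiplied back in. The seasonality component is omitted, the series is assumed strictly positive, and `predict_normalized` is a hypothetical stand-in for the RNN. None of this is the competition code.

```python
def smooth_level(series, alpha):
    """Simple exponential smoothing: l_t = alpha * y_t + (1 - alpha) * l_{t-1}."""
    level = series[0]
    levels = [level]
    for y in series[1:]:
        level = alpha * y + (1.0 - alpha) * level
        levels.append(level)
    return levels


def hybrid_forecast(series, alpha, window, predict_normalized):
    """Normalize the last window by the current level, predict, then rescale."""
    last_level = smooth_level(series, alpha)[-1]
    normalized = [y / last_level for y in series[-window:]]
    return [last_level * ratio for ratio in predict_normalized(normalized)]
```

Because the smoothing parameter sits inside the forecast formula, it can be trained jointly with the network weights, which is what makes the RNN output part of a parametric model.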
Summary of performance
Methods compared: classical autoregressive models; Bayesian models (GAM/structural); classical machine learning; deep learning approaches
Criteria: scalability; info sharing across ts; cold-start predictions; uncertainty estimation; unevenly spaced time series *
Deep learning approaches named in the table: DeepAR, Deep Factors
* Chen et al., Neural ordinary differential equations, 2018 / Futoma et al., 2017, Multitask GP + RNN
BACKUP SLIDES
Deep State Space (Amazon)
Level-trend model parametrization:
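For reference, a level-trend parametrization can be written out as follows. This is a sketch assuming the linear-Gaussian state space form and notation of Rangapuram et al. (2018), with a two-dimensional latent state holding level and trend. The exact symbols should be checked against the paper.

```latex
\begin{aligned}
l_t &= F_t\, l_{t-1} + g_t\, \varepsilon_t, \qquad \varepsilon_t \sim \mathcal{N}(0, 1) \\
z_t &= a_t^\top l_{t-1} + b_t + \sigma_t\, \epsilon_t, \qquad \epsilon_t \sim \mathcal{N}(0, 1) \\
a_t &= \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \qquad
F_t = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}, \qquad
g_t = \begin{bmatrix} \alpha_t \\ \beta_t \end{bmatrix}
\end{aligned}
```

Here the time-varying quantities (alpha_t, beta_t, b_t, sigma_t) are the ones the network outputs for each series and time step.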
DeepAR (Amazon)
Training procedure:
- Step 1: Predict parameters (e.g. mu, sigma)
- Step 2: Compute likelihood of the prediction (can be Gaussian as we have seen with Deep Ensembles) *
- Step 3: Sample next point
Training
Prediction (~ Monte Carlo)
* Likelihood/loss is customizable: Gaussian/negative binomial for count data + overdispersion
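The Monte Carlo prediction loop (predict parameters, sample the next point, feed it back in) can be sketched independently of the network. Here `predict_params` is a hypothetical stand-in for the trained model: it takes the window seen so far and returns a Gaussian (mu, sigma) for the next step.

```python
import random


def sample_paths(history, predict_params, horizon, n_samples, seed=0):
    """Draw n_samples future trajectories by ancestral sampling."""
    rng = random.Random(seed)
    paths = []
    for _ in range(n_samples):
        window = list(history)
        path = []
        for _ in range(horizon):
            mu, sigma = predict_params(window)
            z = rng.gauss(mu, sigma)  # sample the next point
            path.append(z)
            window.append(z)          # feed the sample back as input
        paths.append(path)
    return paths


def empirical_quantile(values, q):
    """Pick the q-th quantile from a list of sampled values (nearest-rank style)."""
    ordered = sorted(values)
    index = min(int(q * len(ordered)), len(ordered) - 1)
    return ordered[index]
```

Taking `empirical_quantile` across the sampled paths at each future step yields the prediction intervals. For count data the Gaussian draw would be swapped for a negative binomial one.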
