Deep Learning for Time Series Data
ARUN KEJARIWAL
@arun_kejariwal
TheAIconf.com in San Francisco
September 2018
2
About Me
Product focus
Building and Scaling Teams
Advancing the
state-of-the-art
Scalability
Performance
3
Media
Fake News
Security
Threat Detection
Finance
Control Systems
Malfunction Detection
Operations
Availability
Performance
Forecasting
Applications
Time Series
4
Rule-based
μ ± 3*σ
Median, MAD
Tests
Generalized ESD
Underlying assumptions
Statistical
Forecasting
Seasonality
Trend
Techniques
ARIMA, SARIMA
Robust Kalman Filter
Time Series Analysis
Clustering
Other Techniques
PCA
OneSVM
Isolation Forests
(Un) Supervised
Autoencoder
Variational Autoencoder
LSTM
GRU
Clockwork RNN
Depth Gated RNN
Deep Learning
Anomaly Detection
Required in every application domain
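The two rule-based checks named on this slide (μ ± 3*σ and Median/MAD) can be sketched as below. This is a minimal illustration, not the speaker's implementation: the function names, the cutoff k = 3, the 1.4826 scaling constant for the MAD, and the toy series are all assumptions made for the example.

```python
import statistics

def three_sigma_outliers(values, k=3.0):
    """Indices of points farther than k standard deviations from the mean."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return [i for i, v in enumerate(values) if abs(v - mu) > k * sigma]

def mad_outliers(values, k=3.0):
    """Indices of points farther than k scaled MADs from the median."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    scaled_mad = 1.4826 * mad  # assumed scaling: comparable to sigma for Gaussian data
    return [i for i, v in enumerate(values) if abs(v - med) > k * scaled_mad]

# Toy series (made up for illustration) with one obvious spike at index 9.
series = [10, 11, 9, 10, 12, 10, 11, 9, 10, 60]
print(three_sigma_outliers(series))
print(mad_outliers(series))
```

On this toy series the 3σ rule does not flag the spike, because the spike itself inflates σ, while the median/MAD rule does. That masking effect is one reason the robust variant is listed alongside the mean-based one.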
5
HISTORY
Neural Network Research
6
Recurrent Neural Networks
Long History!
RTRL
TDNN
BPTT
NARX
[Robinson and Fallside 1987]
[Waibel 1987]
[Werbos 1988]
[Lin et al. 1996]
S_t: hidden state
“The LSTM’s main idea is that, instead of computing S_t from S_{t-1}
directly with a matrix-vector product followed by a nonlinearity,
the LSTM directly computes ΔS_t, which is then added to S_{t-1} to
obtain S_t.” [Jozefowicz et al. 2015]
7
RNN: Long Short-Term Memory
Over 20 yrs!
Neural Computation, 1997
* Figure borrowed from http://colah.github.io/posts/2015-08-Understanding-LSTMs/
Resistant to vanishing gradient problem
Achieve better results when dropout is used
Adding bias of 1 to LSTM’s forget gate
(a) Forget gate (b) Input gate (c) Output gate
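For reference, the three gates named on this slide are commonly written as follows. The notation is an assumption (it follows the widely used formulation with a forget gate, not necessarily the figure on the slide): x_t is the input, h_t the output, c_t the cell state (the slide's S_t), σ the logistic sigmoid, and ⊙ the element-wise product.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f [h_{t-1}, x_t] + b_f\right) && \text{(a) forget gate} \\
i_t &= \sigma\left(W_i [h_{t-1}, x_t] + b_i\right) && \text{(b) input gate} \\
o_t &= \sigma\left(W_o [h_{t-1}, x_t] + b_o\right) && \text{(c) output gate} \\
\tilde{c}_t &= \tanh\left(W_c [h_{t-1}, x_t] + b_c\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```

When f_t is close to 1, the cell update reduces to c_t ≈ c_{t-1} + i_t ⊙ c̃_t, which is the additive update described in the quote. Initializing b_f to 1 keeps f_t near 1 early in training, so gradients can flow through c_t across many steps.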
8
RNN: Long Short-Term Memory
Application to Anomaly Detection
*Prediction Error
Finding pattern anomalies
No need for a fixed size window for model estimation
Works with non-stationary time series with irregular
structure
LSTM Encoder-Decoder
Explore other extensions of LSTMs such as
GRU, Memory Networks, Convolutional LSTM,
Quasi-RNN, Dilated RNN, Skip RNN, HF-RNN,
Bi-RNN, Zoneout (regularizing RNN), TAGM
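The prediction-error idea on this slide can be sketched without committing to a particular network: compare each observation with the model's forecast and flag points whose error is unusually large. In the sketch below the forecaster is a deliberately naive stand-in (previous value) rather than an LSTM, and the threshold rule (median plus k scaled MADs of the absolute errors) is an assumption; the slide does not prescribe either.

```python
import statistics

def prediction_error_anomalies(actual, predicted, k=3.0):
    """Flag indices whose absolute prediction error is unusually large.

    Threshold = median + k * scaled MAD of the absolute errors
    (an assumed, illustrative choice).
    """
    errors = [abs(a - p) for a, p in zip(actual, predicted)]
    med = statistics.median(errors)
    mad = statistics.median([abs(e - med) for e in errors])
    threshold = med + k * 1.4826 * mad
    return [i for i, e in enumerate(errors) if e > threshold]

def naive_forecast(series):
    """Hypothetical stand-in for a trained LSTM: predict the previous value."""
    return [series[0]] + series[:-1]

# Toy series (made up for illustration) with a spike at index 5.
series = [1.0, 1.2, 1.1, 1.3, 1.2, 5.0, 1.3, 1.2, 1.4, 1.3]
flags = prediction_error_anomalies(series, naive_forecast(series))
print(flags)
```

With the naive forecaster both the spike and the step back down produce large errors, so two consecutive indices are flagged. A forecaster that has learned the series' structure would typically narrow this to the spike itself.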
9
Financial Forecasting
Machine Translation
Speech Synthesis
NLP Modeling
Sequence Modeling
[1] “A Critical Review of Recurrent Neural Networks for Sequence Learning”, by Lipton et al., 2015. (https://arxiv.org/abs/1506.00019)
[2] “Sequence to sequence learning with Neural Networks”, by Sutskever et al., 2014. (https://arxiv.org/abs/1409.3215)
10
Alternatives
To RNNs
TCN*
Temporal Convolutional Network
=
1D Fully-Convolutional Network
+
Causal convolutions
Assumptions and Optimizations	
“Stability”
Dilation
Residual Connections
Advantages	
Inference Speed
Parallelizability
Trainability
Feed Forward Models#	
Gated-Convolutional Language Model
Universal Transformer
WaveNet
* "An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling", by Bai et al. 2018. (https://arxiv.org/pdf/1803.01271.pdf)
# “When Recurrent Models Don't Need to be Recurrent”, by Miller and Hardt, 2018. (https://arxiv.org/pdf/1805.10369.pdf)
“… the “infinite memory” advantage of RNNs is largely absent in practice.”
“The preeminence enjoyed by recurrent networks in sequence modeling may be largely a vestige of history.”
[Bai et al. 2018]
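The two TCN ingredients named on this slide, causal convolutions and dilation, can be illustrated with a small pure-Python sketch. It assumes left zero-padding, a single channel, and one convolution per layer; the function names are hypothetical and this is not the reference TCN implementation (which also uses residual blocks).

```python
def causal_dilated_conv1d(x, weights, dilation=1):
    """Causal 1D convolution with dilation.

    output[t] depends only on x[t], x[t - dilation], x[t - 2*dilation], ...
    weights[0] multiplies the current sample; weights[j] multiplies the sample
    j * dilation steps in the past. Positions before the start of the series
    count as zero, so the output has the same length as the input.
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(weights):
            idx = t - j * dilation
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out

def receptive_field(kernel_size, dilations):
    """Samples visible to one output after stacking one conv per dilation."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = causal_dilated_conv1d(x, weights=[1.0, 1.0], dilation=2)
print(y)                                  # y[t] = x[t] + x[t-2]
print(receptive_field(2, [1, 2, 4, 8]))   # grows exponentially with depth
```

Because every output position can be computed independently of the others, the whole sequence can be processed in parallel, which is the parallelizability advantage listed above. Doubling the dilation at each layer lets the receptive field grow exponentially with depth.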
11
Challenges
Going Beyond
Large Number of
Time Series
Up to Hundreds of
Millions
Dashboarding
Impractical
Sea of
Anomalies
Fatigue
Root Cause
Analysis
Long TTD
12
Multi-variate Analysis
Curse of Dimensionality
Computationally Expensive
Real-time constraints
Dimensionality Reduction
PCA, SVD
Recency
Decision making
13
14
Correlation Analysis
An Example
15
Correlation Analysis
Another Example
16
Correlation Analysis
Surfacing Actionable Insights
17
Correlation Analysis
Surfacing Actionable Insights
Correlation matrix
Bird’s eye view
Not Scalable (O(n²))
Millions of time series/data streams
Meaningless correlations
Lack of context
Systems domain
Exploiting topology
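To make the O(n²) point concrete: a full correlation matrix over n series needs one coefficient per unordered pair, that is n(n-1)/2 of them. The sketch below computes all pairwise Pearson coefficients for three toy series and counts the pairs for a million series. The metric names and values are made up for illustration.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def pairwise_correlations(series_by_name):
    """One coefficient per unordered pair of series: n * (n - 1) / 2 entries."""
    names = sorted(series_by_name)
    return {
        (a, b): pearson(series_by_name[a], series_by_name[b])
        for a, b in combinations(names, 2)
    }

def pair_count(n):
    """Number of unordered pairs among n series."""
    return n * (n - 1) // 2

# Hypothetical metrics, made up for illustration.
metrics = {
    "cpu": [1, 2, 3, 4, 5],
    "latency": [2, 4, 6, 8, 10],
    "errors": [5, 4, 3, 2, 1],
}
corr = pairwise_correlations(metrics)
print(corr)
print(pair_count(1_000_000))  # roughly 5 * 10**11 pairs for a million series
```

Restricting the pairs to series that are neighbours in a known topology (for example, services that call each other) is one way to avoid both the quadratic cost and the meaningless correlations mentioned on the slide.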
18
Pearson	
[Pearson 1900]
Goodman and Kruskal 𝛾
[Goodman and Kruskal ’54]
Kendall 𝜏
[Kendall ’38]
Spearman 𝜌
[Spearman 1904, 1906]
Somers’ D
[Somers ’62]
Cohen’s 𝜅
[Cohen ’60]
Cramér’s V
[Cramér ’46]
Correlation Coefficients
Different Flavors
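The coefficients on this slide measure different things. As a small illustration, the sketch below contrasts Pearson (linear association) with Spearman (Pearson applied to ranks, hence monotone association) on a made-up cubic relationship. The helper names are hypothetical, and ties receive average ranks.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]          # monotone but non-linear in x
print(pearson(x, y))             # below 1: the relation is not linear
print(spearman(x, y))            # 1: the relation is perfectly monotone
```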
19
Pearson Correlation
A Deep Dive
Robustness
Sensitive to outliers
Amenable to incremental computation
Linear correlation
Susceptible to
Curvature
Magnitude of the residuals
Not rotation invariant
Susceptible to Heteroscedasticity
Trade-off
Speed
Accuracy
* Figure borrowed from “Robust Correlation: Theory and Applications”, by Shevlyakov and Oja.
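Two properties from this slide, "amenable to incremental computation" and "sensitive to outliers", can be seen in one small sketch. It keeps running means and co-moments in the style of Welford's online algorithm, so each new (x, y) pair is absorbed in constant time. The class name and the toy data are assumptions made for the example.

```python
import math

class StreamingPearson:
    """Running Pearson correlation, updated one (x, y) pair at a time."""

    def __init__(self):
        self.n = 0
        self.mean_x = 0.0
        self.mean_y = 0.0
        self.m2_x = 0.0   # running sum of squared deviations of x
        self.m2_y = 0.0   # running sum of squared deviations of y
        self.c_xy = 0.0   # running sum of co-deviations of x and y

    def update(self, x, y):
        self.n += 1
        dx = x - self.mean_x          # deviation from the old mean of x
        dy = y - self.mean_y          # deviation from the old mean of y
        self.mean_x += dx / self.n
        self.mean_y += dy / self.n
        self.m2_x += dx * (x - self.mean_x)
        self.m2_y += dy * (y - self.mean_y)
        self.c_xy += dx * (y - self.mean_y)

    def value(self):
        denom = math.sqrt(self.m2_x * self.m2_y)
        return self.c_xy / denom if denom > 0 else float("nan")

sp = StreamingPearson()
for x, y in [(1, 2), (2, 4), (3, 6), (4, 8), (5, 10)]:
    sp.update(x, y)
before = sp.value()      # perfectly linear so far

sp.update(6, -50)        # a single made-up outlier
after = sp.value()
print(before, after)
```

In this toy stream a single outlier is enough to flip the sign of the coefficient, which is the speed versus accuracy trade-off noted on the slide: the cheap incremental estimator is not robust.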
20
Modalities*
Time Series
Text
Sentiment Analysis
Image
Animated Gifs
Audio
Digital Assistants
Video
Live Streams
Haptic
Robotics
* “Multimodal Machine Learning: A Survey and Taxonomy”, by Baltrušaitis et al., 2017.
21
Other Modalities
Research Opportunity
Smell Taste
22
Feature Extraction
[Fu et al. 2008]
Correlation
Embedding Analysis
Feature Extraction
[Fu et al. 2008]
Correlational PCA
Common Representation Learning
[Chandar et al. 2017]
Correlational NN
Face Detection
[Deng et al. 2017]
Correlation Loss
Object Tracking
[Valmadre et al. 2017]
CFNet
LEVERAGING CORRELATION
Deep Learning Context
23
Loss Functions
Different Flavors
Class separability of features (minimize interclass correlation)
Softmax Loss
Improved Triplet Loss [Cheng et al. 2016]
Triplet Loss [Schroff et al. 2015]
Center Invariant Loss [Wu et al. 2017]
Center Loss [Wen et al. 2016]
Larger inter-class variation and a smaller intra-class variation
Quadruplet Loss [Chen et al. 2017]
Separability and discriminatory ability of features
(maximize intraclass correlation)
Correlation Loss [Deng et al. 2017]
Correlation Loss
Deep Dive
Normalization
Non-linearly changes the distribution
Yields non-Gaussian distribution
Uncentered Pearson Correlation
Angle similarity
Insensitive to anomalies
Enlarge margins amongst different classes
Softmax loss
Correlation loss
Distribution of Deeply Learned Features*
* Figure borrowed from [Deng et al. 2017].
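The link between "Uncentered Pearson Correlation" and "Angle similarity" on this slide can be written out explicitly. The notation is an assumption: x and y are two feature vectors of length n, and θ is the angle between them.

```latex
r_{\mathrm{unc}}(x, y)
  = \frac{\sum_{i=1}^{n} x_i y_i}
         {\sqrt{\sum_{i=1}^{n} x_i^2}\,\sqrt{\sum_{i=1}^{n} y_i^2}}
  = \frac{x^\top y}{\lVert x \rVert \, \lVert y \rVert}
  = \cos\theta
```

The ordinary (centered) Pearson coefficient has the same form with x_i and y_i replaced by x_i - x̄ and y_i - ȳ. Dropping the mean subtraction is what turns the coefficient into the cosine of the angle between the raw vectors.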
25
CANONICAL CORRELATION ANALYSIS
Common Representation Learning
Deep Generalized
[Benton et al. 2017]
Deep Discriminative
[Dorfer and Widmer 2016]
Deep Variational
[Wang et al. 2016]
Soft Decorrelation
[Chang et al. 2017]
Maximize correlation of the views when projected to a common subspace
Minimize self and cross-reconstruction error and maximize correlation
Leverage CRL for transfer learning - high commercial potential
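The statement "maximize correlation of the views when projected to a common subspace" corresponds to the classical CCA objective, shown here for the first pair of projection directions. The notation is an assumption: Σ_xx and Σ_yy are the covariance matrices of the two views and Σ_xy is their cross-covariance.

```latex
(w_x^*, w_y^*) = \arg\max_{w_x, w_y}
  \frac{w_x^\top \Sigma_{xy}\, w_y}
       {\sqrt{w_x^\top \Sigma_{xx}\, w_x}\;\sqrt{w_y^\top \Sigma_{yy}\, w_y}}
```

Broadly, the deep variants listed above replace the linear projections w_x^T x and w_y^T y with the outputs of neural networks and optimize a correlation-based objective over the network parameters. The individual papers differ in the exact loss and constraints.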
26
27
Spurious Correlation
Long History
28
Spurious Correlation
Lack Of Context
* http://www.tylervigen.com/spurious-correlations
29
Nonsense Correlation
Long History
32
Parallel and
Distributed Processing
Eds. Rumelhart and
McClelland, ‘86
Neurocomputing:
Foundations of Research
Anderson and
Rosenfeld 1988
The Roots of
Backpropagation
Werbos 1994
Neural Networks: A
Systematic Introduction
Rojas 1996
READINGS
Surveys & Books
33
READINGS
Surveys & Books
Deep Learning
LeCun et al. 2015
Deep Learning in Neural Networks: An overview
Schmidhuber 2015
Deep Learning
Goodfellow et al. 2016
Neuro-dynamic programming
Bertsekas 1996
34
[Werbos ’74]
Beyond Regression: New Tools for Prediction
and Analysis in the Behavioral Sciences
[Parker ’82]
Learning Logic
[Rumelhart, Hinton and Williams ‘86]
Learning internal representations by error
propagation
[Lippmann ’87]
An introduction to computing with neural
networks
35
[Widrow and Lehr ’90]
30 Years of Adaptive Neural Networks:
Perceptron, Madaline, and Backpropagation
[Wang and Raj ’17]
On the Origin of Deep Learning
[Arulkumaran et al. ’17]
A Brief Survey of Deep Reinforcement Learning
[Alom et al. ’18]
The History Began from AlexNet: A
Comprehensive Survey on Deep
Learning Approaches
36
[Higham and Higham ’18]
Deep Learning: An Introduction
for Applied Mathematicians
[Marcus ’18]
Deep Learning: A Critical Appraisal
37
READINGS
Anomaly Detection
Understanding anomaly detection
safaribooksonline.com/library/view/understanding-anomaly-detection/9781491983676/
https://www.slideshare.net/arunkejariwal/anomaly-detection-in-realtime-data-streams-using-heron
“Variational Inference For On-Line Anomaly Detection In High-Dimensional Time Series”, by Sölch et al., 2016.
https://arxiv.org/pdf/1602.07109.pdf
https://www.slideshare.net/arunkejariwal/live-anomaly-detection-80287265
“Anomaly Detection Using Autoencoders with Nonlinear Dimensionality Reduction”, by Sakurada and Yairi, 2014.
https://dl.acm.org/citation.cfm?id=2689747
“On the Runtime-Efficacy Trade-off of Anomaly Detection Techniques for Real-Time Streaming Data”, by Choudhary et al., 2017.
https://arxiv.org/abs/1710.04735
38
READINGS
RNNs
“Learning to Forget: Continual Prediction with LSTM”, by Gers et al., 2000.
“Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling”, by Chung et al., 2014.
“An Empirical Exploration Of Recurrent Network Architectures”, by Jozefowicz et al., 2015.
“On the Properties of Neural Machine Translation: Encoder–Decoder Approaches”, by Cho et al., 2014.
“Visualizing and Understanding Recurrent Networks”, by Karpathy et al., 2015.
“LSTM: A Search Space Odyssey”, by Greff et al., 2017.
39
READINGS
Deep Learning based Multi-View Learning
“Deep Multimodal Autoencoders”, by Ngiam et al., 2011.
“Extending Long Short-Term Memory for Multi-View Structured Learning”, by Rajagopalan et al., 2016.
“Compressing Recurrent Neural Network With Tensor Train”, by Tjandra et al., 2017.
“Deep Canonically Correlated Autoencoders”, by Wang et al., 2015.
“Multimodal Tensor Fusion Network”, by Zadeh et al., 2017.
“Memory Fusion Network for Multi-View Sequential Learning”, by Zadeh et al., 2018.
40
RESOURCES
History of Neural Networks
http://www.andreykurenkov.com/writing/ai/a-brief-history-of-neural-nets-and-deep-learning/
http://people.idsia.ch/~juergen/firstdeeplearner.html
https://www.import.io/post/history-of-deep-learning/
https://beamandrew.github.io/deeplearning/2017/02/23/deep_learning_101_part1.html
http://www.psych.utoronto.ca/users/reingold/courses/ai/cache/neural4.html
https://devblogs.nvidia.com/deep-learning-nutshell-history-training/
41
RESOURCES
Transfer Learning
“Learning To Learn”, by Thrun and Pratt (Eds), 1998.
https://www.springer.com/us/book/9780792380474
“Transfer Learning”, by Torrey and Shavlik, 2009.
http://ftp.cs.wisc.edu/machine-learning/shavlik-group/torrey.handbook09.pdf
http://ruder.io/transfer-learning/
“A Survey on Transfer Learning”, by Pan and Yang, 2009.
https://www.cse.ust.hk/~qyang/Docs/2009/tkde_transfer_learning.pdf
http://people.idsia.ch/~juergen/metalearner.html
“Learning to Remember Rare Events”, by Kaiser et al., 2017.
https://arxiv.org/abs/1703.03129
42
RESOURCES
Potpourri
https://medium.com/data-science-group-iitr/loss-functions-and-optimization-algorithms-demystified-bb92daff331c
“Are Loss Functions All the Same?”, by Rosasco et al., 2004.
https://www.mitpressjournals.org/doi/10.1162/089976604773135104
“Some Thoughts About The Design Of Loss Functions”, by Hennig and Kutlukaya, 2007.
https://www.ine.pt/revstat/pdf/rs070102.pdf
http://christopher5106.github.io/deep/learning/2016/09/16/about-loss-functions-multinomial-logistic-logarithm-cross-entropy-square-errors-euclidian-absolute-frobenius-hinge.html
“On Loss Functions for Deep Neural Networks in Classification”, by Janocha and Czarnecki, 2017.
https://arxiv.org/pdf/1702.05659.pdf
“A More General Robust Loss Function”, by Barron, 2018.
https://arxiv.org/pdf/1701.03077.pdf

Deep Learning for Time Series Data

  • 1. Deep Learning for Time Series Data ARUN KEJARIWAL @arun_kejariwal TheAIconf.com in San Francisco September 2018
  • 2. 2 About Me Product focus Building and Scaling Teams Advancing the state-of-the-art Scalability Performance
  • 3. 3 Media Fake News Security Threat Detection Finance Control Systems Malfunction Detection Operations Availability Performance Forecasting Applications Time Series
  • 4. 4 Rule-based μ ± 3*σ Median, MAD Tests Generalized ESD Underlying assumptions Statistical Forecasting Seasonality Trend Techniques ARIMA, SARIMA Robust Kalman Filter Time Series Analysis Clustering Other Techniques PCA OneSVM Isolation Forests (Un) Supervised Autoencoder Variational Autoencoder LSTM GRU Clockwork RNN Depth Gated RNN Deep Learning Anomaly Detection Required in every application domain
  • 6. 6 Recurrent Neural Networks Long History! RTRL TDNN BPTT NARX [Robinson and Fallside 1987] [Waibel 1987] [Werbos 1988] [Lin et al. 1996]
  • 7. St: hidden state “The LSTM’s main idea is that, instead of compuEng St from St-1 directly with a matrix-vector product followed by a nonlinearity, the LSTM directly computes St, which is then added to St-1 to obtain St.” [Jozefowicz et al. 2015] 7 RNN: Long Short-Term Memory Over 20 yrs! Neural Computation, 1997 * * Figure borrowed from http://colah.github.io/posts/2015-08-Understanding-LSTMs/ Resistant to vanishing gradient problem Achieve better results when dropout is used Adding bias of 1 to LSTM’s forget gate(a) Forget gate (b) Input gate (c) Output gate
  • 8. 8 RNN: Long Short-Term Memory Application to Anomaly Detection *Prediction Error Finding pattern anomalies No need for a fixed size window for model estimation Works with non-stationary time series with irregular structure LSTM Encoder-Decoder Explore other extensions of LSTMs such as GRU, Memory Networks, Convolutional LSTM, Quasi-RNN, Dilated RNN, Skip RNN, HF-RNN, Bi-RNN, Zoneout (regularizing RNN), TAGM
  • 9. 9 Forecasting Financial Translation Machine Synthesis Speech Modeling NLP Sequence Modeling ! ! ! ! [1] “A Critical Review of Recurrent Neural Networks for Sequence Learning”, by Lipton et al., 2015. (https://arxiv.org/abs/1506.00019) [2] “Sequence to sequence learning with Neural Networks”, by Sutskever et al., 2014. (https://arxiv.org/abs/1409.3215)
  • 10. 10 Alternatives To RNNs TCN* Temporal Convolutional Network = 1D Fully-Convolueonal Network + Causal convolueons Assumptions and Optimizations “Stability” Dilation Residual Connections Advantages Inference Speed Parallelizability Trainability Feed Forward Models# Gated-Convolutional Language Model Universal Transformer WaveNet ù ù * "An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling", by Bai et al. 2018. (https://arxiv.org/pdf/1803.01271.pdf) # “When Recurrent Models Don't Need to be Recurrent”, by Miller and Hardt, 2018. (https://arxiv.org/pdf/1805.10369.pdf) “… the “infinite memory” advantage of RNNs is largely absent in practice.” “The preeminence enjoyed by recurrent networks in sequence modeling may be largely a vestige of history.” [Bai et al. 2018] {
  • 11. 11 Challenges Going Beyond Large Number of Time Series Up to Hundreds of Millions Dashboarding Impractical Sea of Anomalies Fatigue Root Cause Analysis Long TTD
  • 12. 12 Multi-variate Analysis Curse of Dimensionality Computationally Expensive Real-time constraints Dimensionality Reduction PCA, SVD Recency Decision making
  • 13. 13
  • 17. 17 Correlation Analysis Surfacing Actionable Insights Correlation matrix Bird’s eye view Not Scalable (O(n2)) Millions of time series/data streams Meaningless correlations Lack of context Systems domain Exploiting topology
  • 18. 18 Pearson [Pearson 1900] Goodman and Kruskal 𝛾 [Goodman and Kruskal ’54] Kandal 𝜏 [Kendall ‘38] Spearman 𝜌 [Spearman 1904, 1906] Somer’s D [Somer ’62] Cohen’s 𝜅 [Cohen ‘60] Cramer’s V [Cramer '46] Correlation Coefficients Different Flavors
  • 19. 19 Pearson Correlation A Deep Dive Robustness Sensitive to outliers Amenable to incremental computation Linear correlation Susceptible to Curvature Magnitude of the residuals Not rotation invariant Susceptible to Heteroscedasticity Trade-off Speed Accuracy * Figure borrowed from “Robust Correlation: Theory and Applications”,by Shevlyakov and Oja.
  • 20. 20 Modalities* Time Series T I A V Text Sentiment Analysis Image Animated Gifs Audio Digital Assistants Video Live Streams H Haptic Robotics “Multimodal Machine Learning: A Survey and Taxonomy”,by Baltrušaitis et al., 2017.
  • 22. 22 Feature Extraction [Fu et al. 2008] Correlation Embedding Analysis Feature Extraction [Fu et al. 2008] Correlational PCA Common Representation Learning [Chandar et al. 2017] Correlational NN Face Detection [Deng et al. 2017] Correlation Loss Object Tracking [Valmadre et al. 2017] CFNet LEVERAGING CORRELATION Deep Learning Context
  • 23. 23 Loss Functions Different Flavors Class separability of features (minimize interclass correlation) Softmax Loss Improved Triplet Loss [Cheng et al. 2016] Triplet Loss [Schroff et al. 2015] Center Invariant Loss [Wu et al. 2017] Center Loss [Wen et al. 2016] Larger inter-class variaeon and a smaller intra-class variaeon Quadruplet Loss [Chen et al. 2017] Separability and discriminatory ability of features (maximize intraclass correlation) Correlation Loss [Deng et al. 2017]
  • 24. Correlation Loss Deep Dive Normalization Non-linearly changes the distribution Yields non-Gaussian distribution Uncentered Pearson Correlation Angle similarity Insensitive to anomalies Enlarge margins amongst different classes Softmaxloss Correlationloss Distribution of Deeply Learned Features* * Figure borrowed from [Deng et al. 2017].
  • 25. 25 CANONICAL CORRELATION ANALYSIS Common Representation Learning Deep Generalized [Benton et al. 2017] Deep Discriminative [Dorfer and Widmer 2016] Deep Variational [Wang et al. 2016] Soft Decorrelation [Chang et al. 2017] Maximize correlation of the views when projected to a common subspace Minimize self and cross-reconstruction error and maximize correlation Leverage CRL for transfer learning - high commercial potential
  • 26. 26
  • 28. 28 Spurious Correlation Lack Of Context * http://www.tylervigen.com/spurious-correlations * # * http://www.tylervigen.com/spurious-correlations #
  • 32. 32 Parallel and Distributed Processing Eds. Rumelhart and McClelland, ‘86 Neurocomputing: Foundations of Research Anderson and Rosenfeld 1988 The Roots of Backpropagation Werbos 1994 Neural Networks: A Systematic Introduction Rojas 1996 READINGSSurveys & Books "
  • 33. 33 READINGSSurveys & Books Deep Learning LeCun et al. 2015 Deep Learning in Neural Networks: An overview Schmidhuber 2015 Deep Learning Goodfellow et al. 2016 Neuro-dynamic programming Bertsekas 1996
  • 34. 34 [Werbos ’74] Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences [Parker ’82] Learning Logic [Rumelhart, Hinton and Williams ‘86] Learning internal representations by error propagation [Lippmann ’87] An introduction to computing with neural networks
  • 35. 35 [Widrow and Lehr ’90] 30 Years of Adaptive Neural Networks: Perceptron, Madaline, and Backpropagation [Wang and Raj ’17] On the Origin of Deep Learning [Arulkumaran et al. ’17] A Brief Survey of Deep Reinforcement Learning [Alom et al. ’18] The History Began from AlexNet: A Comprehensive Survey on Deep Learning Approaches
  • 36. 36 [Higham and Higham ’18] Deep Learning: An Introduceon for Applied Mathemaecians [Marcus ’18] Deep Learning: A Criecal Appraisal
  • 37. 37 READINGS Anomaly Detection Understanding anomaly detection safaribooksonline.com/library/view/understanding-anomaly-deteceon/9781491983676/ https://www.slideshare.net/arunkejariwal/anomaly-detection-in-realtime-data-streams-using-heron “Variational Inference For On-Line Anomaly Detection In High-Dimensional Time Series”, by Sölch et al., 2016. https://arxiv.org/pdf/1602.07109.pdf https://www.slideshare.net/arunkejariwal/live-anomaly-detection-80287265 “Anomaly Detection Using Autoencoders with Nonlinear Dimensionality Reduction”, by Sakurada and Yairi 2014. https://dl.acm.org/citation.cfm?id=2689747 “On the Runtime-Efficacy Trade-off of Anomaly Detection Techniques for Real-Time Streaming Data”, by Choudhary et al., 2017. https://arxiv.org/abs/1710.04735
  • 38. 38 READINGS RNNs “Learning to Forget: Continual Prediction with LSTM”, by Gers et al., 2000. “Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling”, by Chung et al., 2014. “An Empirical Exploration Of Recurrent Network Architectures”, by Jozefowicz et al., 2015. “On the Properties of Neural Machine Translation: Encoder–Decoder Approaches”, by Cho et al., 2014. “Visualizing and Understanding Recurrent Networks”, by Karpathy et al., 2015. “LSTM: A Search Space Odyssey”, by Greff et al., 2017.
  • 39. 39 READINGS Deep Learning based Multi-View Learning “Deep Multimodal Autoencoders”, by Ngiam et al., 2011. “Extending Long Short-Term Memory for Multi-View Structured Learning”, by Rajagopalan et al., 2016. “Compressing Recurrent Neural Network With Tensor Train”, by Tjandra et al., 2017. “Deep Canonically Correlated Autoencoders”, by Wang et al., 2015. “Multimodal Tensor Fusion Network”, by Zadeh et al., 2017. “Memory Fusion Network for Multi-View Sequential Learning”, by Zadeh et al., 2018.
  • 40. 40 RESOURCES History of Neural Networks http://www.andreykurenkov.com/writing/ai/a-brief-history-of-neural-nets-and-deep-learning/ http://people.idsia.ch/~juergen/firstdeeplearner.html https://www.import.io/post/history-of-deep-learning/ https://beamandrew.github.io/deeplearning/2017/02/23/deep_learning_101_part1.html http://www.psych.utoronto.ca/users/reingold/courses/ai/cache/neural4.html https://devblogs.nvidia.com/deep-learning-nutshell-history-training/
  • 41. 41 RESOURCES Transfer Learning “Learning To Learn”, by Thrun and Pratt (Eds), 1998. https://www.springer.com/us/book/9780792380474 “Transfer Learning”, by Torrey and Shavlik, 2009. http://ftp.cs.wisc.edu/machine-learning/shavlik-group/torrey.handbook09.pdf http://ruder.io/transfer-learning/ “A Survey on Transfer Learning”, by Pan and Yang, 2009. https://www.cse.ust.hk/~qyang/Docs/2009/tkde_transfer_learning.pdf http://people.idsia.ch/~juergen/metalearner.html “Learning to Remember Rare Events”, by Kaiser et al., 2017. https://arxiv.org/abs/1703.03129
  • 42. 42 RESOURCES Potpourri https://medium.com/data-science-group-iitr/loss-functions-and-optimization-algorithms-demystified-bb92daff331c “Are Loss Functions All the Same?”, by Rosasco et al., 2004. https://www.mitpressjournals.org/doi/10.1162/089976604773135104 “Some Thoughts About The Design Of Loss Functions”, by Hennig and Kutlukaya, 2007. https://www.ine.pt/revstat/pdf/rs070102.pdf http://christopher5106.github.io/deep/learning/2016/09/16/about-loss-functions-multinomial-logistic-logarithm-cross-entropy-square-errors-euclidian-absolute-frobenius-hinge.html “On Loss Functions for Deep Neural Networks in Classification”, by Janocha and Czarnecki, 2017. https://arxiv.org/pdf/1702.05659.pdf “A More General Robust Loss Function”, by Barron, 2018. https://arxiv.org/pdf/1701.03077.pdf