Copyright © 2020 調和系工学研究室 - 北海道大学 大学院情報科学研究院 情報理工学部門 複合情報工学分野 – All rights reserved.
Paper Introduction
A Transformer-based Framework for Multivariate
Time Series Representation Learning
北海道大学 大学院情報科学研究院
情報理工学部門 複合情報工学分野 調和系工学研究室
劉兆邦
June 20, 2022
• Authors
– George Zerveas, Srideepika Jayaraman, Dhaval Patel,
Anuradha Bhamidipaty, Carsten Eickhoff
• Published at
– Proceedings of the 27th ACM SIGKDD Conference on
Knowledge Discovery & Data Mining
• Paper link
– https://dl.acm.org/doi/abs/10.1145/3447548.3467401?casa_token=HbWWl3ksNy4AAAAA:watSSa0fom_EbxcyDmj8vMTSmhxjuj0XzZ5lpJYCtzSIEvwys4my5p8ksSsfSLsdfZAPpQokiQEo
Paper information
• A novel framework for multivariate time series representation learning
based on the transformer encoder architecture
• The framework includes an unsupervised pre-training scheme, which
can offer substantial performance benefits over fully supervised
learning on downstream tasks
• Performs significantly better than the best currently available methods
for regression and classification
• The first unsupervised method shown to push the limits of state-of-the-art performance for multivariate time series regression and classification
Abstract
Unlike in domains such as Computer Vision or Natural Language
Processing (NLP), the dominance of deep learning for time series
is far from established
Non-deep learning methods such as TS-CHIEF, HIVE-COTE, and ROCKET
currently hold the record on time series regression and classification
dataset benchmarks
Transformer models are based on a multi-headed attention mechanism
that renders them particularly suitable for time series data
Develop a generally applicable methodology (framework) that can
leverage unlabeled data by first training a transformer encoder to extract
dense vector representations of multivariate time series through an input
“denoising” (autoregressive) objective.
Introduction
Methodology-Base model
The base model uses only the encoder of the original transformer [1]; the decoder module needs the (masked) “ground truth” output sequence as an input, and is thus unsuitable for tasks such as classification or (extrinsic) regression.
[1] Vaswani, A., Shazeer, N., Parmar, N., et al. Attention is all you need. Advances in Neural Information Processing Systems, 2017, 30.
[Figure: the original transformer encoder-decoder architecture from [1]; this work uses only the encoder]
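As a reading aid, a minimal PyTorch sketch of such an encoder-only base model (my own illustration, not the authors' code; the layer sizes are placeholders, and details such as the normalization scheme may differ from the paper):

```python
import torch.nn as nn

# Encoder-only base model: a stack of standard transformer encoder layers.
# d_model, nhead, dim_feedforward and num_layers are placeholder values.
encoder_layer = nn.TransformerEncoderLayer(
    d_model=128, nhead=8, dim_feedforward=256, batch_first=True)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=3)
# z = encoder(u)   # u: (batch, w, d_model) embedded input; z: same shape
```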
Methodology-Base model
[Figure: model input. Each time series has length w and m variables; each time step is mapped to the model dimension by a linear transformation or, alternatively, by a (1D) convolution]
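A minimal sketch of this input stage (PyTorch; my own illustration under the shapes named in the figure, not the authors' code):

```python
import torch
import torch.nn as nn

class TSEmbedding(nn.Module):
    """Map a (batch, w, m) multivariate series into the model dimension and
    add learnable positional encodings (discussed on the next slide)."""
    def __init__(self, m: int, d_model: int, w: int):
        super().__init__()
        self.proj = nn.Linear(m, d_model)  # the "linear transformation" option
        # The "convolution" option would instead be, e.g.,
        # nn.Conv1d(m, d_model, kernel_size=3, padding=1) on x.transpose(1, 2).
        self.pos = nn.Parameter(0.02 * torch.randn(w, d_model))  # learnable PE

    def forward(self, x):               # x: (batch, w, m)
        return self.proj(x) + self.pos  # positional encodings broadcast over batch
```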
Methodology-Base model
Positional encodings
The positional encodings are fully learnable; based on the performance of our models, we also observe that they generally do not appear to interfere significantly with the numerical information of the time series.
Padding
• After setting a maximum sequence length 𝑤 for the entire dataset, shorter
samples are padded with arbitrary values
• Generate a padding mask which adds a large negative value to the attention scores for the padded positions, before computing the self-attention distribution with the softmax function (sketched below)
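A hedged sketch of this padding-mask step (my own illustration; the tensor shapes are assumptions, not the authors' code):

```python
import torch

def attention_weights_with_padding(scores, lengths):
    """scores: (batch, n_heads, w, w) raw attention scores;
    lengths: (batch,) true length of each padded sequence.
    Padded positions get a large negative score before the softmax,
    so they receive (near-)zero attention weight."""
    w = scores.size(-1)
    pos = torch.arange(w, device=scores.device)        # (w,)
    pad = pos.unsqueeze(0) >= lengths.unsqueeze(1)     # (batch, w); True = padded
    scores = scores.masked_fill(pad[:, None, None, :], -1e9)
    return torch.softmax(scores, dim=-1)
```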
Methodology-Regression and classification
The final output vectors z are concatenated into a single vector and fed into a linear output layer, which adapts the model to regression and classification tasks (sketched below).
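A minimal sketch of this output head (my own illustration; out_dim would be the number of regression targets or classes):

```python
import torch.nn as nn

class OutputHead(nn.Module):
    """Concatenate the w encoder output vectors z_t into one (w * d_model)
    vector and apply a single linear layer."""
    def __init__(self, d_model: int, w: int, out_dim: int):
        super().__init__()
        self.linear = nn.Linear(d_model * w, out_dim)

    def forward(self, z):                             # z: (batch, w, d_model)
        return self.linear(z.reshape(z.size(0), -1))  # (batch, out_dim)
```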
Methodology-Unsupervised pre-training
We set part of the input to 0 and ask the
model to predict the masked values
A binary noise mask is created independently for each training sample and epoch, and the input is masked by elementwise multiplication:
[Figure: masking scheme. In each row (variable), the masked (0) segments and the unmasked (1) segments each have a characteristic mean length; with masking ratio r, on average r · m variables are masked in each column (time step); see the sketch below]
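A sketch of a mask generator consistent with these annotations (my own NumPy illustration of the scheme described in the paper; lm = 3 and r = 0.15 are the default values reported there):

```python
import numpy as np

def geometric_mask(w: int, m: int, lm: float = 3.0, r: float = 0.15):
    """Binary noise mask for one (w, m) sample: 0 = masked, 1 = kept.
    Each variable (column) is masked independently; masked segments have
    geometrically distributed length with mean lm, unmasked segments have
    mean lm * (1 - r) / r, so a proportion r of each column is masked on
    average."""
    lu = lm * (1.0 - r) / r
    mask = np.ones((w, m), dtype=np.float32)
    for j in range(m):
        t = 0
        masked = np.random.rand() < r            # start masked with prob. r
        while t < w:
            seg = np.random.geometric(1.0 / (lm if masked else lu))
            if masked:
                mask[t:t + seg, j] = 0.0
            t += seg
            masked = not masked
    return mask

# x_masked = x * geometric_mask(w, m)   # elementwise multiplication (this slide)
```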
Methodology-Unsupervised pre-training
We chose this masking pattern because it encourages the model to learn to attend
both to preceding and succeeding segments in individual variables, as well as to
existing contemporary values of the other variables in the time series, and thereby
to learn to model inter-dependencies between variables.
The loss is computed only on the masked parts of the input (sketched below).
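A minimal sketch of such a masked reconstruction loss (my own illustration; mean squared error restricted to the masked positions):

```python
import torch

def masked_mse(x_hat, x, mask):
    """x_hat, x, mask: (batch, w, m); mask == 0 marks masked positions.
    Only those positions contribute to the loss."""
    masked = mask == 0
    return ((x_hat - x)[masked] ** 2).mean()
```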
Experiments & Results-Regression
TST (Time Series Transformer)
• proposed approach achieves an average rank of 1.33
• pre-trained transformer models outperform the fully
supervised ones in 3 out of 6 datasets
‒ no additional samples are used for pretraining
average relative difference from mean
Lower values indicate better average performance (difference from the mean RMSE).
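For reference, a small sketch of how this metric can be computed (my interpretation of "average relative difference from mean", not the authors' code):

```python
import numpy as np

def avg_rel_diff_from_mean(rmse):
    """rmse: (n_methods, n_datasets). For each dataset, compare each method's
    RMSE to the mean RMSE over all methods, then average over datasets;
    lower (more negative) values are better."""
    dataset_mean = rmse.mean(axis=0, keepdims=True)   # (1, n_datasets)
    return ((rmse - dataset_mean) / dataset_mean).mean(axis=1)
```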
Experiments & Results-Regression
Q1: Given a partially labeled dataset of a certain size, how will additional
labels affect performance?
• As expected, performance improves with an increasing proportion of available labels, both for a fully supervised model and for the same model first pre-trained on the entire training set through the unsupervised objective and then fine-tuned
• not only does the pretrained model outperform the fully supervised one, but the
benefit persists throughout the entire range of label availability, even when the
models are allowed to use all labels
Experiments & Results-Regression
Q2: Given a labeled dataset, how will additional unlabeled samples
affect performance?
• for a given number of labels (shown as a percentage of the totally available labels),
the more data samples are used for unsupervised learning, the lower the error
achieved
• reusing a subset of the same samples for unsupervised pretraining improves
performance
[Figure legend: "fully supervised training only" marks the baseline without unsupervised pre-training]
Experiments & Results-Classification
• performed best on 7 out of the 11 datasets, achieving an average rank of 1.7
• We believe that this indicates a relative weakness of our current models when dealing with very low-dimensional time series (3-dimensional)
• Finally, we observe that the pre-trained transformer models performed better than the fully supervised ones in 8 out of 11 datasets, sometimes by a substantial margin
‒ suggesting that the benefit originates from merely reusing the same samples in a different training task
Additional points
Execution time on Tesla P100 GPU
In practice, despite allowing for many hundreds of epochs, we never trained our models on a GPU for longer than 3 hours on any of the examined datasets.
Conclusion
➢ Propose a transformer-based framework for unsupervised representation
learning of multivariate time series
➢ Demonstrates, for the first time, that unsupervised learning of multivariate time series can surpass the performance of current state-of-the-art supervised methods
➢ Unsupervised pre-training of our transformer models offers a substantial performance benefit over fully supervised learning, even without leveraging additional unlabeled data
➢ The proposed framework can be readily used for additional downstream tasks, such as forecasting, clustering and missing value imputation