Self-supervised Learning for ECG-based
Emotion Recognition
Pritam Sarkar, Ali Etemad
Department of Electrical and Computer Engineering
Queen’s University, Kingston, Canada
ICASSP 2020
Outline
❑ Problem and Motivation
❑ Related work
❑ Proposed Framework
❑ Datasets
❑ Results
❑ Analysis
❑ Summary
Problem and Motivation
Limitations of fully-supervised learning:
❑ Human-annotated labels are required to learn data representations, and the learned representations are often very task-specific.
❑ Large amounts of labelled data are required to train deep networks; smaller datasets often result in poor performance.
Advantages of self-supervised learning:
❑ Models are trained using automatically generated labels.
❑ Learned representations are high-level and generalized, and therefore less sensitive to inter- or intra-instance variations (local transformations).
❑ Since no manual labelling is needed, larger datasets can be acquired to train deeper and more sophisticated networks.
Literature Review
❑ Healey et al., 2005:
➢ Stress detection during a driving task
➢ Time-frequency domain features
➢ LDA classifier
❑ Liu et al., 2009:
➢ Affect-based gaming experience
➢ Time-frequency domain features
➢ RF, KNN, BN, SVM classifiers
❑ Santamaria et al., 2018:
➢ Movie clips were used to elicit emotional states
➢ Time/frequency domain features
➢ Deep CNN classifier
❑ Siddharth et al., 2019:
➢ Affect recognition
➢ HRV and spectrogram features
➢ Extreme learning machine classifier
[Figure: Time/Frequency Domain Feature Extraction → Fully-supervised Classifier → Emotion Recognition]
Proposed Framework
[Figure: Stage 1, Pretext Task (Unlabelled ECG, Transformation, Transformed ECG, Pseudo Labels, Multi-task Self-supervised Network, Learned ECG Representation); Stage 2, Downstream Task (Affective ECG, Learned ECG Representation, Emotion Recognition)]
Our proposed framework.
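To make Stage 1 concrete, here is a minimal sketch of how pseudo labels could be generated with no human annotation. It assumes (our reading, not stated on this slide) that each task asks a binary question: is this the original signal (label 0) or its transformed version (label 1)? The three inline transforms, their parameters, and the name `make_pretext_dataset` are hypothetical; the multi-task network itself is not shown.

```python
from typing import Callable, Dict, List, Tuple

Signal = List[float]

# Hypothetical stand-ins for three of the six transformations (parameters illustrative).
TRANSFORMS: Dict[str, Callable[[Signal], Signal]] = {
    "negation": lambda x: [-v for v in x],
    "temporal_inversion": lambda x: x[::-1],
    "scaling": lambda x: [1.5 * v for v in x],
}


def make_pretext_dataset(unlabelled: List[Signal]) -> Dict[str, List[Tuple[Signal, int]]]:
    """Build one binary dataset per task: label 0 = original, 1 = transformed.

    The labels come from the transformation itself, so no manual annotation is needed.
    """
    data: Dict[str, List[Tuple[Signal, int]]] = {name: [] for name in TRANSFORMS}
    for x in unlabelled:
        for name, transform in TRANSFORMS.items():
            data[name].append((x, 0))             # original signal
            data[name].append((transform(x), 1))  # transformed signal
    return data
```

Each per-task list would then feed its own output head, while the layers shared across tasks learn the ECG representation that Stage 2 reuses.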
Transformations
❑ Noise Addition [SNR]
❑ Scaling [scaling factor]
❑ Negation
❑ Temporal Inversion
❑ Permutation [no. of segments]
❑ Time-warping [no. of segments, stretching factor]
A sample original ECG signal is shown alongside its six transformed versions and their automatically generated labels.
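The six transformations listed above can be sketched in plain Python as follows. This is an illustrative version under stated assumptions, not the authors' code: SNR is taken in dB, permutation shuffles equal-length segments, and time-warping stretches a random half of the segments by the stretching factor, squeezes the rest by its inverse, and resamples back to the original length. All function names are hypothetical.

```python
import math
import random
from typing import List

Signal = List[float]

# Sketch only: parameter semantics (SNR in dB, which segments get stretched) are assumptions.


def add_noise(x: Signal, snr_db: float, rng: random.Random) -> Signal:
    """Add zero-mean Gaussian noise scaled so the signal-to-noise ratio is snr_db (dB)."""
    signal_power = sum(v * v for v in x) / len(x)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [v + rng.gauss(0.0, sigma) for v in x]


def scale(x: Signal, factor: float) -> Signal:
    """Multiply every sample by a constant scaling factor."""
    return [factor * v for v in x]


def negate(x: Signal) -> Signal:
    """Flip the signal's polarity."""
    return [-v for v in x]


def invert_time(x: Signal) -> Signal:
    """Reverse the signal along the time axis."""
    return x[::-1]


def split(x: Signal, n_segments: int) -> List[Signal]:
    """Cut x into n_segments contiguous chunks of (nearly) equal length."""
    bounds = [round(i * len(x) / n_segments) for i in range(n_segments + 1)]
    return [x[bounds[i]:bounds[i + 1]] for i in range(n_segments)]


def permute(x: Signal, n_segments: int, rng: random.Random) -> Signal:
    """Shuffle the order of the segments and re-join them."""
    segments = split(x, n_segments)
    rng.shuffle(segments)
    return [v for seg in segments for v in seg]


def resample(seg: Signal, new_len: int) -> Signal:
    """Linearly interpolate seg onto new_len evenly spaced points (endpoints kept)."""
    if new_len == 1 or len(seg) == 1:
        return [seg[0]] * new_len
    out = []
    for i in range(new_len):
        pos = i * (len(seg) - 1) / (new_len - 1)
        lo = int(pos)
        hi = min(lo + 1, len(seg) - 1)
        frac = pos - lo
        out.append(seg[lo] * (1 - frac) + seg[hi] * frac)
    return out


def time_warp(x: Signal, n_segments: int, stretch: float, rng: random.Random) -> Signal:
    """Stretch a random half of the segments, squeeze the rest, restore original length."""
    segments = split(x, n_segments)
    stretched = set(rng.sample(range(n_segments), n_segments // 2))
    warped: Signal = []
    for i, seg in enumerate(segments):
        factor = stretch if i in stretched else 1 / stretch
        warped.extend(resample(seg, max(1, round(len(seg) * factor))))
    return resample(warped, len(x))
```

Resampling the warped signal back to the input length keeps every transformed sample the same shape as the original, which a fixed-input network would require.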
Proposed Architecture
The proposed self-supervised architecture is
presented.
Datasets
We use two public datasets: AMIGOS and SWELL.
❑ AMIGOS:
➢ Affect attributes: Arousal, Valence
➢ Total participants: 40
➢ Movie clips were shown to participants.
➢ Shimmer sensors were used to capture ECG signals at 256 Hz.
❑ SWELL:
➢ Affect attributes: Arousal, Valence, Stress
➢ Total participants: 25
➢ Participants performed office tasks.
➢ TMSi devices were used to capture ECG signals at 2048 Hz.
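The two datasets are recorded at different rates (256 Hz vs. 2048 Hz, an integer ratio of 8). If both were to be fed to one network, bringing them to a common rate would be a natural step; the slide does not say whether or how this is done, so the following is only a hedged sketch. `downsample_by_averaging` is a hypothetical helper that uses block averaging as a crude anti-aliasing filter; a proper FIR/IIR design (for example `scipy.signal.decimate`) would usually be preferred.

```python
from typing import List


def downsample_by_averaging(x: List[float], fs_in: int, fs_out: int) -> List[float]:
    """Reduce the sampling rate by an integer factor by averaging each block of samples.

    Averaging acts as a crude low-pass filter before decimation; trailing samples
    that do not fill a whole block are dropped.
    """
    if fs_in % fs_out != 0:
        raise ValueError("fs_in must be an integer multiple of fs_out")
    q = fs_in // fs_out  # e.g. 2048 // 256 == 8
    n_blocks = len(x) // q
    return [sum(x[i * q:(i + 1) * q]) / q for i in range(n_blocks)]
```

For example, two seconds of SWELL ECG (4096 samples at 2048 Hz) become 512 samples at 256 Hz.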
Results
Analysis
[Figure: With self-supervision vs. Without self-supervision]
Performance of our method with and without the self-supervised learning step, using 1% of the labels in each dataset.
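The slide does not state how the 1% of labels was selected. One common way to build such a low-label setting is a class-stratified subsample, sketched below under that assumption; `sample_label_fraction` is a hypothetical helper. Fixing the seed lets the same labelled subset be reused for both the with- and without-self-supervision variants.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def sample_label_fraction(labels: Sequence[int], fraction: float, seed: int = 0) -> List[int]:
    """Return sorted indices of a class-stratified subset holding ~`fraction` of the labels.

    Each class contributes round(fraction * class_size) examples, but never fewer
    than one, so every class stays represented even at 1%.
    """
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    chosen: List[int] = []
    for indices in by_class.values():
        k = max(1, round(fraction * len(indices)))
        chosen.extend(rng.sample(indices, k))
    return sorted(chosen)
```

With 500 examples of one class and 300 of another, `fraction=0.01` keeps 5 and 3 labelled examples respectively.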
Summary
❑ We proposed, for the first time, an ECG-based self-supervised learning framework for affective computing.
❑ We achieved state-of-the-art results on two public datasets (AMIGOS and SWELL).
❑ We showed that with a very limited amount of labelled data, our self-supervised model performs considerably better than the fully-supervised model.
Thank you!
If you have any questions please reach me at:
pritam.sarkar@queensu.ca
www.pritamsarkar.com
