Neural Discrete Representation Learning
(DeepMind Research)
Paper Review
Authors:
● Aaron van den Oord
● Oriol Vinyals
● Koray Kavukcuoglu
What does the paper introduce?
The concept of VQ-VAEs, which differ from VAEs in two ways:
1) The encoder network outputs discrete rather than continuous codes
2) The prior is learnt rather than kept static
AutoEncoders
Limitations of AutoEncoders
● Fixed-dimensional latent space
● Cannot generate new samples directly from the latent space
○ The latent space is unstructured and messy
● Difficulty in generating a variety of new samples
○ Autoencoders rely only on reconstruction loss, which limits variability
Variational AutoEncoders (VAE)
● Probabilistic modelling of the latent space
● Generally enforce a prior, typically a standard normal distribution
● Structured latent space
○ Enforcing a probabilistic prior results in a structured latent space (see the ELBO below)
● Regularization and control over the latent space
● Sampling-based generation
○ Enables generation of new samples by sampling from the learned latent distribution
Limitations of VAEs
● Poor disentanglement of latent features
● Static prior
● Posterior collapse
Joseph Rocca, Understanding Variational AutoEncoders
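For reference, a standard form of the VAE objective (the ELBO); the KL term is what enforces the prior and gives the latent space its structure:

```latex
\log p(x) \;\ge\; \mathbb{E}_{q(z|x)}\big[\log p(x|z)\big] \;-\; D_{\mathrm{KL}}\big(q(z|x)\,\|\,p(z)\big)
```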
Vector Quantization - Variational AutoEncoders (VQ-VAE)
● Discrete latent representation
● The prior is learnt from the data rather than assumed static
● Avoids posterior collapse
Additional Contributions
● The discrete latent model performs as well as its continuous counterparts
● When paired with a powerful prior, the samples are high quality on a wide range of applications such as image, speech and video generation
Vector Quantization - Variational AutoEncoders (VQ-VAE)
In VAEs, the posterior distribution q(z|x) and the prior distribution p(z) are assumed to be normally distributed.
In VQ-VAE:
● Discrete latent variables
● A novel training method inspired by Vector Quantization
● The posterior and prior distributions are categorical
Yang, Y. et al. (2019). Improving the classification effectiveness of intrusion detection by using improved conditional variational autoencoder and deep neural network. Sensors, 19(11), 2528.
Vector Quantization
● Codebook initialization: create an initial set of representative codewords that serve as prototypes for the discrete latent variables
● Encoding: map each input to the nearest codeword in the codebook, quantizing continuous values into discrete latent variables (see the sketch below)
● Discrete latent variables: represent the input data using the discrete indices obtained in the encoding step
● Decoding: reconstruct the data from the discrete latent variables by mapping them back to their corresponding codewords in the codebook
● Training: covered on the next slides
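A minimal PyTorch sketch of the encoding step; the names (codebook, quantize) and the random codebook are illustrative, not taken from the paper's implementation:

```python
import torch

K, D = 512, 64                    # codebook size and embedding dimension
codebook = torch.randn(K, D)      # K codewords e_1..e_K (learned in practice)

def quantize(z_e):
    """Map each continuous encoder output in z_e (shape (N, D)) to the index
    and vector of its nearest codeword under squared Euclidean distance."""
    dists = torch.cdist(z_e, codebook) ** 2   # (N, K) pairwise distances
    indices = dists.argmin(dim=1)             # discrete latent variables
    return indices, codebook[indices]         # indices and the decoder input
```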
Training VQ-VAE
Forward Pass
● Find the index of the codebook vector with the least distance to each encoding vector
● Fill it into q(z|x) → the posterior
● Build the decoder input from the codewords selected by the posterior indices
BackPropagation
● Gradients cannot be computed through the quantization step because the discrete latent variables are non-differentiable
● The gradients at the decoder input are therefore copied onto the encoder output via the Straight-Through Estimator (a minimal sketch follows)
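A minimal PyTorch sketch of the straight-through trick, assuming z_e (encoder output) and z_q (selected codewords) share the same shape:

```python
import torch

def straight_through(z_e: torch.Tensor, z_q: torch.Tensor) -> torch.Tensor:
    # Forward: the decoder receives exactly z_q.
    # Backward: (z_q - z_e).detach() contributes no gradient, so the
    # decoder's gradient is copied straight onto z_e, bypassing the argmin.
    return z_e + (z_q - z_e).detach()
```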
Training VQ-VAE
Loss Function
L = log p(x | z_q(x)) + ||sg[z_e(x)] − e||^2 + β ||z_e(x) − sg[e]||^2
● Reconstruction loss log p(x | z_q(x)): enforces reconstructions close to the input
● VQ loss ||sg[z_e(x)] − e||^2: pulls the embedding space close to the encoder output
● Commitment loss β ||z_e(x) − sg[e]||^2: keeps the encoder output close to the embedding space
The stop-gradient operator (sg) in the VQ loss and the commitment loss ensures that they update only the embedding-space parameters and the encoder parameters, respectively.
Decoder → optimizes the reconstruction loss
Encoder → optimizes the reconstruction loss and the commitment loss
Embeddings/Codebook → optimized by the VQ loss
The coefficient β in the commitment loss depends on the scale of the reconstruction loss: the higher the reconstruction loss, the higher β should be to amplify the impact of the commitment loss (a PyTorch sketch of the three terms follows).
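A hedged PyTorch sketch of the loss; mse_loss stands in for the paper's log-likelihood reconstruction term, .detach() plays the role of sg, and β = 0.25 is the value used in the paper:

```python
import torch.nn.functional as F

def vq_vae_loss(x, x_recon, z_e, z_q, beta=0.25):
    recon = F.mse_loss(x_recon, x)          # trains encoder (via STE) and decoder
    vq = F.mse_loss(z_q, z_e.detach())      # sg on encoder output: trains the codebook
    commit = F.mse_loss(z_e, z_q.detach())  # sg on codebook: trains the encoder
    return recon + vq + beta * commit
```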
Prior
● During training of the VQ-VAE, the prior is kept constant and uniform.
● Afterwards, an autoregressive distribution p(z) is fit over the latents z, so that x can be generated via ancestral sampling (sketched below)
● Two autoregressive models are discussed:
○ PixelCNN over the discrete latents for images
○ WaveNet for raw audio
Training the prior and the VQ-VAE jointly is left as future research.
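A generic sketch of ancestral sampling from a trained autoregressive prior; prior is assumed to return logits over the K codes given the latents sampled so far (a PixelCNN would do this over a 2-D grid rather than a flat sequence):

```python
import torch

@torch.no_grad()
def sample_latents(prior, seq_len: int) -> torch.Tensor:
    z = torch.zeros(seq_len, dtype=torch.long)
    for t in range(seq_len):
        logits = prior(z[:t])                                  # p(z_t | z_<t), shape (K,)
        z[t] = torch.multinomial(logits.softmax(-1), 1).item()
    return z  # decode with the VQ-VAE decoder to obtain x
```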
Experiments
1) Comparison with continuous variables
CIFAR-10 dataset, ADAM optimizer; 50 samples used in the training objective for VIMCO

Model  | Features                       | Result (bits/dim)
VQ-VAE | Discrete latent representation | 4.67
VIMCO  | Gaussian or categorical priors | 5.14
VAE    | Continuous latent variables    | 4.51
Note: bits/dim measures the average number of bits required to represent each dimension of the input data; a lower value indicates better compression and reconstruction performance (see the conversion below).
VQ-VAE is the first discrete latent model to challenge the performance of continuous VAEs.
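For reference, bits/dim is the model's negative log-likelihood rescaled to base 2 and averaged over the D input dimensions:

```latex
\text{bits/dim} \;=\; \frac{-\log_2 p(x)}{D} \;=\; \frac{-\ln p(x)}{D \,\ln 2}
```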
Experiments
1) Images
Experiment 1
● The model achieves a reduction of approximately 42.6× in bits per image (see the arithmetic below)
● A powerful prior model, PixelCNN, is trained over the discrete latent space to capture global structure instead of low-level image statistics
● Results are slightly blurry but still retain the overall content
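Assuming 8-bit RGB pixels at 128×128 resolution and a 32×32 grid of latents with K = 512 codewords (9 bits each), as in the paper, the reduction factor works out to:

```latex
\frac{128 \times 128 \times 3 \times 8}{32 \times 32 \times \log_2 512}
\;=\; \frac{393\,216}{9\,216} \;\approx\; 42.6
```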
Experiments
1) Images
Experiment 2
● Trained a PixelCNN prior on the 32×32×1 latent space using spatial masking
● Samples drawn from the PixelCNN were mapped to pixel space with the VQ-VAE decoder
Experiments
1) Images
Experiment 3
● Same experiment as 2, on 84×84×3 frames drawn from the DeepMind Lab environment
● Reconstructions looked nearly identical to the originals
Experiments
1) Images
Experiment 4
● A second VQ-VAE with a PixelCNN decoder is trained on top of the 21×21×1 latent space obtained from the first VQ-VAE trained on DM-Lab frames
● An interesting setup, because a standard VAE would suffer from "posterior collapse" here: the decoder is powerful enough to model the input perfectly on its own
● Posterior collapse was not observed
Experiments
2) Audio
Reconstructions and samples from the prior (figures from Aäron van den Oord, Neural Discrete Representation Learning)
Although the reconstructed waveforms differ from the originals, the semantic content of the audio is retained, without the model being given any information about the language or the speaker.
Experiments
3) Video
Conclusion
● Introduced VQ-VAE, which combines VAEs with vector quantization.
● VQ-VAEs capture long-term dependencies in data.
● Successful experiments: generating images, video sequences, and meaningful speech.
● The discrete latent space learns important features without supervision.
● VQ-VAEs achieve likelihoods comparable to continuous latent variable models, model long-range sequences, and learn speech descriptors related to phonemes in an unsupervised fashion.
