Variational Image Compression
with a Scale Hyperprior
Hyeongmin Lee
PR-395
2022.7.17
Entropy Coding
Entropy Coding
• Image Compression
0011010100111...
Entropy Coding
• Entropy Coding
A, B, C, D: 4 letters (a fixed-length code needs 2 bits per letter)
Sample Bits
A 00
B 10
C 01
D 11
Encoded sample: 100100001111110101000100010001 (30 bits)
Source: https://www.programiz.com/dsa/huffman-coding
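The fixed-length table above can be checked with a short sketch. The 15-letter message below is not shown on the slide; it is an assumption recovered by decoding the slide's bitstring with the table, and `encode` is a hypothetical helper name.

```python
# Fixed-length code table from the slide: every letter costs 2 bits.
FIXED_CODE = {"A": "00", "B": "10", "C": "01", "D": "11"}

# Assumed sample message, recovered by decoding the slide's 30-bit string.
SAMPLE = "BCAADDDCCACACAC"


def encode(message, table):
    """Concatenate the codeword of every symbol in the message."""
    return "".join(table[symbol] for symbol in message)


bits = encode(SAMPLE, FIXED_CODE)
print(bits, len(bits))
```

With a fixed-length code the cost depends only on the message length (15 letters times 2 bits), not on how often each letter occurs.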
Entropy Coding
• Entropy Coding: Huffman Coding
A, B, C, D: 4 letters (2 bits per letter with a fixed-length code)
Sample Bits
A 11
B 100
C 0
D 101
Encoded sample: 1000111110110110100110110110 (28 bits)
Is there a lower bound?
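A minimal sketch of how Huffman code lengths follow from the letter counts. The message is the same assumed 15-letter sample recovered from the slide's bitstring, and `huffman_code_lengths` is a hypothetical helper; it returns code lengths only, so the exact 0/1 labels may differ from the slide's table while the total cost stays the same.

```python
import heapq
from collections import Counter


def huffman_code_lengths(freqs):
    """Return {symbol: code length} for a Huffman code built from counts."""
    # Heap entries are (weight, unique tie-breaker, {symbol: depth so far}).
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


freqs = Counter("BCAADDDCCACACAC")  # assumed sample: A=5, B=1, C=6, D=3
lengths = huffman_code_lengths(freqs)
total_bits = sum(freqs[s] * lengths[s] for s in freqs)
print(lengths, total_bits)
```

Frequent letters get short codewords (C gets 1 bit) and rare ones get long codewords (B gets 3 bits), which is where the saving over the 30-bit fixed-length encoding comes from.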
Entropy Coding
• Entropy
$H = -\sum_i p_i \log_2 p_i$
Sample $p_i$
A 5/15
B 1/15
C 6/15
D 3/15
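As a worked example of the entropy formula, the sketch below plugs in the four probabilities from the table. Multiplying the per-letter entropy by the 15 letters of the sample gives the lower bound that the Huffman slide asks about; the variable names are illustrative.

```python
import math

# Letter counts behind the probabilities 5/15, 1/15, 6/15, 3/15.
counts = {"A": 5, "B": 1, "C": 6, "D": 3}
n = sum(counts.values())

# Shannon entropy in bits per letter.
H = -sum((c / n) * math.log2(c / n) for c in counts.values())

# Lower bound on the total number of bits for the 15-letter sample.
lower_bound_bits = n * H
print(f"{H:.3f} bits/letter, {lower_bound_bits:.1f} bits in total")
```

The entropy comes out at about 1.78 bits per letter, or roughly 26.7 bits for the whole sample, so the 28-bit Huffman encoding is already close to the bound.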
Image Compression
Image Compression
• Image Compression
0011010100111...
Lossless Coding: BMP → coding the integers themselves
Lossy Coding: JPEG → reducing the ‘entropy’ of the samples
Image Compression
• JPEG
8x8 block splitting → Discrete Cosine Transform → Quantization → Entropy Coding...
Source: https://www.whydomath.org/node/wavlets/basicjpg.html
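The first three JPEG stages can be sketched on a single 8x8 block. This is a simplified illustration, not the JPEG standard: the block is synthetic, and the single uniform step size `step` is an assumption (real JPEG uses a per-frequency quantization table).

```python
import numpy as np


def dct_matrix(n=8):
    """Orthonormal DCT-II matrix of size n x n."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    C[0, :] = np.sqrt(1.0 / n)
    return C


C = dct_matrix()
rng = np.random.default_rng(0)

# Synthetic smooth block: a diagonal gradient plus a little noise.
block = np.add.outer(np.arange(8), np.arange(8)) * 8.0 + rng.normal(0, 1, (8, 8))

coeffs = C @ (block - 128.0) @ C.T      # 2-D DCT of the level-shifted block
step = 16.0                             # hypothetical uniform quantization step
q = np.round(coeffs / step)             # quantization: the lossy step
recon = C.T @ (q * step) @ C + 128.0    # dequantize and invert the DCT

rmse = np.sqrt(np.mean((recon - block) ** 2))
print("non-zero coefficients:", np.count_nonzero(q), "of 64, RMSE:", rmse)
```

Most quantized coefficients of a smooth block are zero, which is what lowers the entropy of the symbols handed to the entropy coder.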
Image Compression
• JPEG
High Quality: High Entropy, High Bitrate
Low Quality: Low Entropy, Low Bitrate
Image Compression
• Bitrate-Distortion Tradeoff
Deep Learning-based Image Compression
Deep Learning-based Image Compression
• Image Compression Pipeline
Transform → Quantization → Entropy Coding → Decoding → Inverse Transform
Deep Learning-based Image Compression
• Image Compression Pipeline
$L = R + \lambda D$
Deep Learning-based Image Compression
• Auto Encoder
$y = g_a(x; \phi_g)$
$\hat{y} = Q(y)$
$\hat{x} = g_s(\hat{y}; \theta_g)$
$R = \mathbb{E}_{x \sim p_x}[-\log_2 p_{\hat{y}}(\hat{y})]$
$D = \|x - \hat{x}\|_2^2$
???
Problem of Quantization
Problem of Quantization
• Adding Uniform Noise [PR328]
$\hat{y} = Q(y)$ → $\tilde{y} = y + n$, $n \sim U(-\tfrac{1}{2}, \tfrac{1}{2})$
Ballé, Johannes, Valero Laparra, and Eero P. Simoncelli. "End-to-end optimized image compression." 5th International Conference on Learning Representations, ICLR 2017.
With quantization: $R = \mathbb{E}_{x \sim p_x}[-\log_2 p_{\hat{y}}(\hat{y})] = \mathbb{E}_{x \sim p_x}[-\log_2 p_{\hat{y}}(Q(g_a(x; \phi_g)))]$
With uniform noise: $R = \mathbb{E}_{x \sim p_x}[-\log_2 p_{\tilde{y}}(\tilde{y})] = \mathbb{E}_{x \sim p_x}[-\log_2 p_{\tilde{y}}(g_a(x; \phi_g) + n)]$
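A small numerical sketch of why additive uniform noise is a reasonable stand-in for rounding during training. The Gaussian latents here are synthetic and purely illustrative; the point is only that both perturbations stay within half a unit and have nearly the same variance (1/12), while the noisy version remains differentiable with respect to y.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(0.0, 3.0, size=100_000)             # synthetic latents

y_hat = np.round(y)                                # hard quantization (test time)
y_tilde = y + rng.uniform(-0.5, 0.5, size=y.shape) # noise proxy (training time)

err_round = y_hat - y
err_noise = y_tilde - y
print("rounding error variance:", err_round.var())
print("noise error variance:   ", err_noise.var())
```

Rounding has zero gradient almost everywhere, so it blocks backpropagation; the noise term is independent of y, so the gradient of the noisy latent with respect to y is simply 1.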
Rate Loss
Rate Loss
• Applying Uniform Noise Approximation
$R = \mathbb{E}_{x \sim p_x}[-\log_2 p_{\tilde{y}}(g_a(x; \phi_g) + n)]$
$R = \mathbb{E}_{x \sim p_x}[-\log_2 p_{\tilde{y}}(\tilde{y})]$
Entropy Model: $p_{\tilde{y}}$
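One way to evaluate the rate term, sketched under the assumption (following Ballé et al.) that the density of the noisy latent is the density of y convolved with a unit-width uniform. In that case the likelihood of a value is a difference of two cumulative values. The logistic CDF and its `scale` are hypothetical stand-ins for a learned entropy model.

```python
import numpy as np


def logistic_cdf(v, scale=2.0):
    """Stand-in cumulative distribution for the latent y (an assumption)."""
    return 1.0 / (1.0 + np.exp(-v / scale))


def likelihood(y_tilde, cdf=logistic_cdf):
    """Density of y convolved with U(-1/2, 1/2), evaluated at y_tilde."""
    return cdf(y_tilde + 0.5) - cdf(y_tilde - 0.5)


def rate_bits(y_tilde, cdf=logistic_cdf):
    """Estimated code length in bits: sum of -log2 likelihoods."""
    return float(np.sum(-np.log2(likelihood(y_tilde, cdf))))


# On the integer grid the likelihoods form a valid probability mass function.
grid = np.arange(-200, 201)
pmf_sum = likelihood(grid).sum()
print(pmf_sum, rate_bits(np.array([0.0])), rate_bits(np.array([10.0])))
```

Values near the mode of the model are cheap to code and values in the tails are expensive, which is the signal the rate loss sends back to the encoder.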
Non-parametric Density Model
Non-parametric Density Model
• Defining the density as a function (Neural Network)
Non-parametric Density Model
• Using Sigmoid at the last layer
All Jacobian elements are positive
Non-parametric Density Model
• Intermediate Activation Functions
All Jacobian elements are positive
Panels: $a > 0$, $-1 < a < 0$, $a < -1$
Non-parametric Density Model
• Setting parameter constraints
All Jacobian elements are positive
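The constraints named on these slides can be sketched as a small monotone network for the cumulative function. This follows the construction described in the appendix of the hyperprior paper as far as I understand it: positive weights via softplus, an activation `h + a * tanh(h)` with `a` kept above -1 via tanh, and a sigmoid at the output. The layer widths, the random (untrained) parameters, and all function names are assumptions for illustration.

```python
import numpy as np


def softplus(v):
    return np.log1p(np.exp(v))


def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))


def make_params(rng, widths=(1, 3, 3, 1)):
    """Random unconstrained parameters for each layer (hypothetical sizes)."""
    return [
        {
            "H_raw": rng.normal(size=(r_out, r_in)),
            "b": rng.normal(size=(r_out, 1)),
            "a_raw": rng.normal(size=(r_out, 1)),
        }
        for r_in, r_out in zip(widths[:-1], widths[1:])
    ]


def cumulative(x, params):
    """Monotone map from the real line into (0, 1)."""
    h = np.asarray(x, dtype=float).reshape(1, -1)
    for k, p in enumerate(params):
        h = softplus(p["H_raw"]) @ h + p["b"]            # positive weights
        if k < len(params) - 1:
            h = h + np.tanh(p["a_raw"]) * np.tanh(h)     # slope stays > 0 since a > -1
        else:
            h = sigmoid(h)                               # squash to (0, 1)
    return h.ravel()


rng = np.random.default_rng(0)
params = make_params(rng)
xs = np.linspace(-10.0, 10.0, 2001)
c = cumulative(xs, params)
density = np.gradient(c, xs)   # numerical derivative of the cumulative
print(c.min(), c.max(), density.min())
```

Because every layer has a positive derivative, the composed function is increasing, so its derivative is a valid non-negative density without fixing any parametric shape in advance.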
Non-parametric Density Model
• Experiment on toy example
Non-parametric Density Model
• Revisiting Loss Function
$R = \mathbb{E}_{x \sim p_x}[-\log_2 p_{\tilde{y}}(\tilde{y})]$
$D = \|x - \hat{x}\|_2^2$
$L = R + \lambda D$
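A tiny worked example of the rate-distortion loss for a single image, with the expectation over x dropped. The function name and the toy numbers are hypothetical; the likelihoods would normally come from the entropy model and the reconstruction from the decoder.

```python
import numpy as np


def rate_distortion_loss(x, x_hat, likelihoods, lam):
    """Return (L, R, D) with R in bits and D as squared L2 error."""
    rate = float(np.sum(-np.log2(likelihoods)))
    distortion = float(np.sum((x - x_hat) ** 2))
    return rate + lam * distortion, rate, distortion


x = np.array([1.0, 2.0])                        # toy "image"
x_hat = np.array([0.0, 2.0])                    # toy reconstruction
likelihoods = np.array([0.5, 0.5, 0.5, 0.5])    # each latent costs 1 bit

loss, rate, distortion = rate_distortion_loss(x, x_hat, likelihoods, lam=0.5)
print(loss, rate, distortion)
```

A larger λ puts more weight on reconstruction error, so the trained model moves toward higher quality at a higher bitrate.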
Variational Auto Encoder
Variational Auto Encoder
• Auto Encoder
$y = g_a(x; \phi_g)$
$\tilde{y} = y + n$
$\hat{x} = g_s(\tilde{y}; \theta_g)$
Variational Auto Encoder
• Variational Auto Encoder
$\tilde{y} \sim p_{\tilde{y} \mid x}(\tilde{y} \mid x)$
$x \sim p_{x \mid \tilde{y}}(x \mid \tilde{y})$
Variational Auto Encoder
• Defining Generative Model
$\tilde{y} \sim p_{\tilde{y} \mid x}(\tilde{y} \mid x)$
Variational Auto Encoder
• Setting the parametric version of posterior
Variational Auto Encoder
• Setting KL Divergence
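The slide's equations did not survive extraction, so the block below is a reconstruction from the hyperprior paper rather than the slide's own notation. It assumes the approximate posterior $q(\tilde{y} \mid x)$ is a product of unit-width uniforms centred on $y = g_a(x; \phi_g)$, and that the likelihood $p_{x \mid \tilde{y}}$ is a Gaussian centred on the reconstruction $\hat{x}$ with variance $(2\lambda)^{-1}$.

```latex
\mathbb{E}_{x \sim p_x} D_{\mathrm{KL}}\!\left[\, q \,\|\, p_{\tilde{y} \mid x} \right]
= \mathbb{E}_{x \sim p_x}\, \mathbb{E}_{\tilde{y} \sim q}
\Big[
  \underbrace{\log q(\tilde{y} \mid x)}_{=\,0}
  \;\underbrace{-\,\log p_{x \mid \tilde{y}}(x \mid \tilde{y})}_{\lambda \|x - \hat{x}\|_2^2 \,+\, \text{const}}
  \;\underbrace{-\,\log p_{\tilde{y}}(\tilde{y})}_{\text{rate}}
\Big] + \text{const}
```

The first term vanishes because a unit-width uniform has density 1 on its support, the second is the weighted distortion, and the third is the rate, so minimizing the KL divergence matches minimizing $R + \lambda D$ (up to the change from natural logarithm to $\log_2$).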
Scale Hyperprior
Scale Hyperprior
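• Factorized Prior
$y = g_a(x; \phi_g)$
$\tilde{y} = y + n$
$\hat{x} = g_s(\tilde{y}; \theta_g)$
$R = -\log_2 p_{\tilde{y}}(\tilde{y})$
$D = \|x - \hat{x}\|_2^2$
$L = R + \lambda D$
Diagram blocks: Inference Model, Generative Model, Entropy Model
Entropy coding: $\mathrm{Enc}[\tilde{y}]$, $\mathrm{Dec}[\tilde{y}]$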
Scale Hyperprior
• Scale of y
Encoding the scale as ‘additional information’ (side information)
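A sketch of why knowing the local scale helps. It assumes each latent is modelled as a zero-mean Gaussian with scale sigma convolved with a unit-width uniform, as in the hyperprior paper; the two toy regions and all sigma values are made up for illustration, and the bits needed to transmit the scales themselves are not counted here.

```python
import math


def gaussian_cdf(v, sigma):
    return 0.5 * (1.0 + math.erf(v / (sigma * math.sqrt(2.0))))


def bits(y, sigma):
    """Code length of an integer latent under a Gaussian of scale sigma."""
    p = gaussian_cdf(y + 0.5, sigma) - gaussian_cdf(y - 0.5, sigma)
    return -math.log2(p)


def total_bits(values, sigma):
    return sum(bits(v, sigma) for v in values)


flat_region = [0, 0, 1, 0, -1, 0]         # toy low-activity latents
edge_region = [12, -9, 15, -11, 8, -14]   # toy high-activity latents

# One shared scale for everything versus one scale per region.
shared = total_bits(flat_region, 6.0) + total_bits(edge_region, 6.0)
adaptive = total_bits(flat_region, 0.7) + total_bits(edge_region, 12.0)
print(f"shared scale: {shared:.1f} bits, per-region scale: {adaptive:.1f} bits")
```

A single scale is too wide for flat regions and too narrow for busy ones, so spending a few extra bits to signal the scale can reduce the total.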
Scale Hyperprior
• Scale Hyperprior
Scale Hyperprior
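• Factorized Prior
$y = g_a(x; \phi_g)$
$\tilde{y} = y + n$
$z = h_a(y; \phi_h)$
$\tilde{z} = z + n$
$\hat{\sigma} = h_s(\tilde{z}; \theta_h)$
$\hat{x} = g_s(\tilde{y}; \theta_g)$
$R = -\log_2 p_{\tilde{z}}(\tilde{z}) - \log_2 p_{\tilde{y} \mid \tilde{z}}(\tilde{y} \mid \tilde{z})$
$D = \|x - \hat{x}\|_2^2$
$L = R + \lambda D$
Diagram blocks: Inference Model, Generative Model, Entropy Model
Entropy coding of the hyper-latent: $\mathrm{Enc}[\tilde{z}]$, $\mathrm{Dec}[\tilde{z}]$
Entropy coding of the latent: $\mathrm{Enc}[\tilde{y}; \hat{\sigma}]$, $\mathrm{Dec}[\tilde{y}; \hat{\sigma}]$, with $\hat{\sigma} = h_s(\tilde{z}; \theta_h)$ computed on both the encoder side and the decoder side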
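The training-time rate of the hyperprior model can be sketched end to end on toy data. Everything here is an illustrative assumption: the latents are synthetic, the per-region log-mean stands in for the learned transform h_a, the exponential stands in for h_s, a logistic model stands in for the prior on the hyper-latent, and the small floors on the scale and likelihood are guards added to keep the logarithm finite.

```python
import math

import numpy as np

_erf = np.vectorize(math.erf)


def gaussian_likelihood(y_tilde, sigma):
    """Zero-mean Gaussian of scale sigma convolved with U(-1/2, 1/2)."""
    def cdf(v):
        return 0.5 * (1.0 + _erf(v / (sigma * math.sqrt(2.0))))
    return np.maximum(cdf(y_tilde + 0.5) - cdf(y_tilde - 0.5), 1e-9)


def logistic_likelihood(z_tilde, scale=1.0):
    """Stand-in factorized prior for the hyper-latent."""
    def cdf(v):
        return 1.0 / (1.0 + np.exp(-v / scale))
    return cdf(z_tilde + 0.5) - cdf(z_tilde - 0.5)


rng = np.random.default_rng(0)

# Synthetic latents: 4 regions of 16 values, each with its own true scale.
true_sigma = np.repeat([0.5, 8.0, 2.0, 20.0], 16)
y = rng.normal(0.0, true_sigma)

# Stand-in for h_a: one log-scale summary per region, then additive noise.
z = np.log(np.abs(y).reshape(4, 16).mean(axis=1) + 1e-6)
z_tilde = z + rng.uniform(-0.5, 0.5, size=z.shape)

# Stand-in for h_s: map the noisy hyper-latent back to one scale per latent.
sigma_hat = np.repeat(np.maximum(np.exp(z_tilde), 0.11), 16)

y_tilde = y + rng.uniform(-0.5, 0.5, size=y.shape)
rate_z = float(-np.log2(logistic_likelihood(z_tilde)).sum())
rate_y = float(-np.log2(gaussian_likelihood(y_tilde, sigma_hat)).sum())
rate_total = rate_z + rate_y
print(f"side info: {rate_z:.1f} bits, latents: {rate_y:.1f} bits")
```

The side information is paid once per region, while the conditional model it enables is applied to every latent in that region.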
Scale Hyperprior
• Network Architecture
Scale Hyperprior
• Experiments
Scale Hyperprior
• Experiments
Thank You!
