InfoGAN: Interpretable Representation Learning by
Information Maximizing Generative Adversarial Nets
Xi Chen, Yan Duan, Rein Houthooft, John Schulman,
Ilya Sutskever, Pieter Abbeel (UC Berkeley, OpenAI)
Presenter: Shuhei M. Yoshida (Dept. of Physics, UTokyo)
Goal: Unsupervised learning of disentangled representations
Approach: GANs + maximizing mutual information
between generated images and input codes
Benefit: Interpretable representations obtained
without supervision or substantial additional cost
Reference
https://arxiv.org/abs/1606.03657 (with Appendix sections)
Implementations
https://github.com/openai/InfoGAN (by the authors, with TensorFlow)
https://github.com/yoshum/InfoGAN (by the presenter, with Chainer)
NIPS 2016 reading group
Motivation
How can we achieve
unsupervised learning of disentangled representations?
In general, a learned representation is entangled,
i.e., encoded in the data space in a complicated manner.
When a representation is disentangled, it is
more interpretable and easier to apply to downstream tasks.
Related works
• Unsupervised learning of representation
(no mechanism to force disentanglement)
Stacked (often denoising) autoencoder, RBM
Many others, including semi-supervised approach
• Supervised learning of disentangled representation
Bilinear models, multi-view perceptron
VAEs, adversarial autoencoders
• Weakly supervised learning of disentangled representation
disBM, DC-IGN
• Unsupervised learning of disentangled representation
hossRBM (applicable only to discrete latent factors),
which the presenter has almost no knowledge about.
This work:
Unsupervised learning of disentangled representation
applicable to both continuous and discrete latent factors
Generative Adversarial Nets (GANs)
Generative model trained by competition between
two neural nets:
Generator: $x = G(z)$, $z \sim p_z(Z)$, where $p_z(Z)$ is an arbitrary noise distribution
Discriminator: $D(x) \in [0, 1]$, the probability that $x$ is sampled from the data distribution $p_{\mathrm{data}}(X)$ rather than generated by the generator $G(z)$
Optimization problem to solve:
$$\min_G \max_D V_{\mathrm{GAN}}(G, D), \quad \text{where} \quad V_{\mathrm{GAN}}(G, D) \equiv E_{x \sim p_{\mathrm{data}}(X)}[\ln D(x)] + E_{z \sim p_z(Z)}[\ln(1 - D(G(z)))]$$
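To make the objective concrete, below is a minimal PyTorch sketch of $V_{\mathrm{GAN}}$ (the implementations linked above use TensorFlow and Chainer; the layer sizes and the standard-normal prior here are placeholder assumptions, not the authors' architecture):

```python
import torch
import torch.nn as nn

# Placeholder generator and discriminator for flattened 28x28 images.
G = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 784), nn.Tanh())
D = nn.Sequential(nn.Linear(784, 128), nn.LeakyReLU(0.2), nn.Linear(128, 1), nn.Sigmoid())

def v_gan(x_real: torch.Tensor) -> torch.Tensor:
    """V_GAN(G, D) = E_{x~p_data}[ln D(x)] + E_{z~p_z}[ln(1 - D(G(z)))]."""
    z = torch.randn(x_real.size(0), 64)   # z ~ p_z(Z), assumed standard normal
    eps = 1e-8                            # numerical guard inside ln
    return (torch.log(D(x_real) + eps).mean()
            + torch.log(1.0 - D(G(z)) + eps).mean())

# Training alternates: ascend v_gan in D's parameters, descend it in G's.
```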
Problems with GANs
From the perspective of representation learning:
No restrictions on how $G(z)$ uses $z$:
• $z$ can be used in a highly entangled way
• Each dimension of $z$ need not represent
any salient feature of the training data
[Figure: an entangled mapping from the noise coordinates $z_1$, $z_2$ to the data]
Proposed Resolution: InfoGAN
– Maximizing Mutual Information –
Observation in conventional GANs:
a generated data point $x$ does not have much information
about the noise $z$ from which $x$ is generated,
because of the heavily entangled use of $z$
Proposed resolution = InfoGAN:
the generator $G(z, c)$ is trained so that
it maximizes the mutual information $I(C; X)$ between
the latent code $C$ and the generated data $X$:
$$\min_G \max_D V_{\mathrm{GAN}}(G, D) - \lambda\, I(C;\, X = G(Z, C))$$
Mutual Information
$I(X; Y) = H(X) - H(X \mid Y)$, where
• $H(X) = E_{x \sim p(X)}[-\ln p(X = x)]$:
entropy of the prior distribution
• $H(X \mid Y) = E_{y \sim p(Y),\, x \sim p(X|Y=y)}[-\ln p(X = x \mid Y = y)]$:
entropy of the posterior distribution
[Figure: the prior $p(X = x)$ and posteriors $p(X = x \mid Y = y)$ for sampled $y \sim p(Y)$; if the posterior equals the prior then $I(X; Y) = 0$, while if observing $Y$ sharpens the distribution of $X$ then $I(X; Y) > 0$]
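As a quick numerical illustration (a toy joint distribution made up for this note, not from the paper), the identity $I(X; Y) = H(X) - H(X \mid Y)$ can be evaluated directly:

```python
import numpy as np

p_xy = np.array([[0.4, 0.1],        # joint p(X = x, Y = y)
                 [0.1, 0.4]])
p_x = p_xy.sum(axis=1)              # marginal p(X)
p_y = p_xy.sum(axis=0)              # marginal p(Y)

h_x = -np.sum(p_x * np.log(p_x))                   # H(X) = ln 2
p_x_given_y = p_xy / p_y                           # columns: p(X | Y = y)
h_x_given_y = -np.sum(p_xy * np.log(p_x_given_y))  # H(X | Y)

print(h_x - h_x_given_y)  # I(X; Y) ≈ 0.193 nats > 0: observing Y informs X
```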
Avoiding increase of calculation costs
Major difficulty:
evaluating $I(C; X)$ requires evaluating, and sampling from,
the posterior $p(C \mid X)$
Two strategies:
Variational maximization of mutual information:
use an approximate function $Q(c \mid x) \approx p(C = c \mid X = x)$
Sharing the neural net
between $Q(c \mid x)$ and the discriminator $D(x)$
Variational Maximization of MI
For an arbitrary function $Q(c, x)$,
$$\begin{aligned}
E_{x \sim p_G(X),\, c \sim p(C|X=x)}[\ln p(C = c \mid X = x)]
&= E_{x \sim p_G(X),\, c \sim p(C|X=x)}[\ln Q(c, x)] + E_{x \sim p_G(X),\, c \sim p(C|X=x)}\!\left[\ln \frac{p(C = c \mid X = x)}{Q(c, x)}\right] \\
&= E_{x \sim p_G(X),\, c \sim p(C|X=x)}[\ln Q(c, x)] + E_{x \sim p_G(X)}\!\left[D_{\mathrm{KL}}\big(p(C \mid X = x)\,\big\|\,Q(C, x)\big)\right] \\
&\geq E_{x \sim p_G(X),\, c \sim p(C|X=x)}[\ln Q(c, x)] \qquad (\because \text{positivity of the KL divergence})
\end{aligned}$$
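The final inequality is Gibbs' inequality applied for each $x$; a toy numpy check (distributions invented purely for illustration) confirms the bound and its tightness at $Q = p$:

```python
import numpy as np

rng = np.random.default_rng(0)
p_x = np.array([0.3, 0.7])               # p(X)
p_c_given_x = np.array([[0.9, 0.1],      # rows: p(C | X = x)
                        [0.2, 0.8]])

def expected_log(q):
    """E_{x ~ p(X), c ~ p(C|X=x)}[ln q(c, x)] for a conditional table q."""
    return np.sum(p_x[:, None] * p_c_given_x * np.log(q))

q = rng.dirichlet(np.ones(2), size=2)    # an arbitrary Q(c | x)
assert expected_log(q) <= expected_log(p_c_given_x)  # bound, tight at Q = p
```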
Variational Maximization of MI
With $Q(c, x)$ approximating $p(C = c \mid X = x)$, we obtain
a variational estimate of the mutual information:
$$L_I(G, Q) \equiv E_{x \sim p_G(X),\, c \sim p(C|X=x)}[\ln Q(c, x)] + H(C) \leq I(C;\, X = G(Z, C))$$
Maximizing $L_I(G, Q)$ w.r.t. $G$ and $Q$ is therefore equivalent to simultaneously
• achieving the equality by setting $Q(c, x) = p(C = c \mid X = x)$, and
• maximizing the mutual information.
Optimization problem to solve in InfoGAN:
$$\min_{G, Q} \max_D V_{\mathrm{GAN}}(G, D) - \lambda L_I(G, Q)$$
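A minimal sketch of these two objectives for a categorical code (PyTorch; `G`, `D`, `Q`, the 62-dimensional noise, and the uniform code prior are placeholder assumptions, not the authors' TensorFlow implementation):

```python
import torch
import torch.nn.functional as F

def infogan_losses(D, G, Q, x_real, lam=1.0, z_dim=62, n_cat=10):
    """Return (loss for D, joint loss for G and Q); Q(x) yields class logits."""
    batch = x_real.size(0)
    z = torch.randn(batch, z_dim)               # z ~ p_z(Z)
    c = torch.randint(n_cat, (batch,))          # c ~ p(C), uniform categorical
    x_fake = G(torch.cat([z, F.one_hot(c, n_cat).float()], dim=1))

    eps = 1e-8
    d_loss = -(torch.log(D(x_real) + eps).mean()
               + torch.log(1 - D(x_fake) + eps).mean())   # D ascends V_GAN
    g_loss = torch.log(1 - D(x_fake) + eps).mean()        # G descends V_GAN

    # L_I(G, Q) up to the constant H(C): E[ln Q(c | x)], estimated by
    # sampling c and z from their priors (justified on the next slide).
    log_q = F.log_softmax(Q(x_fake), dim=1)
    l_i = log_q[torch.arange(batch), c].mean()

    return d_loss, g_loss - lam * l_i
```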
Eliminate sampling from posterior
Lemma:
$$E_{x \sim p(X),\, y \sim p(Y|X=x)}[f(x, y)] = E_{x \sim p(X),\, y \sim p(Y|X=x),\, x' \sim p(X|Y=y)}[f(x', y)]$$
By using this lemma, and noting that a pair $(c, x)$ with $x \sim p_G(X)$, $c \sim p(C|X=x)$ can equivalently be generated as $c \sim p(C)$, $z \sim p_z(Z)$, $x = G(z, c)$, we can eliminate the sampling from $p(C \mid X = x)$:
$$E_{x \sim p_G(X),\, c \sim p(C|X=x)}[\ln Q(c, x)] = E_{c \sim p(C),\, z \sim p_z(Z),\, x = G(z, c)}[\ln Q(c, x)]$$
The right-hand side is easy to estimate: sample $c$ and $z$ from their priors, generate $x = G(z, c)$, and average $\ln Q(c, x)$.
Proof of lemma
Lemma:
$$E_{x \sim p(X),\, y \sim p(Y|X=x)}[f(x, y)] = E_{x \sim p(X),\, y \sim p(Y|X=x),\, x' \sim p(X|Y=y)}[f(x', y)]$$
Proof:
$$\begin{aligned}
\text{l.h.s.} &= \sum_{x} \sum_{y} p(X = x)\, p(Y = y \mid X = x)\, f(x, y) \\
&= \sum_{x} \sum_{y} p(Y = y)\, p(X = x \mid Y = y)\, f(x, y) \qquad (\because \text{Bayes' theorem}) \\
&= \sum_{x} \sum_{y} \sum_{x'} p(X = x', Y = y)\, p(X = x \mid Y = y)\, f(x, y) \\
&= \sum_{x} \sum_{y} \sum_{x'} p(X = x')\, p(Y = y \mid X = x')\, p(X = x \mid Y = y)\, f(x, y) = \text{r.h.s.}
\end{aligned}$$
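A Monte Carlo sanity check of the lemma on a toy discrete joint distribution (values invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
p_xy = np.array([[0.3, 0.2],
                 [0.1, 0.4]])                 # joint p(X = x, Y = y)
p_x = p_xy.sum(axis=1)
p_y_given_x = p_xy / p_x[:, None]             # rows: p(Y | X = x)
p_x_given_y = p_xy / p_xy.sum(axis=0)         # columns: p(X | Y = y)
f = np.array([[1.0, 2.0],
              [3.0, 5.0]])                    # an arbitrary f(x, y)

n = 200_000
x = (rng.random(n) < p_x[1]).astype(int)              # x  ~ p(X)
y = (rng.random(n) < p_y_given_x[x, 1]).astype(int)   # y  ~ p(Y | X = x)
x2 = (rng.random(n) < p_x_given_y[1, y]).astype(int)  # x' ~ p(X | Y = y)

print(f[x, y].mean(), f[x2, y].mean())  # both ≈ 3.0, equal up to MC noise
```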
Sharing layers between $D$ and $Q$
Model $Q(c, x)$ with a neural network, and
reduce the calculation cost by
sharing all the convolution layers with $D$
[Figure (image from Odena et al., arXiv:1610.09585): $D$ and $Q$ as two heads on the shared convolution layers of the discriminator]
Given DCGANs,
InfoGAN comes at negligible additional cost!
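A sketch of this sharing in PyTorch (MNIST-sized inputs assumed; layer sizes are placeholders rather than the paper's exact architecture):

```python
import torch.nn as nn

class SharedDQ(nn.Module):
    """Discriminator D and code posterior Q as two heads on one conv trunk."""
    def __init__(self, n_cat=10):
        super().__init__()
        self.trunk = nn.Sequential(                  # shared convolution layers
            nn.Conv2d(1, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.1),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.1),
            nn.Flatten(),
        )
        self.d_head = nn.Sequential(nn.Linear(128 * 7 * 7, 1), nn.Sigmoid())
        self.q_head = nn.Linear(128 * 7 * 7, n_cat)  # logits of Q(c | x)

    def forward(self, x):
        h = self.trunk(x)                            # one pass serves both heads
        return self.d_head(h), self.q_head(h)
```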
Experiment – MI Maximization
• InfoGAN on the MNIST dataset
• Latent code $c$: 10-class categorical code
$L_I$ quickly saturates to
$H(C) = \ln 10 \approx 2.30$ in InfoGAN
(Figure 1 in the original paper)
Experiment
– Disentangled Representation –
(Figure 2 in the original paper)
• InfoGAN on the MNIST dataset
• Latent codes
 $c_1$: 10-class categorical code
 $c_2$, $c_3$: continuous codes
 $c_1$ can be used as a classifier with a 5% error rate
 $c_2$ and $c_3$ captured the rotation and width, respectively
Experiment
– Disentangled Representation –
Dataset: P. Paysan, et al., AVSS, 2009, pp. 296–301.
Figure 3 in the original paper
Experiment
– Disentangled Representation –
Dataset: M. Aubry, et al., CVPR, 2014, pp. 3762–3769.
InfoGAN learned salient features without supervision
Figure 4 in the original paper
Experiment
– Disentangled Representation –
Dataset: Street View House Numbers (SVHN)
Figure 5 in the original paper
Experiment
– Disentangled Representation –
Dataset: CelebA
Figure 6 in the original paper
Future Prospects and Conclusion
Mutual information maximization can be applied to
other methods, e.g., VAEs
Learning hierarchical latent representations
Improving semi-supervised learning
High-dimensional data discovery
Goal: Unsupervised learning of disentangled representations
Approach: GANs + maximizing mutual information
between generated images and input codes
Benefit: Interpretable representations obtained
without supervision or substantial additional cost