A note on the density of Gumbel-softmax
Tomonari MASADA @ Nagasaki University
May 29, 2019
This note explicates some details of the discussion given in Appendix B of [1].
The Gumbel-softmax trick gives a $k$-dimensional sample vector $y = (y_1, \dots, y_k) \in \Delta^{k-1}$ whose entries are obtained as
$$y_i = \frac{\exp((\log \pi_i + g_i)/\tau)}{\sum_{j=1}^{k} \exp((\log \pi_j + g_j)/\tau)} \quad \text{for } i = 1, \dots, k, \tag{1}$$
by using $g_1, \dots, g_k$, which are i.i.d. samples drawn from $\mathrm{Gumbel}(0, 1)$.
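As a concrete illustration, Eq. (1) can be sampled numerically as follows. This is a minimal NumPy sketch; the function name `gumbel_softmax_sample` and the example probabilities are illustrative choices, not part of the note.

```python
import numpy as np

def gumbel_softmax_sample(pi, tau, rng):
    """Draw one Gumbel-softmax sample y on the simplex, per Eq. (1)."""
    # i.i.d. Gumbel(0, 1) noise via inverse transform: g = -log(-log(U)).
    g = -np.log(-np.log(rng.uniform(size=pi.shape)))
    logits = (np.log(pi) + g) / tau
    # Softmax, computed stably by subtracting the maximum logit.
    z = np.exp(logits - logits.max())
    return z / z.sum()

rng = np.random.default_rng(0)
y = gumbel_softmax_sample(np.array([0.2, 0.3, 0.5]), tau=0.5, rng=rng)
```

As $\tau \to 0$ the samples concentrate near the vertices of the simplex (one-hot vectors); as $\tau$ grows they flatten toward uniform weights.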
Define $x_i = \log \pi_i$. Then $y$ is rewritten as
$$y_i = \frac{\exp((x_i + g_i)/\tau)}{\sum_{j=1}^{k} \exp((x_j + g_j)/\tau)} \quad \text{for } i = 1, \dots, k. \tag{2}$$
Divide both the numerator and the denominator by $\exp((x_k + g_k)/\tau)$:
$$y_i = \frac{\exp((x_i + g_i - (x_k + g_k))/\tau)}{\sum_{j=1}^{k} \exp((x_j + g_j - (x_k + g_k))/\tau)} \quad \text{for } i = 1, \dots, k. \tag{3}$$
Define $u_i = x_i + g_i - (x_k + g_k)$ for $i = 1, \dots, k-1$, where $g_i \sim \mathrm{Gumbel}(0, 1)$. When $g_k$ is given, $g_i = u_i - x_i + (x_k + g_k)$, and $u_i$ can thus be regarded as a sample from the Gumbel distribution whose location parameter is $x_i - (x_k + g_k)$ and whose scale parameter is 1. Therefore,
$$p(u_i \mid g_k) = e^{-\left\{(u_i - x_i + x_k + g_k) + e^{-(u_i - x_i + x_k + g_k)}\right\}}.$$
Consequently, the density $p(u_1, \dots, u_{k-1})$ is given as follows:
$$\begin{aligned}
p(u_1, \dots, u_{k-1}) &= \int_{-\infty}^{\infty} p(u_1, \dots, u_{k-1} \mid g_k)\, p(g_k)\, dg_k \\
&= \int_{-\infty}^{\infty} p(g_k) \prod_{i=1}^{k-1} p(u_i \mid g_k)\, dg_k \\
&= \int_{-\infty}^{\infty} e^{-g_k - e^{-g_k}} \prod_{i=1}^{k-1} e^{x_i - u_i - x_k - g_k - e^{x_i - u_i - x_k - g_k}}\, dg_k
\end{aligned} \tag{4}$$
Perform a change of variables with $v = e^{-g_k}$. Then $\frac{dv}{dg_k} = -e^{-g_k}$. Therefore, $dv = -e^{-g_k}\, dg_k$ and $dg_k = -e^{g_k}\, dv = -dv/v$.
$$\begin{aligned}
p(u_1, \dots, u_{k-1}) &= \int_{\infty}^{0} (-dv)\, e^{-v} \prod_{i=1}^{k-1} v e^{x_i - u_i - x_k} e^{-v e^{x_i - u_i - x_k}} \\
&= \prod_{i=1}^{k-1} e^{x_i - u_i - x_k} \int_{0}^{\infty} e^{-v} v^{k-1} \prod_{i=1}^{k-1} e^{-v e^{x_i - u_i - x_k}}\, dv \\
&= e^{-(k-1)x_k} \prod_{i=1}^{k-1} e^{x_i - u_i} \int_{0}^{\infty} v^{k-1} e^{-v\left(1 + e^{-x_k} \sum_{i=1}^{k-1} e^{x_i - u_i}\right)}\, dv
\end{aligned} \tag{5}$$
Recall the following fact related to the Gamma integral:
$$\int_{0}^{\infty} x^{z-1} e^{-ax}\, dx = \int_{0}^{\infty} \left(\frac{y}{a}\right)^{z-1} e^{-y}\, \frac{dy}{a} = \frac{1}{a^z} \int_{0}^{\infty} y^{z-1} e^{-y}\, dy = a^{-z}\, \Gamma(z) \tag{6}$$
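The identity in Eq. (6) can be spot-checked numerically. The sketch below compares a trapezoidal approximation of the left-hand side against $a^{-z}\Gamma(z)$ for a couple of $(z, a)$ pairs chosen only for illustration.

```python
import numpy as np
from math import gamma

def gamma_integral_lhs(z, a, upper=60.0, n=200_000):
    """Trapezoidal approximation of the integral in Eq. (6), truncated at `upper`."""
    x = np.linspace(1e-9, upper, n)
    f = x ** (z - 1) * np.exp(-a * x)
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(x)))

# Each pair: (numerical left-hand side, closed-form right-hand side a^{-z} Gamma(z)).
results = [(gamma_integral_lhs(z, a), a ** (-z) * gamma(z))
           for z, a in [(3.0, 2.0), (5.0, 0.5)]]
```

The truncation point and grid size are chosen so that both the tail and the discretization error are far below the comparison tolerance for these exponents.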
Therefore,
$$\begin{aligned}
p(u_1, \dots, u_{k-1}) &= e^{-(k-1)x_k} \prod_{i=1}^{k-1} e^{x_i - u_i} \int_{0}^{\infty} v^{k-1} e^{-v\left(1 + e^{-x_k} \sum_{i=1}^{k-1} e^{x_i - u_i}\right)}\, dv \\
&= e^{-kx_k} e^{x_k} \prod_{i=1}^{k-1} e^{x_i - u_i} \left(1 + e^{-x_k} \sum_{i=1}^{k-1} e^{x_i - u_i}\right)^{-k} \Gamma(k) \\
&= e^{x_k} \prod_{i=1}^{k-1} e^{x_i - u_i} \left(e^{x_k} + \sum_{i=1}^{k-1} e^{x_i - u_i}\right)^{-k} \Gamma(k) \\
&= \exp\left(x_k + \sum_{i=1}^{k-1} (x_i - u_i)\right) \left(e^{x_k} + \sum_{i=1}^{k-1} e^{x_i - u_i}\right)^{-k} \Gamma(k)
\end{aligned} \tag{7}$$
Define $u_k = 0$. Then
$$p(u_1, \dots, u_{k-1}) = \Gamma(k) \prod_{i=1}^{k} \exp(x_i - u_i) \left(\sum_{i=1}^{k} \exp(x_i - u_i)\right)^{-k} \tag{8}$$
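As a sanity check on Eq. (8), for $k = 2$ the density of $u_1$ (with $u_2 = 0$) should integrate to one over the real line. A minimal numerical sketch, with arbitrary illustrative values for $x_1$ and $x_2$:

```python
import numpy as np
from math import gamma

# k = 2 case of Eq. (8): p(u1) = Gamma(2) e^{x1-u1} e^{x2} (e^{x1-u1} + e^{x2})^{-2}.
x1, x2 = 0.3, -0.5   # arbitrary x_i = log(pi_i) values for illustration
u1 = np.linspace(-40.0, 40.0, 400_000)
p = gamma(2) * np.exp(x1 - u1 + x2) * (np.exp(x1 - u1) + np.exp(x2)) ** (-2)
# Trapezoidal estimate of the total mass; the tails decay exponentially.
total = float(np.sum(0.5 * (p[1:] + p[:-1]) * np.diff(u1)))
```

For $k = 2$ this density is in fact the logistic density with location $x_1 - x_2$, consistent with the difference of two independent Gumbel(0, 1) variables being logistic.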
A $k$-dimensional sample vector $y = (y_1, \dots, y_k) \in \Delta^{k-1}$ is obtained from $u_1, \dots, u_{k-1}$ by applying a deterministic transformation $h$:
$$h_i(u_1, \dots, u_{k-1}) = \frac{\exp(u_i/\tau)}{1 + \sum_{j=1}^{k-1} \exp(u_j/\tau)} \quad \text{for } i = 1, \dots, k-1 \tag{9}$$
as follows:
$$y_i = h_i(u_1, \dots, u_{k-1}) \quad \text{for } i = 1, \dots, k-1 \tag{10}$$
Note that $y_k$ is fixed given $y_1, \dots, y_{k-1}$:
$$y_k = \frac{1}{1 + \sum_{j=1}^{k-1} \exp(u_j/\tau)} = 1 - \sum_{j=1}^{k-1} y_j \tag{11}$$
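The transformation $h$ of Eqs. (9)–(11) and the identity $y_k = 1 - \sum_j y_j$ can be checked in a few lines of NumPy; $\tau$ and the $u$ values below are arbitrary illustrative choices.

```python
import numpy as np

def h(u, tau):
    """Map (u_1, ..., u_{k-1}) to (y_1, ..., y_k) per Eqs. (9)-(11)."""
    e = np.exp(u / tau)
    denom = 1.0 + e.sum()
    y_head = e / denom           # y_1, ..., y_{k-1}, Eq. (9)
    y_k = 1.0 / denom            # Eq. (11), first equality
    return np.append(y_head, y_k)

tau = 0.7
u = np.array([0.4, -1.2])        # k = 3
y = h(u, tau)
```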
By the change-of-variables formula, we can obtain the density function for $y$:
$$p(y) = p\left(h^{-1}(y_1, \dots, y_{k-1})\right) \left| \det \frac{\partial \left(h_1^{-1}(y_1, \dots, y_{k-1}), \dots, h_{k-1}^{-1}(y_1, \dots, y_{k-1})\right)}{\partial (y_1, \dots, y_{k-1})} \right| \tag{12}$$
The inverse of $h$ is obtained as follows:
$$\begin{aligned}
y_i &= \frac{\exp(u_i/\tau)}{1 + \sum_{j=1}^{k-1} \exp(u_j/\tau)} \\
y_i &= y_k \exp(u_i/\tau) \quad \text{from } y_k = \frac{1}{1 + \sum_{j=1}^{k-1} \exp(u_j/\tau)} \\
\log y_i &= \log y_k + u_i/\tau \\
\therefore\ h_i^{-1}(y_1, \dots, y_{k-1}) &= u_i = \tau (\log y_i - \log y_k) = \tau \left( \log y_i - \log\left(1 - \sum_{j=1}^{k-1} y_j\right) \right)
\end{aligned} \tag{13}$$
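Eq. (13) can be verified by composing $h$ with its claimed inverse. A minimal sketch; the helper names and test values are illustrative only.

```python
import numpy as np

tau = 0.7

def h(u):
    e = np.exp(u / tau)
    return e / (1.0 + e.sum())               # y_1, ..., y_{k-1}, Eq. (9)

def h_inv(y):
    y_k = 1.0 - y.sum()                      # Eq. (11)
    return tau * (np.log(y) - np.log(y_k))   # Eq. (13)

u = np.array([0.4, -1.2, 2.0])               # k = 4, arbitrary values
u_back = h_inv(h(u))                         # should recover u
```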
Therefore, we obtain the entries of the Jacobian. Since $\partial y_k / \partial y_i = -1$,
$$\frac{\partial h_i^{-1}(y_1, \dots, y_{k-1})}{\partial y_i} = \tau \left( \frac{1}{y_i} - \frac{1}{y_k} \frac{\partial y_k}{\partial y_i} \right) = \tau \left( \frac{1}{y_i} + \frac{1}{y_k} \right) \tag{14}$$
$$\frac{\partial h_i^{-1}(y_1, \dots, y_{k-1})}{\partial y_j} = \tau \left( - \frac{1}{y_k} \frac{\partial y_k}{\partial y_j} \right) = \frac{\tau}{y_k} \quad \text{for } j \neq i \tag{15}$$
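The Jacobian entries (14) and (15) can be validated against central finite differences. Below is a sketch for $k = 3$ with illustrative values of $\tau$ and $y$.

```python
import numpy as np

tau = 0.7

def h_inv(y):
    """u_i = tau * (log y_i - log y_k) with y_k = 1 - sum(y), Eq. (13)."""
    y_k = 1.0 - y.sum()
    return tau * (np.log(y) - np.log(y_k))

y = np.array([0.2, 0.5])         # k = 3, so y_k = 0.3
eps = 1e-6
jac = np.zeros((2, 2))
for j in range(2):
    yp, ym = y.copy(), y.copy()
    yp[j] += eps
    ym[j] -= eps
    jac[:, j] = (h_inv(yp) - h_inv(ym)) / (2 * eps)

y_k = 1.0 - y.sum()
# Eq. (15): every entry is tau / y_k; Eq. (14) adds tau / y_i on the diagonal.
expected = np.full((2, 2), tau / y_k) + np.diag(tau / y)
```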
The remaining steps, Eqs. (24) to (28) in Appendix B of [1], are then easy to follow.
References
[1] E. Jang, S. Gu, and B. Poole. Categorical Reparameterization with Gumbel-Softmax. ICLR, 2017.
