Perturbed (accelerated) Proximal-Gradient algorithms
Gersende Fort
CNRS & Institut de Mathématiques de Toulouse
France
Joint work with Eric Moulines (École Polytechnique, France); Yves Atchadé (Univ. Michigan, USA); J.F. Aujol (Univ. Bordeaux, France) and C. Dossal (INSA Toulouse, France)
Interested in (1/3)
$(\mathrm{arg})\min_{\theta \in \mathbb{R}^p} \left( f(\theta) + g(\theta) \right)$

with
- $g : \mathbb{R}^p \to [0, \infty]$ convex, non-smooth, not identically equal to $+\infty$, and lower semi-continuous; the proximal map $\mathrm{Prox}_{\gamma g}(\tau)$ is explicit.
- $f$ smooth (Lipschitz gradient) with an intractable gradient $\nabla f$.

Algorithm: Perturbed Proximal-Gradient
$\theta_{k+1} = \mathrm{Prox}_{\gamma_{k+1} g}\big(\theta_k - \gamma_{k+1}\, \widehat{\nabla f}(\theta_k)\big)$
where $\widehat{\nabla f}(\theta_k)$ is an approximation of $\nabla f(\theta_k)$.

Questions: which conditions on $\gamma_{k+1}$ and on the perturbation $\widehat{\nabla f}(\theta_k) - \nabla f(\theta_k)$ ensure the same limiting behavior as the exact Prox-Gdt algorithm? (A sketch of the iteration follows.)
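To fix ideas, here is a minimal Python sketch of the iteration, assuming the illustrative choice $g = \lambda \|\cdot\|_1$ (whose prox is soft-thresholding) and a user-supplied noisy gradient oracle `grad_approx`; neither choice comes from the slides.

```python
import numpy as np

def prox_l1(tau, threshold):
    """Proximal map of threshold * ||.||_1, i.e. componentwise soft-thresholding."""
    return np.sign(tau) * np.maximum(np.abs(tau) - threshold, 0.0)

def perturbed_prox_gradient(theta0, grad_approx, lam, gamma, n_iter=1000):
    """theta_{k+1} = Prox_{gamma_{k+1} g}(theta_k - gamma_{k+1} * grad_hat(theta_k)).

    grad_approx : callable theta -> noisy/biased estimate of grad f(theta)
    lam         : weight of the illustrative l1 penalty g = lam * ||.||_1
    gamma       : callable k -> step size gamma_{k+1}
    """
    theta = np.array(theta0, dtype=float)
    for k in range(n_iter):
        step = gamma(k)
        theta = prox_l1(theta - step * grad_approx(theta), step * lam)
    return theta
```

A typical step-size schedule in the regimes discussed later would be, e.g., `gamma = lambda k: 1.0 / (k + 1) ** 0.6`.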
Interested in (2/3)
Furthermore, consider the case where
a) the gradient is an intractable expectation
$\nabla f(\theta) = \int_{\mathsf{X}} H(\theta, x)\, \pi_\theta(\mathrm{d}x)$
with $H$ explicit and $\pi_\theta$ a probability distribution;
b) stochastic approximation is used to avoid the curse of dimensionality;
c) i.i.d. Monte Carlo is not possible/efficient $\to$ Markov chain Monte Carlo (MCMC) sampling.

Questions: since MCMC provides a biased approximation,
$\widehat{\nabla f}(\theta_k) = \frac{1}{m_{k+1}} \sum_{j=1}^{m_{k+1}} H(\theta_k, X_{j,k}), \qquad \mathbb{E}\bigg[\frac{1}{m_{k+1}} \sum_{j=1}^{m_{k+1}} H(\theta_k, X_{j,k})\bigg] - \nabla f(\theta_k) \neq 0$
where $\{X_{1,k}, \cdots, X_{j,k}, \cdots\}$ is a Markov chain with stationary distribution $\pi_{\theta_k}$:
- which conditions on $\gamma_{k+1}$ and on the Monte Carlo batch size $m_{k+1}$?
- is it possible to have a non-vanishing bias, i.e. $m_{k+1} = m$? (A sketch of such an estimator follows.)
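A minimal sketch of the biased estimator, assuming a random-walk Metropolis kernel and user-supplied placeholders `H` and `log_pi` (none of which are fixed by the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

def mcmc_grad_estimate(theta, H, log_pi, x_init, m, step=0.5):
    """Biased Monte Carlo estimate of grad f(theta) = E_{pi_theta}[H(theta, X)]
    via a random-walk Metropolis chain targeting pi_theta.

    H      : callable (theta, x) -> ndarray, the explicit integrand
    log_pi : callable (theta, x) -> float, log-density of pi_theta (up to a constant)
    x_init : starting state (e.g. the final state of the previous chain)
    m      : batch size m_{k+1}
    """
    x = np.array(x_init, dtype=float)
    total = np.zeros_like(H(theta, x))
    for _ in range(m):
        proposal = x + step * rng.standard_normal(x.shape)
        # Metropolis accept/reject: for any finite m the chain is not at
        # stationarity, which is exactly why the estimate is biased.
        if np.log(rng.uniform()) < log_pi(theta, proposal) - log_pi(theta, x):
            x = proposal
        total += H(theta, x)
    return total / m, x  # return the last state to warm-start the next chain
```

Returning the final state to warm-start the chain at the next iteration is one common way to implement the successive chains $\{X_{j,k}\}_j$.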
Interested in (3/3)
Perturbed Prox-Gdt + Acceleration:
$\tau_k = \theta_k + \frac{t_{k-1} - 1}{t_k} (\theta_k - \theta_{k-1}), \qquad \theta_{k+1} = \mathrm{Prox}_{\gamma_{k+1} g}\big(\tau_k - \gamma_{k+1}\, \widehat{\nabla f}(\tau_k)\big)$

Questions:
- Which sequences $\gamma_k, t_k$, among those satisfying $\gamma_{k+1} t_k (t_k - 1) \le \gamma_k t_{k-1}^2$?
- When using a stochastic approximation of the gradient: which Monte Carlo batch size $m_k$?
- Is there a gain in considering $t_k = O(k^d)$ for some $0 \le d \le 1$? (See the sketch below.)
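A minimal sketch of the accelerated variant with a constant step size, again using the illustrative $\ell_1$ prox; the classical FISTA recursion for $t_k$ is one concrete sequence satisfying the displayed condition (with equality when $\gamma_k \equiv \gamma$):

```python
import numpy as np

def prox_l1(tau, threshold):
    """Componentwise soft-thresholding, prox of threshold * ||.||_1."""
    return np.sign(tau) * np.maximum(np.abs(tau) - threshold, 0.0)

def accelerated_perturbed_prox_gradient(theta0, grad_approx, lam, gamma, n_iter=1000):
    """FISTA-type perturbed iteration with t_k = (1 + sqrt(1 + 4 t_{k-1}^2)) / 2,
    which gives gamma * t_k * (t_k - 1) = gamma * t_{k-1}^2 for constant gamma."""
    theta_prev = np.array(theta0, dtype=float)
    theta = theta_prev.copy()
    t_prev = 1.0
    for _ in range(n_iter):
        t = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t_prev ** 2))
        tau = theta + ((t_prev - 1.0) / t) * (theta - theta_prev)  # extrapolation
        theta_prev = theta
        theta = prox_l1(tau - gamma * grad_approx(tau), gamma * lam)
        t_prev = t
    return theta
```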
Motivations for MCMC approx (1/3)
Computational Statistics, Statistical Learning:
- Online learning: here the "Monte Carlo points" are the examples/observations.
- Penalized Maximum Likelihood Estimation in a parametric model:
$\mathrm{argmin}_\theta \; \underbrace{f(\theta)}_{\text{negative log-likelihood}} + \underbrace{g(\theta)}_{\text{penalty term}}$
Motivations for MCMC approx (2/3)
Example 1: Latent variable models

The log-likelihood $\ell(\theta)$ of the $n$ observations (the dependence upon the observations is omitted):
$\ell(\theta) = \log \int_{\mathsf{X}} \underbrace{p(x, \theta)}_{\text{complete likelihood}} \, \mu(\mathrm{d}x)$
Intractable integral.

Its gradient:
$\nabla \ell(\theta) = \int \partial_\theta \log p(x, \theta)\; \underbrace{\frac{p(x, \theta)}{\int p(u, \theta)\, \mu(\mathrm{d}u)}}_{\text{a posteriori distribution}} \, \mu(\mathrm{d}x)$
Intractable integral since the normalizing constant is unknown $\longrightarrow$ MCMC.
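The gradient formula is the classical Fisher identity; a one-line derivation (standard, not on the slide) makes the posterior expectation explicit:

```latex
\nabla \ell(\theta)
  = \frac{\int \partial_\theta p(x,\theta)\,\mu(\mathrm{d}x)}{\int p(u,\theta)\,\mu(\mathrm{d}u)}
  = \int \partial_\theta \log p(x,\theta)\,
         \frac{p(x,\theta)}{\int p(u,\theta)\,\mu(\mathrm{d}u)}\,\mu(\mathrm{d}x)
  = \mathbb{E}_{\pi_\theta}\!\left[\partial_\theta \log p(X,\theta)\right]
```

where $\pi_\theta$ is the a posteriori distribution of the latent variable: exactly the form $\int H(\theta, x)\, \pi_\theta(\mathrm{d}x)$ of slide "Interested in (2/3)" (up to the sign, since $f$ is the negative log-likelihood).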
Motivations for MCMC approx (3/3)
Example 2: Binary graphical model

$N$ i.i.d. $\{0,1\}^p$-valued observations from the distribution
$\pi_\theta(y_{1:p}) = \frac{1}{Z_\theta} \exp\Big( \sum_{i=1}^p \theta_i y_i + \sum_{1 \le i < j \le p} \theta_{ij} \mathbb{1}_{y_i = y_j} \Big)$

The log-likelihood of the observations $Y^1, \cdots, Y^N$:
$\ell(\theta) = \sum_{i=1}^p \theta_i \sum_{n=1}^N Y_i^n + \sum_{1 \le i < j \le p} \theta_{ij} \sum_{n=1}^N \mathbb{1}_{Y_i^n = Y_j^n} - N \log Z_\theta$

Its gradient:
$\partial_{\theta_i} \ell(\theta) = \sum_{n=1}^N Y_i^n - N \sum_{y_{1:p} \in \{0,1\}^p} y_i\, \pi_\theta(y), \qquad \partial_{\theta_{ij}} \ell(\theta) = \sum_{n=1}^N \mathbb{1}_{Y_i^n = Y_j^n} - N \sum_{y_{1:p} \in \{0,1\}^p} \mathbb{1}_{y_i = y_j}\, \pi_\theta(y)$

The expectations under $\pi_\theta$ are sums over $2^p$ configurations, hence intractable for large $p$ $\longrightarrow$ MCMC. (A sampler sketch follows.)
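A minimal sketch of one way to sample $\pi_\theta$ for these expectations, using a single-site Gibbs sweep (an illustrative choice; the slides do not fix a sampler). `theta_main` holds the $\theta_i$ and `theta_pair` the $\theta_{ij}$ as a symmetric matrix with zero diagonal:

```python
import numpy as np

rng = np.random.default_rng(0)

def gibbs_sweep(y, theta_main, theta_pair):
    """One systematic-scan Gibbs sweep for
    pi_theta(y) = exp(sum_i theta_i y_i + sum_{i<j} theta_ij 1{y_i=y_j}) / Z_theta.

    y          : current configuration, float array of 0.0/1.0, length p
    theta_main : the theta_i, shape (p,)
    theta_pair : the theta_ij, symmetric with zero diagonal, shape (p, p)
    """
    p = y.size
    for i in range(p):
        # log P(y_i=1 | rest) - log P(y_i=0 | rest)
        #   = theta_i + sum_{j != i} theta_ij * (1{y_j=1} - 1{y_j=0})
        logit = theta_main[i] + theta_pair[i] @ (2.0 * y - 1.0)
        y[i] = float(rng.uniform() < 1.0 / (1.0 + np.exp(-logit)))
    return y
```

Averaging $y_i$ and $\mathbb{1}_{y_i = y_j}$ over successive sweeps then approximates the two expectations appearing in the gradient.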
Results on Perturbed Prox-Gdt (1/2)
Set $\mathcal{L} = \mathrm{argmin}_\Theta (f + g)$ and $\eta_{n+1} = \widehat{\nabla f}(\theta_n) - \nabla f(\theta_n)$.

Theorem (Atchadé, F., Moulines (2015))
Assume
- $g$ convex, lower semi-continuous; $f$ convex, $C^1$ and its gradient Lipschitz with constant $L$; $\mathcal{L}$ is non-empty.
- $\sum_n \gamma_n = +\infty$ and $\gamma_n \in (0, 1/L]$.
- Convergence of the series
$\sum_n \gamma_{n+1}^2 \|\eta_{n+1}\|^2, \qquad \sum_n \gamma_{n+1} \eta_{n+1}, \qquad \sum_n \gamma_{n+1} \langle A_n, \eta_{n+1} \rangle$
where $A_n = \mathrm{Prox}_{\gamma_{n+1} g}(\theta_n - \gamma_{n+1} \nabla f(\theta_n))$.

Then there exists $\theta_\star \in \mathcal{L}$ such that $\lim_n \theta_n = \theta_\star$.

It generalizes and improves on previous results. What can be said in the non-convex case (open question) and with a non-explicit "Prox"?
Results on Perturbed Prox-Gdt (2/2)
Given non-negative weights $a_1, \cdots, a_n$, set $A_n \overset{\mathrm{def}}{=} \sum_{k=1}^n a_k$.

Theorem (Atchadé, F., Moulines)
For any $\theta_\star \in \mathrm{argmin}_\Theta (f + g)$,
$(f+g)\Big( \sum_{k=1}^n \frac{a_k}{A_n}\, \theta_k \Big) - \min(f+g) \le \frac{a_0}{2 \gamma_0 A_n} \|\theta_0 - \theta_\star\|^2 + \frac{1}{2 A_n} \sum_{k=1}^n \Big( \frac{a_k}{\gamma_k} - \frac{a_{k-1}}{\gamma_{k-1}} \Big) \|\theta_{k-1} - \theta_\star\|^2 + \frac{1}{A_n} \sum_{k=1}^n a_k \gamma_k \|\eta_k\|^2 - \frac{1}{A_n} \sum_{k=1}^n a_k \langle A_{k-1} - \theta_\star, \eta_k \rangle$

In the case of a stochastic perturbation $\eta_k = \widehat{\nabla f}(\theta_k) - \nabla f(\theta_k)$: this yields bounds with high probability, in expectation, in $L^q$, ...
Stochastic Prox-Gdt, with (possibly) biased MC approximation

Under ergodicity conditions on the MCMC samplers, we have (with $F = f + g$)
$\Big\| F\Big( \frac{1}{n} \sum_{k=1}^n \theta_k \Big) - \min F \Big\|_{L^q} = O(u_n)$
with:
- Constant MC batch size $m_n = m$ (i.e. a non-vanishing approximation error $\to$ technical proof): $u_n = \frac{1}{\sqrt{n}}$ with $\gamma_n = \frac{\gamma}{n^a}$, $a \in [1/2, 1]$.
- Increasing MC batch size: $u_n = \frac{1}{n}$ with $\gamma_n = \gamma$, $m_n \propto n$.

The rate $1/n$ comes with a total Monte Carlo cost of $O(n^2)$ draws (see the accounting below).
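The cost accounting behind the last claim is short and worth spelling out (standard, not on the slide): with $m_k \propto k$, the total number of draws after $n$ iterations is

```latex
N = \sum_{k=1}^{n} m_k \;\propto\; \sum_{k=1}^{n} k \;=\; \frac{n(n+1)}{2} \;=\; O(n^2),
\qquad\text{hence}\qquad
u_n = \frac{1}{n} \asymp \frac{1}{\sqrt{N}} .
```

The constant-batch strategy gives the same budget-normalized rate: $n = N/m$ iterations yield $u_n = 1/\sqrt{n} = O(1/\sqrt{N})$.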
Nesterov-based acceleration of the Stochastic Prox-Gdt alg
Convergence. Choose $\gamma_n, m_n, t_n$ s.t.
$\gamma_n \in (0, 1/L], \qquad \gamma_{k+1} t_k (t_k - 1) \le \gamma_k t_{k-1}^2, \qquad \lim_n \gamma_n t_n^2 = +\infty, \qquad \sum_n \gamma_n t_n (1 + \gamma_n t_n) \frac{1}{m_n} < \infty$
Then there exists $\theta_\star \in \mathrm{argmin}_\Theta F$ s.t. $\lim_n \theta_n = \theta_\star$.

Rate on $F$. In addition,
$\mathbb{E}[F(\theta_{n+1}) - \min F] = O(u_n)$

$\gamma_n$ | $m_n$ | $t_n$ | $u_n$ | NbrMC
$\gamma$ | $n^3$ | $n$ | $n^{-2}$ | $n^4$
$\gamma/\sqrt{n}$ | $n^2$ | $n$ | $n^{-3/2}$ | $n^3$

In all strategies: for a MC computational cost $N$, the rate is $1/\sqrt{N}$.
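A quick check (not on the slide) that the table and the $1/\sqrt{N}$ summary agree:

```latex
\text{Row 1: } N = \sum_{k=1}^{n} k^3 = O(n^4), \quad u_n = n^{-2} \asymp N^{-1/2};
\qquad
\text{Row 2: } N = \sum_{k=1}^{n} k^2 = O(n^3), \quad u_n = n^{-3/2} \asymp N^{-1/2}.
```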
Open questions
1. Variance reduction technique. Here the variance of the MC approximation is $O(1/m_n)$. What happens when a "variance reduction" MC technique is used?
2. Averaging. Given non-negative weights $a_1, \cdots, a_n$, do $\gamma_k, t_k, m_k$ exist such that
$\sup_n \; a_n \big( (f+g)(\theta_n) - \min(f+g) \big) < \infty, \qquad (f+g)\Big( \sum_{k=1}^n \frac{a_k}{\sum_{j=1}^n a_j}\, \theta_k \Big) - \min(f+g) = O\Big( \frac{1}{\sum_{k=1}^n a_k} \Big)\,?$
3. Maximal rate. What is the maximal rate after $n$ iterations? After $N$ Monte Carlo draws?
4. (F)ISTA? What about $t_n = O(n^d)$ for some $0 < d < 1$?

A first answer: with variance reduction MC techniques, Nesterov acceleration ($d = 1$), $\gamma_k = \gamma$, $m_n = n^3$ and $a_n = n$: after $N$ MC draws, the rate is always better than $1/\sqrt{N}$.