Dual problem
\begin{aligned} &\min & f(x)\\ &\text{s.t.} & Ax=b \end{aligned}
Lagrangian
L(x, y)=f(x)+y^T(Ax-b)
Dual function:
g(y)=\inf_x L(x,y)
The dual problem is as follows:
\max_y g(y)
Recover x^*:
x^* \in \argmin_x L(x, y^*), \quad y^*=\argmax_y g(y)
Dual ascent (gradient ascent on the dual function):
\begin{aligned} &y^{k+1}=y^k+\alpha^k\nabla g(y^k)\\ &\tilde{x}=\argmin_x L(x, y^k)\\ &\nabla g(y^k)=\nabla_y L(\tilde{x}, y^k)=\nabla_y\left(f(\tilde{x})+(y^k)^T(A\tilde{x}-b)\right)=A\tilde{x}-b \end{aligned}
Dual ascent algorithm:
\begin{aligned} &x^{k+1}=\arg\min_x L(x, y^k) &&\text{// x-update}\\ &y^{k+1}=y^k+\alpha^k(Ax^{k+1}-b) &&\text{// dual update} \end{aligned}
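A minimal sketch of these two steps, assuming the toy objective f(x) = ½‖x‖² so the x-update has the closed form x = -Aᵀy (the step size and problem data below are also assumptions):

```python
import numpy as np

# Dual ascent sketch for min ½‖x‖²  s.t. Ax = b (f is an assumed toy choice
# so that argmin_x L(x, y) = -Aᵀy in closed form).
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 5))
b = rng.standard_normal(3)
y = np.zeros(3)
alpha = 1.0 / np.linalg.norm(A, 2) ** 2  # safe fixed step for this quadratic f

for k in range(10000):
    x = -A.T @ y                 # x-update: argmin_x L(x, y^k)
    y = y + alpha * (A @ x - b)  # dual update along ∇g(y^k) = Ax - b
```

After the loop the primal residual Ax - b is driven to zero, since the dual iterates perform gradient ascent on the concave dual g.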
Dual Decomposition
Suppose f is separable:
\begin{aligned} &\min f(x)=\sum_{i=1}^N f_i(x_i)\\ &\text{s.t.}\quad Ax=b\\ & x=(x_1, x_2, \dots, x_N)\\ &Ax = [A_1, A_2, \dots, A_N]x = A_1x_1+A_2x_2+\dots+A_Nx_N \end{aligned}
Then L(x, y) is also separable:
\begin{aligned} L(x, y)&=\sum_{i=1}^N f_i(x_i)+y^T(\sum_{i=1}^NA_ix_i-b)\\ &=\sum_{i=1}^N(f_i(x_i)+y^TA_ix_i)-y^Tb \end{aligned}
Since L is separable in x, the x-minimization splits into N subproblems that can be computed in parallel:
x_i^{k+1}=\arg\min_{x_i}L_i(x_i, y^k),\quad L_i(x_i, y)=f_i(x_i)+y^TA_ix_i
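A sketch of the scatter/gather pattern, again assuming toy blocks f_i(x_i) = ½‖x_i‖² so each subproblem is closed-form; the block sizes and data are illustrative assumptions:

```python
import numpy as np

# Dual decomposition sketch: A = [A_1, A_2] in blocks, f_i(x_i) = ½‖x_i‖²
# (assumed).  Each x_i-update is independent, so the comprehension below
# could run in parallel across workers.
rng = np.random.default_rng(1)
A_blocks = [rng.standard_normal((3, 2)), rng.standard_normal((3, 3))]
b = rng.standard_normal(3)
A = np.hstack(A_blocks)
alpha = 1.0 / np.linalg.norm(A, 2) ** 2  # safe step for this quadratic f
y = np.zeros(3)

for k in range(10000):
    # scatter: each block solves argmin_{x_i} ½‖x_i‖² + yᵀA_i x_i  →  -A_iᵀy
    xs = [-Ai.T @ y for Ai in A_blocks]
    # gather: form the shared residual, then broadcast the new multiplier
    residual = sum(Ai @ xi for Ai, xi in zip(A_blocks, xs)) - b
    y = y + alpha * residual
```

Only the residual and y are shared between blocks, which is what makes the method attractive for distributed problems.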
Method of multipliers
A method to robustify dual ascent: replace the Lagrangian with the augmented Lagrangian
L_{\rho}(x, y)=f(x)+y^T(Ax-b)+\frac{\rho}{2}\|Ax-b\|_2^2, \quad\rho>0
The iterations are dual ascent applied to L_\rho, with the step size fixed at \rho:
\begin{aligned} &x^{k+1}=\arg\min_x L_\rho(x, y^k)\\ &y^{k+1}=y^k+\rho(Ax^{k+1}-b) \end{aligned}
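A sketch for the same toy problem min ½‖x‖² s.t. Ax = b (f and the data are assumed): the quadratic penalty turns the x-update into the linear solve (I + ρAᵀA)x = ρAᵀb - Aᵀy.

```python
import numpy as np

# Method of multipliers for min ½‖x‖²  s.t. Ax = b (assumed toy f).
# Unlike plain dual ascent, no step-size tuning is needed: the dual
# step is fixed at ρ and the method converges for any ρ > 0.
rng = np.random.default_rng(2)
A = rng.standard_normal((3, 5))
b = rng.standard_normal(3)
rho = 1.0
y = np.zeros(3)
M = np.eye(5) + rho * A.T @ A  # constant across iterations

for k in range(2000):
    x = np.linalg.solve(M, rho * A.T @ b - A.T @ y)  # argmin_x L_ρ(x, y^k)
    y = y + rho * (A @ x - b)                        # dual update, step ρ
```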
ADMM
\begin{aligned} &\min\ f(x)+g(z)\\ &\text{s.t.}\ Ax+Bz=C \end{aligned}
Augmented Lagrangian:
L_\rho(x, z, y)=f(x)+g(z)+y^T(Ax+Bz-C)+\frac{\rho}{2}\|Ax+Bz-C\|_2^2
ADMM:
\begin{aligned} &x^{k+1}=\arg\min_x L_\rho (x, z^k, y^k)&&\text{x-minimization}\\ &z^{k+1}=\arg\min_z L_\rho(x^{k+1},z, y^k)&&\text{z-minimization}\\ &y^{k+1}=y^k+\rho(Ax^{k+1}+Bz^{k+1}-C)&&\text{dual update} \end{aligned}
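The three steps can be sketched as a generic skeleton; the oracle names (`x_update`, `z_update`) and the toy instance below are assumptions:

```python
import numpy as np

# Generic ADMM skeleton: the caller supplies the two argmin oracles.
def admm(x_update, z_update, A, B, c, rho, y0, z0, iters=300):
    y, z = y0.copy(), z0.copy()
    for k in range(iters):
        x = x_update(z, y)                 # argmin_x L_ρ(x, z^k, y^k)
        z = z_update(x, y)                 # argmin_z L_ρ(x^{k+1}, z, y^k)
        y = y + rho * (A @ x + B @ z - c)  # dual update
    return x, z, y

# Toy instance: min ½‖x - v‖² + ½‖z‖²  s.t. x - z = 0, whose solution is
# x = z = v/2; both argmins have closed forms.
v = np.array([1.0, -2.0, 3.0])
rho = 1.0
x_up = lambda z, y: (v - y + rho * z) / (1 + rho)
z_up = lambda x, y: (y + rho * x) / (1 + rho)
x, z, y = admm(x_up, z_up, np.eye(3), -np.eye(3), np.zeros(3), rho,
               np.zeros(3), np.zeros(3))
```

The skeleton only touches the problem through the two oracles and the residual Ax + Bz - c, which is exactly the structure the updates above describe.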
Optimality conditions
Since z^{k+1} minimizes L_\rho(x^{k+1}, z, y^k):
\begin{aligned} 0&= \nabla_zL_\rho(x^{k+1}, z^{k+1}, y^k)\\ &=\nabla g(z^{k+1})+B^Ty^k+\rho B^T(Ax^{k+1}+Bz^{k+1}-C)\\ &=\nabla g(z^{k+1})+B^T(y^k+\rho(Ax^{k+1}+Bz^{k+1}-C))\\ &=\nabla g(z^{k+1})+B^Ty^{k+1} \end{aligned}
Hence (x^{k+1}, z^{k+1}, y^{k+1}) always satisfies the second dual feasibility condition.
Examples
Constrained convex problem
\min\ f(x)\quad \text{s.t.}\ x\in C
Take g as the indicator function of C:
g(x)= \begin{cases} 0, &x\in C\\ +\infty, &x\notin C \end{cases}
ADMM form:
\begin{aligned} &\min\ f(x)+g(z)\\ &\text{s.t.}\ x-z=0 \end{aligned}
The augmented Lagrangian is:
L_\rho(x, z, y)=f(x)+g(z)+y^T(x-z)+\frac{\rho}{2}\|x-z\|_2^2
Algorithm (in scaled form, with u=y/\rho):
\begin{aligned} &x^{k+1}=\arg\min_x \left(f(x)+\frac{\rho}{2}\|x-z^k+u^k\|_2^2\right)\\ &z^{k+1}=\Pi_C(x^{k+1}+u^k)\\ &u^{k+1}=u^k+x^{k+1}-z^{k+1} \end{aligned}
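A sketch of this algorithm, assuming f(x) = ½‖x - v‖² and C the nonnegative orthant (both illustrative choices), so Π_C is componentwise clipping and the x-update is closed-form:

```python
import numpy as np

# Scaled-form ADMM for min ½‖x - v‖²  s.t. x ∈ C, with C = {x : x ≥ 0}
# (f and C are assumed for illustration).
v = np.array([1.0, -2.0, 0.5, -0.3])
rho = 1.0
x = np.zeros_like(v)
z = np.zeros_like(v)
u = np.zeros_like(v)

for k in range(200):
    x = (v + rho * (z - u)) / (1 + rho)  # argmin_x f(x) + ρ/2‖x - z^k + u^k‖²
    z = np.maximum(x + u, 0.0)           # Π_C: clip negatives to zero
    u = u + x - z                        # scaled dual update

# z converges to the Euclidean projection of v onto C
```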
Lasso (regression with a sparsity-inducing penalty)
\begin{aligned} &\min\ \frac{1}{2}\|Ax-b\|_2^2+\lambda\|z\|_1\\ &\text{s.t.}\quad x-z=0 \end{aligned}
The augmented Lagrangian is:
L_\rho(x,z,y)=\frac{1}{2}\|Ax-b\|_2^2+\lambda \|z\|_1+y^T(x-z)+\frac{\rho}{2}\|x-z\|_2^2
\nabla_xL_\rho=A^T(Ax-b)+y+\rho(x-z)
Setting \nabla_x L_\rho=0 gives the x-update; the z-subproblem is solved by soft-thresholding:
x^{k+1}:=(A^TA+\rho I)^{-1}(A^Tb+\rho z^k-y^k)\\ z^{k+1}:=S_{\lambda/\rho}\left(x^{k+1}+\frac{y^k}{\rho}\right)\\ y^{k+1}:=y^k+\rho(x^{k+1}-z^{k+1})
where S_\kappa(v)=\mathrm{sign}(v)\max(|v|-\kappa, 0) is the soft-thresholding operator.
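The three lasso updates can be sketched directly; the problem data below are synthetic assumptions with a sparse ground truth:

```python
import numpy as np

def soft_threshold(v, kappa):
    # S_κ(v) = sign(v) · max(|v| - κ, 0), applied componentwise
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)

# synthetic data (assumed): 30 observations, 10 features, 3-sparse truth
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 10))
x_true = np.zeros(10)
x_true[:3] = [2.0, -1.0, 0.5]
b = A @ x_true
lam, rho = 0.1, 1.0

x = np.zeros(10)
z = np.zeros(10)
y = np.zeros(10)
AtA_rhoI = A.T @ A + rho * np.eye(10)  # constant: cache (or factor) once
Atb = A.T @ b

for k in range(500):
    x = np.linalg.solve(AtA_rhoI, Atb + rho * z - y)  # x-update
    z = soft_threshold(x + y / rho, lam / rho)        # z-update
    y = y + rho * (x - z)                             # dual update
```

Since AᵀA + ρI never changes, a real implementation would factor it once (e.g. a Cholesky factorization) instead of calling `solve` every iteration.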