Machine Learning 05: Regularization
1 Regularization for Linear Regression
1.1 Cost Function
$$
J(\theta)=\frac{1}{2m}\left[\sum_{i=1}^m\left(h_\theta(x^{(i)})-y^{(i)}\right)^2+\lambda\sum_{j=1}^n\theta_j^2\right]
$$
The bias $\theta_0$ is not penalized, so the regularization sum starts at $j=1$.
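As a concrete check of the cost above, here is a minimal NumPy sketch. It assumes a design matrix `X` whose first column is all ones ($x_0 = 1$) and penalizes only $\theta_1,\dots,\theta_n$, leaving the bias $\theta_0$ out (consistent with the normal-equation matrix in 1.3, whose top-left entry is 0). The function name `linear_cost_reg` and the toy data are hypothetical.

```python
import numpy as np

def linear_cost_reg(theta, X, y, lam):
    """Regularized linear-regression cost J(theta) (hypothetical helper).

    X: (m, n+1) design matrix with a leading column of ones (x_0 = 1).
    theta: (n+1,) parameters; theta[0] is the bias and is not penalized.
    """
    m = len(y)
    residuals = X @ theta - y               # h_theta(x^(i)) - y^(i) for every example
    penalty = lam * np.sum(theta[1:] ** 2)  # sum over j = 1..n only
    return (np.sum(residuals ** 2) + penalty) / (2 * m)

# Toy data (hypothetical): y = x exactly, so theta = [0, 1] fits perfectly.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 2.0, 3.0])
theta = np.array([0.0, 1.0])

print(linear_cost_reg(theta, X, y, lam=0.0))  # 0.0: zero residuals, no penalty
print(linear_cost_reg(theta, X, y, lam=3.0))  # 0.5: only the penalty 3 * 1^2 / (2 * 3)
```

Even with a perfect fit the cost is nonzero once $\lambda > 0$, which is what pushes the parameters toward smaller values.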
1.2 Gradient Descent
Repeat until convergence (updating all $\theta_j$ simultaneously):
$$
\begin{aligned}
\theta_0&:=\theta_0-\alpha\frac{1}{m}\sum_{i=1}^m\left(h_\theta(x^{(i)})-y^{(i)}\right)x_0^{(i)}\\
\theta_j&:=\theta_j-\alpha\left[\frac{1}{m}\sum_{i=1}^m\left(h_\theta(x^{(i)})-y^{(i)}\right)x_j^{(i)}+\frac{\lambda}{m}\theta_j\right]\\
&=\theta_j\left(1-\alpha\frac{\lambda}{m}\right)-\alpha\frac{1}{m}\sum_{i=1}^m\left(h_\theta(x^{(i)})-y^{(i)}\right)x_j^{(i)},\qquad j=1,\dots,n
\end{aligned}
$$
Since $1-\alpha\frac{\lambda}{m}$ is slightly less than 1, each update first shrinks $\theta_j$ toward zero and then applies the ordinary unregularized gradient step.
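The sketch below writes the update in both forms, the bracketed gradient with the penalty term $+\frac{\lambda}{m}\theta_j$ and the equivalent "shrink by $1-\alpha\lambda/m$, then step" form, and checks that they agree. It assumes NumPy, a leading column of ones in `X`, and no penalty on $\theta_0$; the function names and the toy data are hypothetical.

```python
import numpy as np

def gd_step_linear(theta, X, y, alpha, lam):
    """One regularized gradient-descent update (bracketed form)."""
    m = len(y)
    grad = X.T @ (X @ theta - y) / m   # (1/m) * sum_i (h - y) * x_j, for every j
    reg = (lam / m) * theta            # new array; theta itself is not modified
    reg[0] = 0.0                       # theta_0 is not regularized
    return theta - alpha * (grad + reg)

def gd_step_linear_shrink(theta, X, y, alpha, lam):
    """Same update written as theta_j * (1 - alpha*lam/m) - alpha * gradient."""
    m = len(y)
    grad = X.T @ (X @ theta - y) / m
    shrink = np.full_like(theta, 1.0 - alpha * lam / m)
    shrink[0] = 1.0                    # no shrinkage for theta_0
    return theta * shrink - alpha * grad

# Toy data (hypothetical): y = 1 + 2x with no noise.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

theta = np.zeros(2)
for _ in range(5000):
    theta = gd_step_linear(theta, X, y, alpha=0.1, lam=0.0)
print(theta)  # close to [1, 2] when lam = 0
```

With `lam > 0` the same loop settles on a smaller slope; the learning rate `alpha=0.1` is simply a value that is stable for this small example.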
1.3 Normal Equation
Write the training set in matrix form:
$$
X=\begin{bmatrix}(x^{(1)})^T\\ \vdots\\ (x^{(m)})^T\end{bmatrix},\quad
y=\begin{bmatrix}y^{(1)}\\ \vdots\\ y^{(m)}\end{bmatrix}
$$
where each row of $X$ includes $x_0^{(i)}=1$, so $X$ is $m\times(n+1)$ and $y$ is $m\times 1$.
Setting the gradient of $J(\theta)$ to zero gives:
$$
\theta=\left(X^TX+\lambda\begin{bmatrix}0&0&\cdots&0\\0&1&\cdots&0\\\vdots&\vdots&\ddots&\vdots\\0&0&\cdots&1\end{bmatrix}\right)^{-1}X^Ty
$$
The bracketed matrix is $(n+1)\times(n+1)$; its top-left entry is 0 because $\theta_0$ is not regularized.
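A minimal NumPy sketch of this closed form follows. It builds the matrix as an identity with the $(0,0)$ entry zeroed and uses `np.linalg.solve` rather than forming the inverse explicitly, which is the usual numerically safer choice. The helper name `normal_equation_reg` and the toy data are hypothetical.

```python
import numpy as np

def normal_equation_reg(X, y, lam):
    """theta = (X^T X + lam * L)^(-1) X^T y, with L = identity except L[0, 0] = 0."""
    L = np.eye(X.shape[1])
    L[0, 0] = 0.0                      # do not penalize the bias theta_0
    # Solve the linear system instead of computing the inverse explicitly.
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)

# Toy data (hypothetical): y = 1 + 2x with no noise.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

theta0 = normal_equation_reg(X, y, lam=0.0)  # exact fit, approximately [1, 2]
theta1 = normal_equation_reg(X, y, lam=1.0)  # approximately [1.5, 1.667]
print(theta0, theta1)
```

With $\lambda=1$ the slope shrinks from 2 to about 1.667, while the unpenalized intercept rises from 1 to 1.5 to compensate.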
1.4 Handling a Non-invertible Matrix
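Without regularization, $X^TX$ is singular whenever $m\le n$ (no more training examples than features) or when some features are linearly dependent, so $(X^TX)^{-1}$ does not exist. For any $\lambda>0$, the matrix $X^TX+\lambda\,\mathrm{diag}(0,1,\dots,1)$ in the normal equation is invertible, so regularization removes the non-invertibility problem in linear regression.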
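To see this numerically, the sketch below takes a hypothetical case with $m=2$ examples and $n=3$ features (plus the $x_0=1$ column) and compares matrix ranks with NumPy's `np.linalg.matrix_rank`. The data values and `lam = 0.1` are arbitrary choices for illustration.

```python
import numpy as np

# Hypothetical example: m = 2 examples, n = 3 features, first column x_0 = 1.
X = np.array([[1.0, 2.0, 0.0, 1.0],
              [1.0, 0.0, 3.0, 1.0]])

L = np.eye(4)
L[0, 0] = 0.0    # regularization matrix diag(0, 1, 1, 1)
lam = 0.1

print(np.linalg.matrix_rank(X.T @ X))            # 2: the 4x4 matrix X^T X is singular
print(np.linalg.matrix_rank(X.T @ X + lam * L))  # 4: invertible once lam > 0
```

The full rank follows because $v^T(X^TX+\lambda L)v=\lVert Xv\rVert^2+\lambda\sum_{j\ge1}v_j^2$ can only be zero if $v_1=\dots=v_n=0$, and then $Xv=v_0\mathbf{1}$ forces $v_0=0$ as well.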
2 Regularization for Logistic Regression
2.1 Cost Function
$$
J(\theta)=-\frac{1}{m}\left[\sum_{i=1}^m y^{(i)}\log h_\theta(x^{(i)})+(1-y^{(i)})\log\left(1-h_\theta(x^{(i)})\right)\right]+\frac{\lambda}{2m}\sum_{j=1}^n\theta_j^2
$$
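A minimal NumPy sketch of this cost, assuming $h_\theta$ is the sigmoid $1/(1+e^{-\theta^Tx})$ and `X` has a leading column of ones. The names `sigmoid` and `logistic_cost_reg` and the toy data are hypothetical, and the plain `np.log` calls are a naive version that can overflow to `inf` if predictions saturate at exactly 0 or 1.

```python
import numpy as np

def sigmoid(z):
    """Logistic function 1 / (1 + e^{-z}), applied elementwise."""
    return 1.0 / (1.0 + np.exp(-z))

def logistic_cost_reg(theta, X, y, lam):
    """Regularized logistic-regression cost; theta[0] is not penalized."""
    m = len(y)
    h = sigmoid(X @ theta)
    data_term = -np.sum(y * np.log(h) + (1 - y) * np.log(1 - h)) / m
    penalty = lam / (2 * m) * np.sum(theta[1:] ** 2)   # j = 1..n only
    return data_term + penalty

# Toy data (hypothetical): one feature, label 1 when x > 0.
X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])

# With theta = 0 every prediction is 0.5, so the cost is ln 2 ≈ 0.6931.
print(logistic_cost_reg(np.zeros(2), X, y, lam=1.0))
```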
2.2 Gradient Descent
Repeat until convergence (updating all $\theta_j$ simultaneously):
$$
\begin{aligned}
\theta_0&:=\theta_0-\alpha\frac{1}{m}\sum_{i=1}^m\left(h_\theta(x^{(i)})-y^{(i)}\right)x_0^{(i)}\\
\theta_j&:=\theta_j-\alpha\left[\frac{1}{m}\sum_{i=1}^m\left(h_\theta(x^{(i)})-y^{(i)}\right)x_j^{(i)}+\frac{\lambda}{m}\theta_j\right],\qquad j=1,\dots,n
\end{aligned}
$$
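where $h_\theta$ is the sigmoid (logistic) function:
$$
h_\theta(x)=\frac{1}{1+e^{-\theta^Tx}}
$$
The update looks identical to the linear-regression one; the only difference is this definition of $h_\theta$ (in linear regression, $h_\theta(x)=\theta^Tx$).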
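The sketch below runs the regularized update above with the sigmoid hypothesis, using NumPy. It compares a weak and a strong penalty on a small hypothetical, linearly separable dataset; the function names, data, `alpha`, iteration count, and `lam` values are all illustrative assumptions, not tuned settings.

```python
import numpy as np

def sigmoid(z):
    """h_theta(x) = 1 / (1 + e^{-theta^T x}), applied elementwise to z = X @ theta."""
    return 1.0 / (1.0 + np.exp(-z))

def gd_step_logistic(theta, X, y, alpha, lam):
    """One regularized gradient-descent update for logistic regression."""
    m = len(y)
    grad = X.T @ (sigmoid(X @ theta) - y) / m
    reg = (lam / m) * theta            # new array; theta itself is not modified
    reg[0] = 0.0                       # theta_0 is not regularized
    return theta - alpha * (grad + reg)

# Toy data (hypothetical): one feature, label 1 when x > 0 (separable).
X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])

theta_weak = np.zeros(2)
theta_strong = np.zeros(2)
for _ in range(2000):
    theta_weak = gd_step_logistic(theta_weak, X, y, alpha=0.5, lam=0.01)
    theta_strong = gd_step_logistic(theta_strong, X, y, alpha=0.5, lam=1.0)

print(theta_weak, theta_strong)  # the larger lam gives a smaller theta_1
```

On separable data like this, the unregularized cost keeps decreasing as $\theta_1$ grows without bound; the penalty term is what gives the parameters a finite optimum.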