Back-propagation: the procedure repeatedly adjusts the weights of the connections in the network so as to minimize a measure of the difference between the actual output vector of the net and the desired output vector. Internal 'hidden' units, which are not part of the input or output, come to represent important features of the task domain, and the regularities in the task are captured by the interactions of these units.
The simplest essential structure is a layered one: input layer -> intermediate layers -> output layer, where the hidden units are not directly connected to the input or the output; during learning, the procedure must decide when the hidden units should be active.
Eqn. (2) is the sigmoid function. Of course, Eqns. (1) and (2) are not mandatory; other bounded functions would also work, but using a linear combination before the nonlinearity simplifies the procedure.
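For reference, the two equations as given in the paper: unit $j$'s total input is a linear function of the outputs $y_i$ of the units below it, and its output is the sigmoid of that input.

$$x_j = \sum_i y_i \, w_{ji} \qquad (1)$$

$$y_j = \frac{1}{1 + e^{-x_j}} \qquad (2)$$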
The total error:

$$E = \frac{1}{2} \sum_c \sum_j \left( y_{j,c} - d_{j,c} \right)^2$$

where $c$ indexes the input-output cases, $j$ indexes the output units, $y_{j,c}$ is the actual state of output unit $j$ on case $c$, and $d_{j,c}$ is its desired state.
Gradient descent is used to reduce the loss, making the gap between the actual output and the desired output as small as possible. To see how much a given parameter, e.g. a particular weight, affects the value of E (the goal in this paper being to find a set of weights for which the output is close to the desired output), take the partial derivative of E with respect to that weight.
For a given case, the partial derivatives of the error with respect to each weight are computed in two passes. We have already described the forward pass, in which the units in each layer have their states determined by the input they receive from units in lower layers using equations (1) and (2). The backward pass, which propagates derivatives from the top layer back to the bottom one, is more complicated.
Forward pass: the input passes through the sigmoid() activation layer by layer until the output layer produces the output values. Backward pass: using the outputs obtained from the forward pass, the weights are repeatedly adjusted by gradient descent.
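A minimal NumPy sketch of the forward pass under Eqns. (1) and (2); the function and variable names here are illustrative, not from the paper:

```python
import numpy as np

def sigmoid(x):
    # Eqn. (2): squash the total input into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def forward(y_input, w_hidden, w_output):
    # Each layer's state is the sigmoid of a linear combination
    # of the states in the layer below (Eqns. (1) and (2)).
    y_hidden = sigmoid(y_input @ w_hidden)   # hidden-layer states
    y_output = sigmoid(y_hidden @ w_output)  # output-layer states
    return y_hidden, y_output
```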
From here, working backward from Eqn. (3), the error derivative at an output unit is

$$\frac{\partial E}{\partial y_j} = y_j - d_j \qquad (3)$$

The chain rule together with Eqn. (2) then gives

$$\frac{\partial E}{\partial x_j} = \frac{\partial E}{\partial y_j} \cdot y_j (1 - y_j) \qquad (4)$$

and the derivative with respect to the weight $w_{ji}$ on the connection from unit $i$ to unit $j$ is

$$\frac{\partial E}{\partial w_{ji}} = \frac{\partial E}{\partial x_j} \cdot y_i \qquad (5)$$

Summing unit $i$'s contributions over all the units $j$ it feeds, $\partial E / \partial y_i = \sum_j (\partial E / \partial x_j) \, w_{ji}$, so the same steps can be repeated for the layer below.
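Continuing the sketch above, a backward pass that follows these chain-rule steps layer by layer (again an illustrative sketch, not the paper's code):

```python
def backward(y_input, y_hidden, y_output, d_target, w_output):
    # Eqn. (3): error derivative at the output units
    dE_dy_out = y_output - d_target
    # Eqn. (4): through the sigmoid, dy/dx = y(1 - y)
    dE_dx_out = dE_dy_out * y_output * (1.0 - y_output)
    # Eqn. (5): dE/dw_ji = dE/dx_j * y_i for the output weights
    grad_w_output = np.outer(y_hidden, dE_dx_out)
    # Propagate to the layer below: dE/dy_i = sum_j dE/dx_j * w_ji
    dE_dy_hid = dE_dx_out @ w_output.T
    dE_dx_hid = dE_dy_hid * y_hidden * (1.0 - y_hidden)
    grad_w_hidden = np.outer(y_input, dE_dx_hid)
    return grad_w_hidden, grad_w_output
```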
We accumulate ∂E/∂w over all the input-output cases before changing the weights. The simplest version of gradient descent is to change each weight by an amount proportional to the accumulated ∂E/∂w. But we use an acceleration method in which the current gradient is used to modify the velocity of the point in weight space instead of its position.
All the ∂E/∂w values are accumulated over the training cases before the weights are changed. The simplest form of gradient descent changes each weight by an amount proportional to the accumulated ∂E/∂w; this paper adopts an acceleration method to speed up convergence.
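This acceleration method is what is now called momentum: the current gradient modifies the velocity of the point in weight space rather than its position. In the paper's notation the update is

$$\Delta w(t) = -\varepsilon \frac{\partial E}{\partial w}(t) + \alpha \, \Delta w(t-1)$$

where $\varepsilon$ is the step size and $\alpha$, between 0 and 1, determines how much of the previous velocity is retained.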
The paper also tests the network on deciding whether a one-dimensional input array is symmetric about its center point; capturing this symmetry is what the intermediate layer is introduced for, since the task cannot be solved without hidden units.
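As an illustration of that task (the input length of 6 matches the paper's example; the helper below is my own), symmetric and asymmetric binary inputs can be generated like this:

```python
import numpy as np

def is_symmetric(v):
    # Label 1 when the vector reads the same forwards and backwards
    return int(np.array_equal(v, v[::-1]))

rng = np.random.default_rng(0)
inputs = rng.integers(0, 2, size=(16, 6))  # random binary vectors of length 6
labels = np.array([is_symmetric(v) for v in inputs])
```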
Drawback of the learning procedure: gradient descent finds a local minimum rather than the global minimum. (In practice, a local minimum found this way is usually close enough to be treated as global; the paper also notes that adding a few more connections between units creates extra dimensions in weight space that provide paths around the barriers responsible for poor local minima.)
Reading notes: "Learning representations by back-propagating errors"