In this section, Andrew avoids the explicit computation of partial derivatives and only gives a fairly intuitive mathematical explanation.
Recommended Zhihu article: 吴恩达机器学习:神经网络 | 反向传播算法
It derives the backpropagation algorithm in full detail.
Cost Function
Definition
- $L$ = total number of layers in the network
- $s_l$ = number of units (not counting the bias unit) in layer $l$
- $K$ = number of output units (classes)
Cost Function of Neural Networks
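The formula itself is not reproduced above; for reference, the regularized cost function for a neural network as given in the course (logistic cost summed over the $K$ output units, plus the weight penalty) is:

$$J(\Theta) = -\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\left[\, y_k^{(i)} \log\big((h_\Theta(x^{(i)}))_k\big) + \big(1 - y_k^{(i)}\big)\log\big(1 - (h_\Theta(x^{(i)}))_k\big)\right] + \frac{\lambda}{2m}\sum_{l=1}^{L-1}\sum_{i=1}^{s_l}\sum_{j=1}^{s_{l+1}}\big(\Theta_{j,i}^{(l)}\big)^2$$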
Backpropagation Algorithm
Function Building
Objective
We need to compute two quantities:
- the cost function $J(\Theta)$
- the partial derivatives $\dfrac{\partial}{\partial \Theta_{ij}^{(l)}} J(\Theta)$
Algorithm
Formula (1)
Formula (2)
Formula (3)
==From the Zhihu link above== — these help build a better understanding.
Correction: the last line
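The three formulas are images in the Zhihu article and are not reproduced here; presumably they correspond to the standard backpropagation equations, which in this note's notation read:

$$\delta^{(L)} = a^{(L)} - y \tag{1}$$
$$\delta^{(l)} = \big((\Theta^{(l)})^T \delta^{(l+1)}\big) \odot g'(z^{(l)}) \tag{2}$$
$$\frac{\partial J}{\partial \Theta_{ij}^{(l)}} = a_j^{(l)}\, \delta_i^{(l+1)} \tag{3}$$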
Given training set $\{(x^{(1)}, y^{(1)}), \dots, (x^{(m)}, y^{(m)})\}$
- Set $\Delta_{i,j}^{(l)} := 0$ for all $(l, i, j)$ (having a matrix full of zeros)

For training example $t = 1$ to $m$:
1. Set $a^{(1)} := x^{(t)}$
2. Perform forward propagation to compute $a^{(l)}$ for $l = 2, 3, \dots, L$
3. Using $y^{(t)}$, compute $\delta^{(L)} = a^{(L)} - y^{(t)}$

Where $L$ is our total number of layers and $a^{(L)}$ is the vector of outputs of the activation units for the last layer. So our "error values" for the last layer are simply the differences of our actual results in the last layer and the correct outputs in $y$.

4. Compute $\delta^{(L-1)}, \delta^{(L-2)}, \dots, \delta^{(2)}$ using $\delta^{(l)} = \big((\Theta^{(l)})^T \delta^{(l+1)}\big) \odot g'(z^{(l)})$

The delta values of layer $l$ are calculated by multiplying the delta values in the next layer with the theta matrix of layer $l$. We then element-wise multiply that with a function called $g'$, or g-prime, which is the derivative of the activation function $g$ evaluated with the input values given by $z^{(l)}$.

The g-prime derivative terms can also be written out as:
$$g'(z^{(l)}) = a^{(l)} \odot \big(1 - a^{(l)}\big)$$

5. $\Delta_{i,j}^{(l)} := \Delta_{i,j}^{(l)} + a_j^{(l)} \delta_i^{(l+1)}$, or with vectorization, $\Delta^{(l)} := \Delta^{(l)} + \delta^{(l+1)} \big(a^{(l)}\big)^T$

Update the $D$ matrix:
- $D_{i,j}^{(l)} := \frac{1}{m}\big(\Delta_{i,j}^{(l)} + \lambda \Theta_{i,j}^{(l)}\big)$, if $j \neq 0$
- $D_{i,j}^{(l)} := \frac{1}{m}\Delta_{i,j}^{(l)}$, if $j = 0$

The capital-delta matrix $D$ is used as an "accumulator" to add up our values as we go along and eventually compute our partial derivatives. Thus we get:
$$\frac{\partial}{\partial \Theta_{ij}^{(l)}} J(\Theta) = D_{ij}^{(l)}$$
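To make the loop above concrete, here is a minimal NumPy sketch for a three-layer network (one hidden layer) with sigmoid activations. The function name `backprop`, the array shapes, and the one-hot label matrix `Y` are illustrative assumptions, not code from the course.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop(Theta1, Theta2, X, Y, lam):
    """Gradient of J for a 3-layer network, following the algorithm above.

    Theta1: (s2, n+1), Theta2: (K, s2+1)
    X: (m, n) inputs, Y: (m, K) one-hot labels, lam: regularization lambda.
    Returns D1, D2 -- partial derivatives of J w.r.t. Theta1, Theta2.
    """
    m = X.shape[0]
    Delta1 = np.zeros_like(Theta1)   # accumulators, start as matrices of zeros
    Delta2 = np.zeros_like(Theta2)

    for t in range(m):
        # forward propagation
        a1 = np.concatenate(([1.0], X[t]))       # add bias unit
        z2 = Theta1 @ a1
        a2 = np.concatenate(([1.0], sigmoid(z2)))
        z3 = Theta2 @ a2
        a3 = sigmoid(z3)                          # a^(L)

        # output-layer error: delta^(L) = a^(L) - y
        d3 = a3 - Y[t]
        # hidden-layer error: delta^(2) = (Theta2^T delta^(3)) .* g'(z2), drop bias row
        d2 = (Theta2.T @ d3)[1:] * sigmoid(z2) * (1 - sigmoid(z2))

        # accumulate: Delta^(l) += delta^(l+1) * (a^(l))^T
        Delta2 += np.outer(d3, a2)
        Delta1 += np.outer(d2, a1)

    # D^(l) = (1/m) Delta^(l), plus lambda*Theta^(l) on non-bias columns
    D1 = Delta1 / m
    D2 = Delta2 / m
    D1[:, 1:] += (lam / m) * Theta1[:, 1:]
    D2[:, 1:] += (lam / m) * Theta2[:, 1:]
    return D1, D2
```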
Mathematical intuition
We introduce an intermediate variable, the "error" of node $j$ in layer $l$:
$$\delta_j^{(l)} \equiv \frac{\partial J}{\partial z_j^{(l)}}$$

Output-layer error

- Chain Rule:
$$\delta_j^{(L)} = \frac{\partial J}{\partial z_j^{(L)}} = \sum_k \frac{\partial J}{\partial a_k^{(L)}} \frac{\partial a_k^{(L)}}{\partial z_j^{(L)}}$$
- Since $a_k^{(L)} = g(z_k^{(L)})$, the term $\dfrac{\partial a_k^{(L)}}{\partial z_j^{(L)}}$ is non-zero if and only if $k = j$, so
$$\delta_j^{(L)} = \frac{\partial J}{\partial a_j^{(L)}}\, g'(z_j^{(L)}) = a_j^{(L)} - y_j$$

The formula above is derived through partial differentiation and the Chain Rule; it is not simply the subtraction of two vectors.
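To see where the subtraction comes from, write out the two factors for the logistic (cross-entropy) cost of a single training example, $J = -\big(y_j \log a_j^{(L)} + (1-y_j)\log(1-a_j^{(L)})\big)$, with a sigmoid output unit (regularization omitted):

$$\frac{\partial J}{\partial a_j^{(L)}} = -\frac{y_j}{a_j^{(L)}} + \frac{1 - y_j}{1 - a_j^{(L)}},\qquad g'(z_j^{(L)}) = a_j^{(L)}\big(1 - a_j^{(L)}\big)$$

$$\delta_j^{(L)} = \left(-\frac{y_j}{a_j^{(L)}} + \frac{1 - y_j}{1 - a_j^{(L)}}\right) a_j^{(L)}\big(1 - a_j^{(L)}\big) = a_j^{(L)} - y_j$$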
Hidden-layer error

- Chain Rule:
$$\delta_j^{(l)} = \frac{\partial J}{\partial z_j^{(l)}} = \sum_k \frac{\partial J}{\partial z_k^{(l+1)}} \frac{\partial z_k^{(l+1)}}{\partial z_j^{(l)}} = \sum_k \delta_k^{(l+1)} \frac{\partial z_k^{(l+1)}}{\partial z_j^{(l)}}$$
- Node computation in the neural network:
$$z_k^{(l+1)} = \sum_i \Theta_{ki}^{(l)} a_i^{(l)} = \sum_i \Theta_{ki}^{(l)}\, g(z_i^{(l)})$$

Taking the partial derivative (the term with index $i$ contributes if and only if $i = j$):
$$\frac{\partial z_k^{(l+1)}}{\partial z_j^{(l)}} = \Theta_{kj}^{(l)}\, g'(z_j^{(l)})$$

Chain Rule, combining the two results:
$$\delta_j^{(l)} = \sum_k \delta_k^{(l+1)} \Theta_{kj}^{(l)}\, g'(z_j^{(l)}),\qquad \text{i.e.}\quad \delta^{(l)} = \big((\Theta^{(l)})^T \delta^{(l+1)}\big) \odot g'(z^{(l)})$$
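The same intermediate variable also gives the gradient with respect to the weights, which is exactly the term accumulated into $\Delta$ in the algorithm above:

$$\frac{\partial J}{\partial \Theta_{ij}^{(l)}} = \frac{\partial J}{\partial z_i^{(l+1)}}\,\frac{\partial z_i^{(l+1)}}{\partial \Theta_{ij}^{(l)}} = \delta_i^{(l+1)}\, a_j^{(l)}$$

since $z_i^{(l+1)} = \sum_j \Theta_{ij}^{(l)} a_j^{(l)}$.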
- Post title: 08_Backpropagation
- Create time: 2022-01-10 00:44:22
- Post link: Machine-Learning/08-backpropagation/
- Copyright notice: All articles in this blog are licensed under BY-NC-SA unless stated otherwise.