Backprop

Derivation of the Backprop Algorithm

Given a training set $\{(x^{(1)}, y^{(1)}), \cdots, (x^{(m)}, y^{(m)})\}$ of $m$ examples, we can train a neural network with batch gradient descent. For a single example $(x, y)$, the squared-error cost function is

$$J(W,b;x,y) = \frac{1}{2} \|h_{W,b}(x) - y\|^2$$
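As a quick numeric check (an added example): if the network outputs $h_{W,b}(x) = (0.5, 0.8)$ and the target is $y = (0, 1)$, then $J = \frac{1}{2}\big(0.5^2 + (-0.2)^2\big) = 0.145$.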

The overall cost function is defined as

$$\begin{aligned} J(W,b) &= \bigg[\frac{1}{m} \sum_{i=1}^m J(W,b;x^{(i)}, y^{(i)})\bigg] + \frac{\lambda}{2} \sum_{l=1}^{n_l-1} \sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}} \Big(W_{ji}^{(l)}\Big)^2 \\ &= \bigg[\frac{1}{m} \sum_{i=1}^m \Big(\frac{1}{2} \|h_{W,b}(x^{(i)}) - y^{(i)}\|^2\Big)\bigg] + \frac{\lambda}{2} \sum_{l=1}^{n_l-1} \sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}} \Big(W_{ji}^{(l)}\Big)^2 \end{aligned}$$
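A minimal sketch of this cost in code, assuming sigmoid activations at every layer and column-wise data (the function and variable names here are mine, not from [1]):

```python
import numpy as np

def cost(W, b, X, Y, lam):
    """Overall cost J(W,b): average squared error plus the weight-decay term.

    X and Y hold the m examples as columns; every layer applies a sigmoid
    (an assumption of this sketch). W and b are lists of per-layer parameters.
    """
    A = X
    for Wl, bl in zip(W, b):                               # forward pass: h_{W,b}(x)
        A = 1.0 / (1.0 + np.exp(-(Wl @ A + bl[:, None])))
    m = X.shape[1]
    mse = 0.5 * np.sum((A - Y) ** 2) / m                   # (1/m) sum of per-example costs
    decay = 0.5 * lam * sum(np.sum(Wl ** 2) for Wl in W)   # (lambda/2) sum of squared weights
    return mse + decay
```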

The first term is the mean squared error; the second is a regularization (weight decay) term. Regularizing the bias terms $b$ makes little difference, so they are left out of the penalty.

In gradient descent, each iteration updates the parameters by the rule

$$\begin{aligned} W_{ij}^{(l)} &= W_{ij}^{(l)} - \alpha \frac{\partial}{\partial W_{ij}^{(l)}} J(W,b) \\ b_{i}^{(l)} &= b_{i}^{(l)} - \alpha \frac{\partial}{\partial b_i^{(l)}} J(W,b) \end{aligned}$$

where the partial derivatives of $J(W,b)$ are

$$\begin{aligned} \frac{\partial}{\partial W_{ij}^{(l)}} J(W,b) &= \frac{1}{m} \sum_{i=1}^m \frac{\partial}{\partial W_{ij}^{(l)}} J(W,b;x^{(i)}, y^{(i)}) + \lambda W_{ij}^{(l)} \\ \frac{\partial}{\partial b_{i}^{(l)}} J(W,b) &= \frac{1}{m} \sum_{i=1}^m \frac{\partial}{\partial b_{i}^{(l)}} J(W,b;x^{(i)}, y^{(i)}) \end{aligned}$$
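In code, one such update step might look like the sketch below, where `grads` is a list of per-example `(gW, gb)` gradient pairs produced by backprop (all names here are hypothetical):

```python
def gradient_descent_step(W, b, grads, alpha, lam):
    """One batch update of the parameter lists W and b (modified in place).

    Implements W -= alpha * (averaged gradient + lam * W) and
    b -= alpha * averaged gradient; no weight decay on the biases.
    """
    m = len(grads)
    for l in range(len(W)):
        dW = sum(gW[l] for gW, _ in grads) / m   # (1/m) * sum over examples
        db = sum(gb[l] for _, gb in grads) / m
        W[l] -= alpha * (dW + lam * W[l])        # weight-decay term lambda * W
        b[l] -= alpha * db
```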

The idea behind backpropagation:

  1. Given an example $(x, y)$, run a feedforward pass to compute every activation in the network

  2. For each node $i$ in layer $l$, compute its residual $\delta_i^{(l)}$

    • For an output-layer node, the residual $\delta_i^{(n_l)}$ follows directly from the gap between the activation and the target; note that only the $j = i$ term of the sum depends on $z_i^{(n_l)}$:

    $$\begin{aligned} \delta_i^{(n_l)} &= \frac{\partial}{\partial z_i^{(n_l)}} \frac{1}{2} \|y - h_{W,b}(x)\|^2 \\ &= \frac{\partial}{\partial z_i^{(n_l)}} \frac{1}{2} \sum_{j=1}^{s_{n_l}} \big(y_j - a_j^{(n_l)}\big)^2 \\ &= \frac{\partial}{\partial z_i^{(n_l)}} \frac{1}{2} \sum_{j=1}^{s_{n_l}} \big(y_j - f(z_j^{(n_l)})\big)^2 \\ &= -\big(y_i - f(z_i^{(n_l)})\big) f'(z_i^{(n_l)}) \\ &= -\big(y_i - a_i^{(n_l)}\big) f'(z_i^{(n_l)}) \end{aligned}$$

    • For the hidden layers ($l = n_l-1, n_l-2, \cdots, 2$), compute $\delta_i^{(l)}$ as a weighted average of the residuals of the layer-$(l+1)$ nodes that take $a_i^{(l)}$ as input (a sigmoid special case is worked out after this list)

    $$\delta_i^{(l)} = \bigg( \sum_{j=1}^{s_{l+1}} W_{ji}^{(l)} \delta_j^{(l+1)} \bigg) f'(z_i^{(l)})$$

  3. Compute the partial derivatives (by the chain rule; see [2] for a worked example)

$$\begin{aligned} \frac{\partial}{\partial W_{ij}^{(l)}} J(W,b;x,y) &= a_j^{(l)} \delta_i^{(l+1)} \\ \frac{\partial}{\partial b_{i}^{(l)}} J(W,b;x,y) &= \delta_i^{(l+1)} \end{aligned}$$
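As a concrete instance (an added note, not part of the original derivation): for the common sigmoid activation $f(z) = \frac{1}{1+e^{-z}}$, the derivative is $f'(z) = f(z)\big(1 - f(z)\big) = a(1-a)$, so the residuals can be computed from the activations alone:

$$\begin{aligned} \delta_i^{(n_l)} &= -\big(y_i - a_i^{(n_l)}\big)\, a_i^{(n_l)} \big(1 - a_i^{(n_l)}\big) \\ \delta_i^{(l)} &= \bigg( \sum_{j=1}^{s_{l+1}} W_{ji}^{(l)} \delta_j^{(l+1)} \bigg) a_i^{(l)} \big(1 - a_i^{(l)}\big) \end{aligned}$$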

Rewriting the algorithm above in matrix-vector notation, with $\cdot$ denoting the element-wise (Hadamard) product, the steps of the BP algorithm are as follows (a NumPy sketch follows the list):

  1. Run a feedforward pass to compute the activations of layers $L_2, L_3, \cdots, L_{n_l}$
  2. For the output layer (layer $n_l$), compute

$$\delta^{(n_l)} = - (y - a^{(n_l)}) \cdot f'(z^{(n_l)})$$

  3. For each hidden layer $l = n_l - 1, n_l - 2, n_l - 3, \cdots, 2$, compute

$$\delta^{(l)} = \big( (W^{(l)})^T \delta^{(l+1)} \big) \cdot f'(z^{(l)})$$

  4. Compute the partial derivatives

$$\begin{aligned} \nabla_{W^{(l)}} J(W,b;x,y) &= \delta^{(l+1)} (a^{(l)})^T \\ \nabla_{b^{(l)}} J(W,b;x,y) &= \delta^{(l+1)} \end{aligned}$$
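The four steps translate almost line for line into NumPy. The sketch below assumes sigmoid activations throughout and a single example; every name in it is mine:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop(W, b, x, y):
    """Gradients of J(W,b;x,y) for one example (weight decay not included).

    W[l] and b[l] map the activations of layer l to the pre-activations of
    layer l+1 (0-based lists here, versus the 1-based layers in the text).
    """
    # Step 1: feedforward pass, caching activations layer by layer.
    a = [x]
    for Wl, bl in zip(W, b):
        a.append(sigmoid(Wl @ a[-1] + bl))

    # Step 2: output-layer residual; for sigmoid, f'(z) = a * (1 - a).
    delta = -(y - a[-1]) * a[-1] * (1.0 - a[-1])

    grads_W, grads_b = [None] * len(W), [None] * len(b)
    grads_W[-1] = np.outer(delta, a[-2])     # step 4 for the last layer
    grads_b[-1] = delta

    # Step 3: propagate residuals back through the hidden layers.
    for l in range(len(W) - 2, -1, -1):
        delta = (W[l + 1].T @ delta) * a[l + 1] * (1.0 - a[l + 1])
        grads_W[l] = np.outer(delta, a[l])   # step 4: delta^{(l+1)} (a^{(l)})^T
        grads_b[l] = delta
    return grads_W, grads_b

# Toy usage on a 3-5-2 network with random parameters.
rng = np.random.default_rng(0)
sizes = [3, 5, 2]
W = [rng.normal(scale=0.1, size=(sizes[l + 1], sizes[l])) for l in range(2)]
b = [np.zeros(sizes[l + 1]) for l in range(2)]
gW, gb = backprop(W, b, rng.normal(size=3), np.array([0.0, 1.0]))
print([g.shape for g in gW])   # [(5, 3), (2, 5)], matching the shapes of W
```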

References

[1] UFLDL Tutorial, 反向传导算法 (Backpropagation Algorithm). URL

[2] Principles of training multi-layer neural network using backpropagation. URL
