Understanding softmax

Posted by 妖精的绣舞 on 2020-01-28 11:14:51

$$
\begin{aligned}
z_i^L &= \sum\nolimits_k w_{ki}^L a_k^{L-1} + b_i^L \\
y_j^L &= \operatorname{softmax}(z_j^L) = \frac{e^{z_j^L}}{\sum\nolimits_i e^{z_i^L}}
\end{aligned}
$$

Here $z_i^L$ is the value of the $i$-th neuron in layer $L$: the weighted sum of the outputs $a_k^{L-1}$ of all neurons in layer $L-1$, plus a bias. $y_j^L$ is the exponential of the $j$-th neuron in layer $L$ divided by the sum of the exponentials of all neurons in layer $L$.
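The two definitions above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function names `pre_activations` and `softmax`, the example weights, and the use of one bias `b[i]` per neuron are assumptions made here. Subtracting `max(z)` before exponentiating is also an addition; it cancels in the ratio and only guards against overflow.

```python
import math

def pre_activations(w, a, b):
    """z_i = sum_k w[k][i] * a[k] + b[i] for every neuron i in layer L.

    w[k][i] is the weight from neuron k in layer L-1 to neuron i in layer L
    (hypothetical layout chosen to match the index order w_{ki}).
    """
    return [sum(w[k][i] * a[k] for k in range(len(a))) + b[i]
            for i in range(len(b))]

def softmax(z):
    """y_j = exp(z_j) / sum_i exp(z_i)."""
    m = max(z)  # shift for numerical stability; the result is unchanged
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical example values: 2 neurons in layer L-1, 3 neurons in layer L.
a = [1.0, 2.0]
w = [[0.5, -1.0, 0.0],
     [0.25, 0.5, 1.0]]
b = [0.0, 2.0, 1.0]

z = pre_activations(w, a, b)
y = softmax(z)
print("z =", z)
print("y =", y)
print("sum(y) =", sum(y))
```

The outputs are all positive and sum to 1, which is why softmax outputs are commonly read as class probabilities.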

Dropping the layer superscript $L$ for brevity and applying the quotient rule to $y_j = e^{z_j} / \sum\nolimits_k e^{z_k}$ gives two cases:

$$
\left\{
\begin{aligned}
& \text{if } j=i:\ \frac{\partial y_j}{\partial z_i} = \frac{\partial}{\partial {\color{red}z_i}}\left( \frac{e^{z_j}}{\sum\nolimits_k e^{z_k}} \right) = \frac{{\color{red}(e^{z_j})'} \cdot \sum\nolimits_k e^{z_k} - e^{z_j} \cdot e^{z_i}}{\left( \sum\nolimits_k e^{z_k} \right)^2} = \frac{e^{z_j}}{\sum\nolimits_k e^{z_k}} - \left( \frac{e^{z_j}}{\sum\nolimits_k e^{z_k}} \right)^2 = {\color{red}y_j(1-y_j)} \\
& \text{if } j\ne i:\ \frac{\partial y_j}{\partial z_i} = \frac{\partial}{\partial {\color{red}z_i}}\left( \frac{e^{z_j}}{\sum\nolimits_k e^{z_k}} \right) = \frac{{\color{red}\frac{\partial e^{z_j}}{\partial z_i}} \cdot \sum\nolimits_k e^{z_k} - e^{z_j} \cdot e^{z_i}}{\left( \sum\nolimits_k e^{z_k} \right)^2} = \frac{{\color{red}0} \cdot \sum\nolimits_k e^{z_k} - e^{z_j} \cdot e^{z_i}}{\left( \sum\nolimits_k e^{z_k} \right)^2} = {\color{red}-y_j y_i}
\end{aligned}
\right.
$$

The resulting derivatives of the softmax function along the backpropagation path from $y_j$ to $z_i$ are:

$$
\color{red}{ \frac{\partial y_j}{\partial z_i} = \left\{ \begin{matrix} y_j(1-y_j) & j=i \\ -y_j y_i & j\ne i \\ \end{matrix} \right. }
$$
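The two cases can also be written as a single expression using the Kronecker delta $\delta_{ji}$ (equal to 1 when $j=i$ and 0 otherwise). This notation does not appear in the original and is added only as a compact restatement of the same result:

$$
\frac{\partial y_j}{\partial z_i} = y_j\left(\delta_{ji} - y_i\right)
$$

Setting $j=i$ gives $y_j(1-y_j)$, and $j\ne i$ gives $-y_j y_i$.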

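Note: the only difference between the two cases is that when $j \ne i$, the derivative of the numerator, $\partial e^{z_j} / \partial z_i$, is 0.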
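One way to sanity-check the derivative formula is to compare it against a finite-difference estimate. The sketch below does this in plain Python; the function names, the test point `z`, and the step size `eps` are assumptions chosen for illustration, and the central-difference check is a generic verification technique rather than part of the derivation.

```python
import math

def softmax(z):
    """y_j = exp(z_j) / sum_i exp(z_i), shifted by max(z) for stability."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_jacobian(z):
    """Closed form: J[j][i] = y_j (1 - y_j) if j == i, else -y_j y_i."""
    y = softmax(z)
    n = len(y)
    return [[y[j] * (1.0 - y[j]) if j == i else -y[j] * y[i]
             for i in range(n)]
            for j in range(n)]

def numerical_jacobian(z, eps=1e-6):
    """Central-difference estimate of J[j][i] = dy_j / dz_i."""
    n = len(z)
    jac = [[0.0] * n for _ in range(n)]
    for i in range(n):
        z_plus = list(z)
        z_minus = list(z)
        z_plus[i] += eps
        z_minus[i] -= eps
        y_plus = softmax(z_plus)
        y_minus = softmax(z_minus)
        for j in range(n):
            jac[j][i] = (y_plus[j] - y_minus[j]) / (2.0 * eps)
    return jac

z = [1.0, 2.0, 3.0]  # hypothetical test point
J = softmax_jacobian(z)
J_num = numerical_jacobian(z)

n = len(z)
max_err = max(abs(J[j][i] - J_num[j][i]) for j in range(n) for i in range(n))
print("max |analytic - numerical| =", max_err)
```

The reported difference should be tiny compared with the entries of `J`. Each row of `J` also sums to approximately zero, which follows from the formula because the outputs $y_i$ sum to 1.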

