Derivative of a softmax function explanation

I am trying to compute the derivative of the softmax activation function. I found this: https://math.stackexchange.com/questions/945871/derivative-of-softmax-loss-function

2 Answers

暗喜

2021-01-31 12:55

For what it's worth, here is my derivation based on SirGuy's answer (feel free to point out errors if you find any).

孤城傲影

2021-01-31 13:02
The derivative of a sum is the sum of the derivatives, i.e.:
```
    d(f1 + f2 + f3 + f4)/dx = df1/dx + df2/dx + df3/dx + df4/dx
```
To derive the derivatives of p_j with respect to o_i we start with:
```
    d_i(p_j) = d_i(exp(o_j) / Sum_k(exp(o_k)))
```
I decided to use d_i for the derivative with respect to o_i to make this easier to read. Using the product rule we get:
```
    d_i(exp(o_j)) / Sum_k(exp(o_k)) + exp(o_j) * d_i(1/Sum_k(exp(o_k)))
```
Looking at the first term, the derivative is exp(o_j) if i = j and 0 if i != j. This can be written with a Kronecker delta, which I will call D_ij (1 if i = j, 0 otherwise). This gives (for the first term):
```
    = D_ij * exp(o_j) / Sum_k(exp(o_k))
```
This is just our original function multiplied by D_ij:
```
    = D_ij * p_j
```
For the second term, the sum is in the denominator, so the chain rule (the power rule applied to 1/Sum) gives a factor of -1/Sum_k(exp(o_k))^2 times the derivative of the sum. When we differentiate each element of the sum individually, the only non-zero term is the one with k = i, which gives us:
```
    = -exp(o_j) * Sum_k(d_i(exp(o_k))) / Sum_k(exp(o_k))^2
    = -exp(o_j) * exp(o_i) / Sum_k(exp(o_k))^2
    = -(exp(o_j) / Sum_k(exp(o_k))) * (exp(o_i) / Sum_k(exp(o_k)))
    = -p_j * p_i
```
Putting the two together we get the surprisingly simple formula:
```
    D_ij * p_j - p_j * p_i
```
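As a sanity check on the combined formula, here is a minimal NumPy sketch that compares it against central finite differences. It is not part of the original derivation: the function names (`softmax`, `softmax_jacobian`, `numerical_jacobian`), the example values in `o`, and the step size `eps` are my own choices for illustration.

```python
import numpy as np

def softmax(o):
    """Softmax of a 1-D array. Subtracting the max avoids overflow
    and does not change the result."""
    e = np.exp(o - np.max(o))
    return e / e.sum()

def softmax_jacobian(o):
    """Analytic result: J[i, j] = d p_j / d o_i = D_ij * p_j - p_j * p_i."""
    p = softmax(o)
    return np.diag(p) - np.outer(p, p)

def numerical_jacobian(o, eps=1e-6):
    """Central finite differences; row i holds d p / d o_i."""
    J = np.zeros((o.size, o.size))
    for i in range(o.size):
        step = np.zeros_like(o)
        step[i] = eps
        J[i] = (softmax(o + step) - softmax(o - step)) / (2 * eps)
    return J

o = np.array([1.0, 2.0, 0.5, -1.0])  # arbitrary example inputs
max_err = np.max(np.abs(softmax_jacobian(o) - numerical_jacobian(o)))
print(max_err)
```

If the formula is right, `max_err` should be tiny, limited only by the finite-difference approximation and floating-point rounding.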
If you really want, we can split it into the i = j and i != j cases:
```
    i = j: D_ii * p_i - p_i * p_i = p_i - p_i * p_i = p_i * (1 - p_i)

    i != j: D_ij * p_j - p_j * p_i = -p_j * p_i
```
Which is our answer.
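To see that the two cases agree with the combined formula, here is a small NumPy sketch that fills the matrix entry by entry from the case split and compares it with `diag(p) - outer(p, p)`. The variable names and the example values in `o` are assumptions made for illustration only.

```python
import numpy as np

o = np.array([0.2, -1.3, 3.0])      # arbitrary example inputs
e = np.exp(o - o.max())             # shift by the max for numerical stability
p = e / e.sum()                     # softmax probabilities
n = p.size

# Fill the matrix entry by entry using the two cases.
J_cases = np.empty((n, n))
for i in range(n):
    for j in range(n):
        if i == j:
            J_cases[i, j] = p[i] * (1 - p[i])
        else:
            J_cases[i, j] = -p[i] * p[j]

# Combined formula: D_ij * p_j - p_j * p_i
J_combined = np.diag(p) - np.outer(p, p)

same = np.allclose(J_cases, J_combined)
# The p_j sum to 1, so for a fixed i the derivatives over j sum to 0.
rows_sum_to_zero = np.allclose(J_cases.sum(axis=1), 0.0)
print(same, rows_sum_to_zero)
```

The second check follows from the fact that the softmax outputs always sum to 1, so changing any single o_i cannot change their total.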