For a neural networks library I implemented some activation functions and loss functions together with their derivatives. They can be combined arbitrarily, and the derivative at the output layer is obtained by chaining the loss derivative with the activation derivative.
It should be like this (x is the input to the softmax layer and dy is the delta coming from the loss above it):
# y is the softmax output, i.e. compute(x)
dx = y * dy                                    # elementwise product y_i * dy_i
s = dx.sum(axis=dx.ndim - 1, keepdims=True)    # sum over the softmax axis: sum_j y_j * dy_j
dx -= y * s                                    # dx_i = y_i * dy_i - y_i * sum_j y_j * dy_j
return dx
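For context, here is a minimal, self-contained sketch of how that delta could sit next to the forward pass, assuming NumPy; the class and method names simply mirror the compute/delta convention used in this answer and are not taken from your code:

import numpy as np

class Softmax:
    def compute(self, x):
        # subtract the max along the softmax axis for numerical stability
        e = np.exp(x - x.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)

    def delta(self, x, dy):
        # multiply dy by the Jacobian of compute evaluated at x
        y = self.compute(x)
        dx = y * dy
        s = dx.sum(axis=dx.ndim - 1, keepdims=True)
        dx -= y * s
        return dx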
But the way you compute the error should be:
yact = activation.compute(x)      # forward pass: softmax output
ycost = cost.compute(yact)        # loss value for that output
# backward pass: feed the loss delta through the softmax Jacobian
dsoftmax = activation.delta(x, cost.delta(yact, ycost, ytrue))
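As a quick sanity check (assuming the cost on top is cross-entropy, which the question does not state), chaining the deltas this way collapses to the familiar yact - ytrue:

import numpy as np

x = np.array([0.5, -1.0, 2.0])
ytrue = np.array([0.0, 0.0, 1.0])           # one-hot target

e = np.exp(x - x.max())
yact = e / e.sum()                          # softmax forward pass

dcost = -ytrue / yact                       # cross-entropy delta w.r.t. the softmax output

dx = yact * dcost                           # softmax delta, same steps as above
dsoftmax = dx - yact * dx.sum()

print(np.allclose(dsoftmax, yact - ytrue))  # True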
Explanation: Because the delta function is part of the backpropagation algorithm, its responsibility is to multiply the incoming error vector (dy in my code, outgoing in your case) by the Jacobian of the compute(x) function evaluated at x. If you work out what this Jacobian looks like for softmax [1] and then multiply it from the left by the vector dy, after a bit of algebra you will find that you get exactly what my Python code computes.
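To make that correspondence concrete, here is a small check (an illustration, not part of the library): build the softmax Jacobian explicitly and compare the matrix-vector product with the vectorized delta above.

import numpy as np

x = np.array([0.5, -1.0, 2.0])
dy = np.array([0.1, -0.3, 0.2])          # arbitrary incoming delta

e = np.exp(x - x.max())
y = e / e.sum()                          # softmax output

# explicit Jacobian of softmax: J[i, j] = y[i] * (delta_ij - y[j])
J = np.diag(y) - np.outer(y, y)

dx_explicit = J @ dy                     # multiply dy by the Jacobian (J is symmetric)

dx = y * dy                              # vectorized version from the code above
dx -= y * dx.sum()

print(np.allclose(dx_explicit, dx))      # True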
[1] https://stats.stackexchange.com/questions/79454/softmax-layer-in-a-neural-network