Implementation of a softmax activation function for neural networks

失恋的感觉 2020-12-23 15:13

I am using a softmax activation function in the last layer of a neural network, but I am having trouble writing a numerically safe implementation of it.

A naive implementation computes oj = exp(zj) / sum_i{ exp(zi) } directly, which overflows as soon as the zi get large.
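To illustrate the problem, here is a minimal NumPy sketch of such a direct implementation (the function name and the test values are illustrative, not from the original question):

    import numpy as np

    def softmax_naive(z):
        # direct formula oj = exp(zj) / sum_i{ exp(zi) }; np.exp overflows to inf for large zj
        e = np.exp(z)
        return e / e.sum()

    # inf / inf gives nan, so the output is [nan, nan] instead of probabilities
    print(softmax_naive(np.array([1000.0, 1001.0])))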

2 Answers
  • 2020-12-23 15:24

    I know it's already answered, but I'll post a step-by-step derivation here anyway.

    Take the log of the softmax:

    zj = wj . x + bj
    oj = exp(zj)/sum_i{ exp(zi) }
    log oj = zj - log sum_i{ exp(zi) }
    

    Let m = max_i{ zi } and use the log-sum-exp trick:

    log oj = zj - log {sum_i { exp(zi + m - m) }}
           = zj - log {sum_i { exp(m) exp(zi - m) }}
           = zj - log {exp(m) sum_i { exp(zi - m) }}
           = zj - m - log {sum_i { exp(zi - m) }}
    

    The term exp(zi - m) can underflow if m is much greater than some other zi, but that's fine: it means those zi contribute next to nothing to the softmax output after normalization. The final result is:

    oj = exp (zj - m - log{sum_i{exp(zi-m)}})
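    As a minimal NumPy sketch of that final expression (the function name is illustrative; assumes z is a 1-D array of pre-activations):

    import numpy as np

    def softmax_stable(z):
        # oj = exp(zj - m - log{sum_i{exp(zi - m)}}), with m = max_i{zi}
        m = z.max()
        shifted = z - m                          # largest entry becomes 0, so exp() cannot overflow
        log_sum = np.log(np.exp(shifted).sum())  # log{sum_i{exp(zi - m)}}
        return np.exp(shifted - log_sum)

    # inputs that overflow a direct exp/sum implementation now work fine
    print(softmax_stable(np.array([1000.0, 1001.0])))  # ~[0.26894142 0.73105858]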
    
  • 2020-12-23 15:40

    First go to the log scale, i.e. calculate log(y) instead of y. The log of the numerator is trivial. To calculate the log of the denominator, you can use the following 'trick': http://lingpipe-blog.com/2009/06/25/log-sum-of-exponentials/
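    A short sketch of that approach (NumPy assumed; the helper name is illustrative):

    import numpy as np

    def log_softmax(z):
        # log yj = zj - log{sum_i{exp(zi)}}, with the denominator computed as
        # m + log{sum_i{exp(zi - m)}} where m = max_i{zi} (the linked 'trick')
        m = z.max()
        lse = m + np.log(np.exp(z - m).sum())
        return z - lse

    log_y = log_softmax(np.array([1000.0, 1001.0]))
    print(np.exp(log_y))  # exponentiate at the end if the probabilities themselves are needed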
