How to implement the Softmax derivative independently from any loss function?

前端未结

关注

 4  1646

夕颜 2021-02-05 16:04

For a neural networks library I implemented some activation functions and loss functions and their derivatives. They can be combined arbitrarily and the derivative at the output

4条回答

深忆病人 (楼主)

2021-02-05 16:44

Mathematically, the derivative of Softmax σ(j) with respect to the logit Zi (for example, Wi*X) is

where the red delta is a Kronecker delta.

If you implement iteratively:

def softmax_grad(s):
    # input s is softmax value of the original input x. Its shape is (1,n) 
    # i.e.  s = np.array([0.3,0.7]),  x = np.array([0,1])

    # make the matrix whose size is n^2.
    jacobian_m = np.diag(s)

    for i in range(len(jacobian_m)):
        for j in range(len(jacobian_m)):
            if i == j:
                jacobian_m[i][j] = s[i] * (1 - s[i])
            else: 
                jacobian_m[i][j] = -s[i] * s[j]
    return jacobian_m

Test:

In [95]: x
Out[95]: array([1, 2])

In [96]: softmax(x)
Out[96]: array([ 0.26894142,  0.73105858])

In [97]: softmax_grad(softmax(x))
Out[97]: 
array([[ 0.19661193, -0.19661193],
       [-0.19661193,  0.19661193]])

If you implement in a vectorized version:

soft_max = softmax(x)    

# reshape softmax to 2d so np.dot gives matrix multiplication

def softmax_grad(softmax):
    s = softmax.reshape(-1,1)
    return np.diagflat(s) - np.dot(s, s.T)

softmax_grad(soft_max)

#array([[ 0.19661193, -0.19661193],
#       [-0.19661193,  0.19661193]])

0 讨论(0)

查看其它4个回答