How to implement the Softmax derivative independently from any loss function?

夕颜 2021-02-05 16:04

For a neural networks library I implemented some activation functions and loss functions along with their derivatives. They can be combined arbitrarily, and the derivative at the output layer then becomes the product of the loss derivative and the activation derivative. How can the derivative of Softmax be implemented so that it stays independent of whichever loss function it is combined with?
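
A minimal sketch of that composition (the helper names output_delta_elementwise and output_delta_jacobian are illustrative only, not from any particular library): for an elementwise activation the output-layer gradient is an elementwise product, but Softmax couples all of its outputs, so its derivative is a full Jacobian and the product becomes a matrix-vector product.

import numpy as np

# Elementwise activation (e.g. sigmoid): the chain rule at the output layer
# is an elementwise product of the two derivatives.
def output_delta_elementwise(dL_da, da_dz):
    # dL_da: loss gradient w.r.t. the activations, shape (n,)
    # da_dz: elementwise activation derivative,    shape (n,)
    return dL_da * da_dz

# Softmax: every output depends on every input, so the derivative is an
# (n, n) Jacobian and the chain rule becomes a matrix-vector product.
def output_delta_jacobian(dL_da, jacobian):
    # jacobian[i, j] = d a_i / d z_j
    return jacobian.T @ dL_da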

4 Answers
  •  深忆病人
    2021-02-05 16:44

    Mathematically, the derivative of Softmax σ(j) with respect to the logit z_i (for example, w_i·x) is

        ∂σ(j)/∂z_i = σ(j) · (δ_ij − σ(i))

    where δ_ij is the Kronecker delta.

    If you implement it iteratively:

    import numpy as np

    def softmax_grad(s):
        # s is the softmax of the original input x; its shape is (n,)
        # e.g. s = np.array([0.3, 0.7]) for x = np.array([0, 1])

        # initialise an n x n matrix for the Jacobian
        # (every entry is overwritten in the loop below)
        jacobian_m = np.diag(s)

        for i in range(len(jacobian_m)):
            for j in range(len(jacobian_m)):
                if i == j:
                    jacobian_m[i][j] = s[i] * (1 - s[i])
                else:
                    jacobian_m[i][j] = -s[i] * s[j]
        return jacobian_m
    

    Test:

    In [95]: x
    Out[95]: array([1, 2])
    
    In [96]: softmax(x)
    Out[96]: array([ 0.26894142,  0.73105858])
    
    In [97]: softmax_grad(softmax(x))
    Out[97]: 
    array([[ 0.19661193, -0.19661193],
           [-0.19661193,  0.19661193]])
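
    As a quick sanity check, the analytic Jacobian can be compared against a central-difference estimate; the softmax helper below is a standard implementation, assumed to behave like the one used in the session above:

    import numpy as np

    def softmax(x):
        # numerically stable softmax, assumed equivalent to the one tested above
        e = np.exp(x - np.max(x))
        return e / e.sum()

    def numerical_jacobian(f, x, eps=1e-6):
        # central-difference approximation of the Jacobian of f at x
        fx = f(x)
        J = np.zeros((len(fx), len(x)))
        for j in range(len(x)):
            d = np.zeros(len(x))
            d[j] = eps
            J[:, j] = (f(x + d) - f(x - d)) / (2 * eps)
        return J

    x = np.array([1.0, 2.0])
    print(np.allclose(softmax_grad(softmax(x)),       # analytic Jacobian from above
                      numerical_jacobian(softmax, x), # finite-difference estimate
                      atol=1e-6))                     # -> True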
    

    If you implement it in a vectorized version:

    soft_max = softmax(x)

    def softmax_grad(softmax):
        # reshape the softmax output to a 2-d column vector so that
        # np.dot(s, s.T) is a matrix (outer) product
        s = softmax.reshape(-1, 1)
        return np.diagflat(s) - np.dot(s, s.T)

    softmax_grad(soft_max)

    # array([[ 0.19661193, -0.19661193],
    #        [-0.19661193,  0.19661193]])
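
    To see why this keeps the Softmax derivative independent of the loss, here is a sketch of the chain rule through that Jacobian, using cross-entropy purely as an example loss; for this particular pairing the composed gradient reduces to the familiar s - y shortcut:

    import numpy as np

    x = np.array([1.0, 2.0])
    s = softmax(x)                      # assumes a softmax helper as above
    y = np.array([0.0, 1.0])            # one-hot target, chosen only for illustration

    dL_ds = -y / s                      # gradient of cross-entropy L = -sum(y * log(s))
    dL_dz = softmax_grad(s).T @ dL_ds   # generic chain rule; works for any loss gradient
                                        # (the transpose is optional: this Jacobian is symmetric)

    print(dL_dz)                        # ~ [ 0.26894142 -0.26894142]
    print(s - y)                        # matches the softmax + cross-entropy shortcut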
    
