How to implement the Softmax derivative independently from any loss function?

夕颜 2021-02-05 16:04

For a neural networks library I implemented some activation functions and loss functions and their derivatives. They can be combined arbitrarily, and the derivative at the output layer simply becomes the product of the loss derivative and the activation derivative. The Softmax derivative, however, does not fit this element-wise scheme: because of the normalizing denominator, every output depends on every input, so its derivative is a full Jacobian matrix. How can it be implemented independently from any loss function?
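
For reference (standard softmax calculus, not from the original post), with pre-softmax inputs $x$ and outputs $S$:

$$S_i = \frac{e^{x_i}}{\sum_k e^{x_k}}, \qquad \frac{\partial S_i}{\partial x_j} = S_i\,(\delta_{ij} - S_j),$$

so backpropagating an arbitrary upstream gradient $g_j = \partial L / \partial S_j$ gives

$$\frac{\partial L}{\partial x_i} = \sum_j g_j\, S_j\,(\delta_{ji} - S_i) = S_i\Big(g_i - \sum_j g_j S_j\Big),$$

which is exactly what the answers below compute.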

4 Answers
  •  佛祖请我去吃肉
    2021-02-05 16:54

    Here is a C++ vectorized version, using intrinsics (22 times (!) faster than the non-SSE version):

    #include <immintrin.h>  // __m256 and AVX intrinsics
    #include <cassert>      // assert
    #include <cstddef>      // size_t
    
    // How many floats fit into one __m256 "group".
    // Used by vectors and matrices, to ensure their dimensions are appropriate for
    // intrinsics.
    // Otherwise, consecutive rows of matrices will not be 32-byte aligned, and
    // operations on them will be incorrect.
    #define F_MULTIPLE_OF_M256 8
    
    
    //Quick check that your sizes are divisible by the __m256 width (8 floats).
    //You can undefine this to save performance once everything is verified to be correct.
    #define ASSERT_THE_M256_MULTIPLES
    #ifdef ASSERT_THE_M256_MULTIPLES
        #define assert_is_m256_multiple(x)  assert( (x%F_MULTIPLE_OF_M256) == 0)
    #else
        #define assert_is_m256_multiple(x)
    #endif
    
    
    // usually used at the end of our Reduce functions,
    // where the final __m256 mSum needs to be collapsed into 1 scalar.
    static inline float slow_hAdd_ps(__m256 x){
        const float *sumStart = reinterpret_cast<const float*>(&x);
        float sum = 0.0f;
    
        for(size_t i=0; i<F_MULTIPLE_OF_M256; ++i){
            sum += sumStart[i];
        }
        return sum;
    }
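
    The vectorized gradient itself never needs the full Jacobian: since dL/dx_i = S_i * (g_i - sum_j g_j*S_j), one dot-product reduction plus one element-wise pass is enough. Below is a sketch of what such a routine can look like (an assumed reconstruction, not necessarily the exact 22x-faster one; it assumes count is a multiple of F_MULTIPLE_OF_M256 and reuses slow_hAdd_ps, with unaligned loads so 32-byte alignment only affects speed):

    static void SoftmaxGrad_fromResult_AVX(const float *softmaxResult,
                                           const float *gradFromAbove,  //<--gradient vector, flowing into us from the above layer
                                           float *gradOutput,
                                           size_t count){
        assert_is_m256_multiple(count);
    
        // Reduce: dot = sum_j gradFromAbove[j] * softmaxResult[j]
        __m256 mSum = _mm256_setzero_ps();
        for(size_t j=0; j<count; j+=F_MULTIPLE_OF_M256){
            mSum = _mm256_add_ps(mSum,
                                 _mm256_mul_ps(_mm256_loadu_ps(gradFromAbove + j),
                                               _mm256_loadu_ps(softmaxResult + j)));
        }
        const float dot = slow_hAdd_ps(mSum);  // collapse the final __m256 into 1 scalar
    
        // Elementwise: gradOutput[i] = softmaxResult[i] * (gradFromAbove[i] - dot)
        const __m256 mDot = _mm256_set1_ps(dot);
        for(size_t i=0; i<count; i+=F_MULTIPLE_OF_M256){
            __m256 g = _mm256_loadu_ps(gradFromAbove + i);
            __m256 s = _mm256_loadu_ps(softmaxResult + i);
            _mm256_storeu_ps(gradOutput + i, _mm256_mul_ps(s, _mm256_sub_ps(g, mDot)));
        }
    }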

    If for some reason somebody wants a simple (non-SSE) version, here it is:

    inline static void SoftmaxGrad_fromResult_nonSSE(const float* softmaxResult,  
                                                     const float *gradFromAbove,  //<--gradient vector, flowing into us from the above layer
                                                     float *gradOutput,  
                                                     size_t count ){
        // every pre-softmax element in a layer contributed to the softmax of every other element
        // (it went into the denominator). So gradient will be distributed from every post-softmax element to every pre-elem.
        for(size_t i=0; i<count; ++i){      // i: pre-softmax (input) element
            float sum = 0.0f;
            for(size_t j=0; j<count; ++j){  // j: post-softmax element; dS_j/dx_i = S_j*((j==i?1:0) - S_i)
                sum += gradFromAbove[j] * softmaxResult[j] * (((j==i) ? 1.0f : 0.0f) - softmaxResult[i]);
            }
            gradOutput[i] = sum;
        }
    }
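
    A quick usage sketch (not part of the original answer; the numbers are made up, only the call pattern matters):

    #include <cstdio>
    
    int main(){
        // made-up forward-pass result of a 4-way softmax (sums to 1) and an
        // arbitrary upstream gradient coming from some loss w.r.t. the softmax outputs
        const float softmaxResult[4] = {0.1f, 0.2f, 0.3f, 0.4f};
        const float gradFromAbove[4] = {0.0f, 0.0f, -2.5f, 0.0f};
        float gradOutput[4];
    
        SoftmaxGrad_fromResult_nonSSE(softmaxResult, gradFromAbove, gradOutput, 4);
    
        for(size_t i=0; i<4; ++i)
            printf("dL/dx[%zu] = %f\n", i, gradOutput[i]);
        return 0;
    }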
