Math optimization in C#

悲&欢浪女 2020-12-07 10:25

I've been profiling an application all day long and, having optimized a couple bits of code, I'm left with this on my todo list. It's the activation function for a neural

25 answers
  • 2020-12-07 10:59

    There are a lot of good answers here. I would suggest running it through this technique, just to make sure

    • You're not calling it any more times than you need to.
      (Sometimes functions get called more than necessary, just because they are so easy to call.)
    • You're not calling it repeatedly with the same arguments
      (where you could use memoization)

    BTW the function you have is the inverse logit function,
    or the inverse of the log-odds-ratio function log(f/(1-f)).
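
    To make the inverse relationship concrete, here is a minimal C sketch. The names logit and inv_logit are illustrative only and do not come from the question. Solving y = log(f/(1-f)) for f gives f = 1/(1 + e^(-y)), which is the sigmoid.

    ```c
    #include <math.h>

    /* Log-odds of a probability f, assumed to satisfy 0 < f < 1. */
    double logit(double f) {
        return log(f / (1.0 - f));
    }

    /* Inverse of logit: maps any real x back to a value in (0, 1). */
    double inv_logit(double x) {
        return 1.0 / (1.0 + exp(-x));
    }
    ```

    Feeding the output of one function into the other should return the starting value up to floating-point rounding, for example inv_logit(logit(0.3)).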

  • 2020-12-07 11:00

    (Updated with performance measurements)(Updated again with real results :)

    I think a lookup table solution would get you very far when it comes to performance, at a negligible memory and precision cost.

    The following snippet is an example implementation in C (I don't speak C# fluently enough to dry-code it). It runs and performs well enough, but I'm sure there's a bug in it :)

    #include <math.h>
    #include <stdio.h>
    #include <time.h>
    
    #define SCALE 320.0f
    #define RESOLUTION 2047
    #define MIN (-RESOLUTION / SCALE)  /* parenthesized so the macros expand safely inside expressions */
    #define MAX (RESOLUTION / SCALE)
    
    static float sigmoid_lut[RESOLUTION + 1];
    
    void init_sigmoid_lut(void) {
        int i;    
        for (i = 0; i < RESOLUTION + 1; i++) {
            sigmoid_lut[i] =  (1.0 / (1.0 + exp(-i / SCALE)));
        }
    }
    
    static float sigmoid1(const float value) {
        return (1.0f / (1.0f + expf(-value)));
    }
    
    static float sigmoid2(const float value) {
        if (value <= MIN) return 0.0f;
        if (value >= MAX) return 1.0f;
        if (value >= 0) return sigmoid_lut[(int)(value * SCALE + 0.5f)];
        return 1.0f-sigmoid_lut[(int)(-value * SCALE + 0.5f)];
    }
    
    float test_error() {
        float x;
        float emax = 0.0;
    
        for (x = -10.0f; x < 10.0f; x+=0.00001f) {
            float v0 = sigmoid1(x);
            float v1 = sigmoid2(x);
            float error = fabsf(v1 - v0);
            if (error > emax) { emax = error; }
        } 
        return emax;
    }
    
    int sigmoid1_perf() {
        clock_t t0, t1;
        int i;
        float x, y = 0.0f;
    
        t0 = clock();
        for (i = 0; i < 10; i++) {
            for (x = -5.0f; x <= 5.0f; x+=0.00001f) {
                y = sigmoid1(x);
            }
        }
        t1 = clock();
        printf("", y); /* To avoid sigmoidX() calls being optimized away */
        return (t1 - t0) / (CLOCKS_PER_SEC / 1000);
    }
    
    int sigmoid2_perf() {
        clock_t t0, t1;
        int i;
        float x, y = 0.0f;
        t0 = clock();
        for (i = 0; i < 10; i++) {
            for (x = -5.0f; x <= 5.0f; x+=0.00001f) {
                y = sigmoid2(x);
            }
        }
        t1 = clock();
        printf("", y); /* To avoid sigmoidX() calls being optimized away */
        return (t1 - t0) / (CLOCKS_PER_SEC / 1000);
    }
    
    int main(void) {
        init_sigmoid_lut();
        printf("Max deviation is %0.6f\n", test_error());
        printf("10^7 iterations using sigmoid1: %d ms\n", sigmoid1_perf());
        printf("10^7 iterations using sigmoid2: %d ms\n", sigmoid2_perf());
    
        return 0;
    }
    

    The previous results were due to the optimizer doing its job and optimizing away the calculations. Making it actually execute the code yields slightly different and much more interesting results (on my way-too-slow MB Air):

    $ gcc -O2 test.c -o test && ./test
    Max deviation is 0.001664
    10^7 iterations using sigmoid1: 571 ms
    10^7 iterations using sigmoid2: 113 ms
    



    TODO:

    There are things to improve and weaknesses to remove; how to do so is left as an exercise for the reader :)

    • Tune the range of the function to avoid the jump where the table starts and ends.
    • Add a slight noise function to hide the aliasing artifacts.
    • As Rex said, interpolation could get you quite a bit further precision-wise while being rather cheap performance-wise.
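
    As a rough illustration of the interpolation idea, the sketch below keeps the same table spacing as the code above (320 entries per unit, 2048 entries) but blends the two neighboring entries instead of rounding to the nearest one. All names (LERP_SCALE, LERP_SIZE, lerp_lut, init_lerp_lut, sigmoid_lerp) are hypothetical, and clamping to the last entry is just one possible way to avoid the jump at the table ends.

    ```c
    #include <math.h>

    #define LERP_SCALE 320.0f   /* table entries per unit of x */
    #define LERP_SIZE  2048     /* entries cover 0 <= x <= 2047 / 320 */

    static float lerp_lut[LERP_SIZE];

    /* Fill the table with sigmoid(i / LERP_SCALE); call once before use. */
    void init_lerp_lut(void) {
        for (int i = 0; i < LERP_SIZE; i++) {
            lerp_lut[i] = (float)(1.0 / (1.0 + exp(-i / (double)LERP_SCALE)));
        }
    }

    /* Sigmoid for x >= 0 via linear interpolation. Assumes x is not NaN. */
    static float lerp_nonneg(float x) {
        float pos = x * LERP_SCALE;
        /* Past the end of the table: clamp to the last entry (no jump). */
        if (pos >= (float)(LERP_SIZE - 1)) return lerp_lut[LERP_SIZE - 1];
        int i = (int)pos;               /* lower neighbor */
        float frac = pos - (float)i;    /* position between neighbors, 0..1 */
        return lerp_lut[i] + frac * (lerp_lut[i + 1] - lerp_lut[i]);
    }

    /* sigmoid(-x) == 1 - sigmoid(x), so only the non-negative half is stored. */
    float sigmoid_lerp(float x) {
        return x >= 0.0f ? lerp_nonneg(x) : 1.0f - lerp_nonneg(-x);
    }
    ```

    The extra multiply and add per call buy a much smaller error inside the table range than nearest-entry lookup. Beyond the table the result stays at the last entry rather than reaching exactly 0 or 1, so the table range still needs tuning if those tails matter.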
  • 2020-12-07 11:01

    Have a look at this post. It has an approximation for e^x written in Java; this should be the equivalent C# code (untested):

    public static double Exp(double val) {  
        long tmp = (long) (1512775 * val + 1072632447);  
        return BitConverter.Int64BitsToDouble(tmp << 32);  
    }
    

    In my benchmarks this is more than 5 times faster than Math.exp() (in Java). The approximation is based on the paper "A Fast, Compact Approximation of the Exponential Function" which was developed exactly to be used in neural nets. It is basically the same as a lookup table of 2048 entries and linear approximation between the entries, but all this with IEEE floating point tricks.

    EDIT: According to Special Sauce this is ~3.25x faster than the CLR implementation. Thanks!
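
    For comparison with the C lookup-table answer, here is a sketch of the same bit trick in C, using the constants from the snippet above. The function name fast_exp is hypothetical, and memcpy stands in for BitConverter.Int64BitsToDouble. It assumes IEEE 754 doubles and an input of moderate size (roughly |val| < 700); outside that range the result is meaningless.

    ```c
    #include <stdint.h>
    #include <string.h>

    /* Approximate e^val by writing a scaled, shifted copy of val into the
       upper 32 bits of a double (sign, exponent, top of the mantissa). */
    double fast_exp(double val) {
        /* 1512775 is about 2^20 / ln(2); 1072632447 is 1023 * 2^20 minus a
           small correction that balances the approximation error. */
        int64_t hi = (int64_t)(1512775 * val + 1072632447);
        uint64_t bits = (uint64_t)hi << 32;   /* lower 32 bits stay zero */
        double result;
        memcpy(&result, &bits, sizeof result);
        return result;
    }
    ```

    The result is only accurate to within a few percent, so it is worth checking against exp() over the input range the network actually produces before swapping it in.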

  • 2020-12-07 11:02

    First thought: How about some stats on the values variable?

    • Are the values of "value" typically small, say -10 <= value <= 10?

    If not, you can probably get a boost by testing for out of bounds values

    if(value < -10)  return 0;
    if(value > 10)  return 1;
    
    • Are the values repeated often?

    If so, you can probably get some benefit from memoization (probably not, but it doesn't hurt to check...)

    if (sigmoidCache.ContainsKey(value)) return sigmoidCache[value];
    

    If neither of these can be applied, then as some others have suggested, maybe you can get away with lowering the accuracy of your sigmoid...
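
    Both checks can be combined in one function. The sketch below is written in C to match the lookup-table answer and uses a small direct-mapped cache keyed on the bit pattern of the input. Every name in it (MEMO_BITS, memo_hits, sigmoid_cached) and the cache size are assumptions made for illustration.

    ```c
    #include <math.h>
    #include <stdint.h>
    #include <string.h>

    #define MEMO_BITS 10                    /* hypothetical: 2^10 = 1024 slots */
    #define MEMO_SIZE (1u << MEMO_BITS)

    static uint32_t memo_key[MEMO_SIZE];    /* bit pattern of the cached input */
    static float memo_val[MEMO_SIZE];       /* cached sigmoid result */
    static unsigned char memo_used[MEMO_SIZE];
    unsigned long memo_hits = 0;            /* lets you measure whether caching pays off */

    float sigmoid_cached(float value) {
        /* Out-of-bounds shortcut from the first bullet. */
        if (value < -10.0f) return 0.0f;
        if (value > 10.0f) return 1.0f;

        /* Hash the raw bits of the float into a slot index. */
        uint32_t bits;
        memcpy(&bits, &value, sizeof bits);
        uint32_t slot = (uint32_t)(bits * 2654435761u) >> (32 - MEMO_BITS);

        if (memo_used[slot] && memo_key[slot] == bits) {
            memo_hits++;
            return memo_val[slot];
        }

        /* Miss: compute, then overwrite whatever was in the slot. */
        float result = 1.0f / (1.0f + expf(-value));
        memo_key[slot] = bits;
        memo_val[slot] = result;
        memo_used[slot] = 1;
        return result;
    }
    ```

    Comparing memo_hits with the total number of calls after a representative run shows whether inputs repeat often enough for the cache to beat a plain expf call.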

  • 2020-12-07 11:03

    Try:

    public static float Sigmoid(double value) {
        return 1.0f / (1.0f + (float) Math.Exp(-value));
    }
    

    EDIT: I did a quick benchmark. On my machine, the above code is about 43% faster than your method, and this mathematically-equivalent code is the teeniest bit faster (46% faster than the original):

    public static float Sigmoid(double value) {
        float k = (float) Math.Exp(value);
        return k / (1.0f + k);
    }
    

    EDIT 2: I'm not sure how much overhead C# functions have. C# has no #include <math.h>, but runtimes that ship System.MathF (.NET Core 2.0 and later) offer a float-precision exponential, MathF.Exp, which might be a little faster:

    public static float Sigmoid(double value) {
        float k = MathF.Exp((float) value);
        return k / (1.0f + k);
    }
    

    Also if you're doing millions of calls, the function-calling overhead might be a problem. Try making an inline function and see if that's any help.
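
    For reference, this is how the two mathematically equivalent forms look as inlinable functions in C, where expf is available directly. The names sigmoid_direct and sigmoid_ratio are made up for this sketch. One caveat with the k / (1 + k) form: for large positive inputs expf overflows to infinity and the division yields NaN, so it needs an upper bound check that the 1 / (1 + e^(-x)) form does not.

    ```c
    #include <math.h>

    /* 1 / (1 + e^(-x)): saturates cleanly to 0 or 1 for large |x|. */
    static inline float sigmoid_direct(float x) {
        return 1.0f / (1.0f + expf(-x));
    }

    /* e^x / (1 + e^x): same value for moderate x, but expf(x) overflows
       to infinity for x above roughly 88, giving inf / inf = NaN. */
    static inline float sigmoid_ratio(float x) {
        float k = expf(x);
        return k / (1.0f + k);
    }
    ```

    Whether the compiler actually inlines these depends on the optimization level, so measure before and after rather than assuming a gain.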

  • 2020-12-07 11:03

    Idea: Perhaps you can make a (large) lookup table with the values pre-calculated?
