I've been profiling an application all day long and, having optimized a couple of bits of code, I'm left with this on my todo list. It's the activation function for a neural network.
There are a lot of good answers here. I would suggest running it through this technique, just to make sure
BTW, the function you have is the inverse logit function, or the inverse of the log-odds-ratio function log(f/(1-f)).
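To see the inverse relationship (standard algebra, nothing specific to the code below), solve y = log(f / (1 - f)) for f:

```
e^y = f / (1 - f)
e^y - f e^y = f            (multiply out, collect f)
f = e^y / (1 + e^y)
  = 1 / (1 + e^-y)         (divide top and bottom by e^y)
```

which is exactly the sigmoid being optimized in this question.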
(Updated with performance measurements.) (Updated again with real results :)
I think a lookup table solution would get you very far when it comes to performance, at a negligible memory and precision cost.
The following snippet is an example implementation in C (I don't speak C# fluently enough to dry-code it). It runs and performs well enough, but I'm sure there's a bug in it :)
#include <math.h>
#include <stdio.h>
#include <time.h>

#define SCALE 320.0f
#define RESOLUTION 2047
#define MIN (-RESOLUTION / SCALE)
#define MAX (RESOLUTION / SCALE)

static float sigmoid_lut[RESOLUTION + 1];

void init_sigmoid_lut(void) {
    int i;
    for (i = 0; i < RESOLUTION + 1; i++) {
        sigmoid_lut[i] = (1.0 / (1.0 + exp(-i / SCALE)));
    }
}

static float sigmoid1(const float value) {
    return (1.0f / (1.0f + expf(-value)));
}

static float sigmoid2(const float value) {
    if (value <= MIN) return 0.0f;
    if (value >= MAX) return 1.0f;
    if (value >= 0) return sigmoid_lut[(int)(value * SCALE + 0.5f)];
    return 1.0f - sigmoid_lut[(int)(-value * SCALE + 0.5f)];
}
float test_error() {
    float x;
    float emax = 0.0;
    for (x = -10.0f; x < 10.0f; x += 0.00001f) {
        float v0 = sigmoid1(x);
        float v1 = sigmoid2(x);
        float error = fabsf(v1 - v0);
        if (error > emax) { emax = error; }
    }
    return emax;
}

int sigmoid1_perf() {
    clock_t t0, t1;
    int i;
    float x, y = 0.0f;

    t0 = clock();
    for (i = 0; i < 10; i++) {
        for (x = -5.0f; x <= 5.0f; x += 0.00001f) {
            y = sigmoid1(x);
        }
    }
    t1 = clock();
    printf("", y); /* To avoid sigmoidX() calls being optimized away */

    return (t1 - t0) / (CLOCKS_PER_SEC / 1000);
}
int sigmoid2_perf() {
    clock_t t0, t1;
    int i;
    float x, y = 0.0f;

    t0 = clock();
    for (i = 0; i < 10; i++) {
        for (x = -5.0f; x <= 5.0f; x += 0.00001f) {
            y = sigmoid2(x);
        }
    }
    t1 = clock();
    printf("", y); /* To avoid sigmoidX() calls being optimized away */

    return (t1 - t0) / (CLOCKS_PER_SEC / 1000);
}

int main(void) {
    init_sigmoid_lut();
    printf("Max deviation is %0.6f\n", test_error());
    printf("10^7 iterations using sigmoid1: %d ms\n", sigmoid1_perf());
    printf("10^7 iterations using sigmoid2: %d ms\n", sigmoid2_perf());
    return 0;
}
Previous results were due to the optimizer doing its job and optimizing the calculations away. Making it actually execute the code yields slightly different and much more interesting results (on my way-slow MB Air):
$ gcc -O2 test.c -o test && ./test
Max deviation is 0.001664
10^7 iterations using sigmoid1: 571 ms
10^7 iterations using sigmoid2: 113 ms
TODO:
There are things to improve and ways to remove weaknesses; how to do so is left as an exercise for the reader :)
Have a look at this post. It has an approximation for e^x written in Java; this should be the C# code for it (untested):
public static double Exp(double val) {
    long tmp = (long) (1512775 * val + 1072632447);
    return BitConverter.Int64BitsToDouble(tmp << 32);
}
In my benchmarks this is more than 5 times faster than Math.exp() (in Java). The approximation is based on the paper "A Fast, Compact Approximation of the Exponential Function", which was developed exactly to be used in neural nets. It is basically the same as a lookup table of 2048 entries with linear approximation between the entries, but it does all of this with IEEE floating-point tricks.
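For reference, here is the same bit trick ported to plain C (my sketch of Schraudolph's method; the constants are the ones visible in the C# snippet above: 2^20/ln 2 ≈ 1512775 and the adjusted exponent-bias offset 1072632447). Accuracy is roughly within a few percent, which is often acceptable for a neural net:

```c
#include <stdint.h>
#include <string.h>

/* Schraudolph's approximation: build the IEEE-754 bit pattern of
   e^x directly in the upper 32 bits of a double, letting the
   exponent field do the exponentiation. */
static double fast_exp(double x) {
    int64_t bits = (int64_t)(1512775.0 * x + 1072632447.0) << 32;
    double result;
    memcpy(&result, &bits, sizeof result); /* well-defined type pun */
    return result;
}
```

The sigmoid then becomes 1.0 / (1.0 + fast_exp(-x)).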
EDIT: According to Special Sauce this is ~3.25x faster than the CLR implementation. Thanks!
First thought: how about some stats on the value variable? Do the values typically stay in a narrow band around zero?
If not, you can probably get a boost by testing for out-of-bounds values first:
if (value < -10) return 0;
if (value > 10) return 1;
Do the same inputs come up repeatedly? If so, you can probably get some benefit from memoization (probably not, but it doesn't hurt to check....):
if (sigmoidCache.containsKey(value)) return sigmoidCache.get(value);
If neither of these can be applied, then as some others have suggested, maybe you can get away with lowering the accuracy of your sigmoid...
Try:
public static float Sigmoid(double value) {
    return 1.0f / (1.0f + (float) Math.Exp(-value));
}
EDIT: I did a quick benchmark. On my machine, the above code is about 43% faster than your method, and this mathematically-equivalent code is the teeniest bit faster (46% faster than the original):
public static float Sigmoid(double value) {
    float k = (float) Math.Exp(value);
    return k / (1.0f + k);
}
EDIT 2: I'm not sure how much overhead C# function calls have, but note that expf is C's single-precision exponential from <math.h>; C# has no float overload of Math.Exp. If you can push this into native C code (e.g. a small helper you call from C#), the float-precision exp might be a little faster:
float sigmoid(float value) {
    float k = expf(value);
    return k / (1.0f + k);
}
Also, if you're doing millions of calls, the function-call overhead might be a problem. Try making an inline function and see if that helps.
Idea: Perhaps you can make a (large) lookup table with the values pre-calculated?