I've been profiling an application all day long and, having optimized a couple of bits of code, I'm left with this on my todo list. It's the activation function for a neural network.
There are a lot of good answers here. I would suggest running it through this technique, just to make sure
BTW, the function you have is the inverse logit function, or the inverse of the log-odds-ratio function log(f/(1-f)).
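To see the inverse relationship (standard algebra, nothing specific to the code below), solve y = log(f / (1 - f)) for f:

```
e^y = f / (1 - f)
e^y - f e^y = f            (multiply out, collect f)
f = e^y / (1 + e^y)
  = 1 / (1 + e^-y)         (divide top and bottom by e^y)
```

which is exactly the sigmoid being optimized in this question.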
(Updated with performance measurements.) (Updated again with real results :)
I think a lookup table solution would get you very far when it comes to performance, at a negligible memory and precision cost.
The following snippet is an example implementation in C (I don't speak C# fluently enough to dry-code it). It runs and performs well enough, but I'm sure there's a bug in it :)
#include <math.h>
#include <stdio.h>
#include <time.h>

#define SCALE 320.0f
#define RESOLUTION 2047
#define MIN (-RESOLUTION / SCALE)
#define MAX (RESOLUTION / SCALE)

static float sigmoid_lut[RESOLUTION + 1];

void init_sigmoid_lut(void) {
    int i;
    for (i = 0; i < RESOLUTION + 1; i++) {
        sigmoid_lut[i] = (1.0 / (1.0 + exp(-i / SCALE)));
    }
}

static float sigmoid1(const float value) {
    return (1.0f / (1.0f + expf(-value)));
}

static float sigmoid2(const float value) {
    if (value <= MIN) return 0.0f;
    if (value >= MAX) return 1.0f;
    if (value >= 0) return sigmoid_lut[(int)(value * SCALE + 0.5f)];
    return 1.0f - sigmoid_lut[(int)(-value * SCALE + 0.5f)];
}
float test_error() {
    float x;
    float emax = 0.0;
    for (x = -10.0f; x < 10.0f; x += 0.00001f) {
        float v0 = sigmoid1(x);
        float v1 = sigmoid2(x);
        float error = fabsf(v1 - v0);
        if (error > emax) { emax = error; }
    }
    return emax;
}

int sigmoid1_perf() {
    clock_t t0, t1;
    int i;
    float x, y = 0.0f;

    t0 = clock();
    for (i = 0; i < 10; i++) {
        for (x = -5.0f; x <= 5.0f; x += 0.00001f) {
            y = sigmoid1(x);
        }
    }
    t1 = clock();
    printf("", y); /* To avoid sigmoidX() calls being optimized away */

    return (t1 - t0) / (CLOCKS_PER_SEC / 1000);
}
int sigmoid2_perf() {
    clock_t t0, t1;
    int i;
    float x, y = 0.0f;

    t0 = clock();
    for (i = 0; i < 10; i++) {
        for (x = -5.0f; x <= 5.0f; x += 0.00001f) {
            y = sigmoid2(x);
        }
    }
    t1 = clock();
    printf("", y); /* To avoid sigmoidX() calls being optimized away */

    return (t1 - t0) / (CLOCKS_PER_SEC / 1000);
}

int main(void) {
    init_sigmoid_lut();
    printf("Max deviation is %0.6f\n", test_error());
    printf("10^7 iterations using sigmoid1: %d ms\n", sigmoid1_perf());
    printf("10^7 iterations using sigmoid2: %d ms\n", sigmoid2_perf());
    return 0;
}
Previous results were due to the optimizer doing its job and optimizing the calculations away. Making it actually execute the code yields slightly different and much more interesting results (on my way-slow MB Air):
$ gcc -O2 test.c -o test && ./test
Max deviation is 0.001664
10^7 iterations using sigmoid1: 571 ms
10^7 iterations using sigmoid2: 113 ms
TODO:
There are things to improve and ways to remove weaknesses; how to do so is left as an exercise for the reader :)
Have a look at this post. It has an approximation for e^x written in Java; this should be the C# code for it (untested):
public static double Exp(double val) {
    long tmp = (long) (1512775 * val + 1072632447);
    return BitConverter.Int64BitsToDouble(tmp << 32);
}
In my benchmarks this is more than 5 times faster than Math.exp() (in Java). The approximation is based on the paper "A Fast, Compact Approximation of the Exponential Function", which was developed exactly to be used in neural nets. It is basically the same as a lookup table of 2048 entries with linear approximation between the entries, but it does all of this with IEEE floating-point tricks.
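For reference, here is the same bit trick ported to plain C (my sketch of Schraudolph's method; the constants are the ones visible in the C# snippet above: 2^20/ln 2 ≈ 1512775 and the adjusted exponent-bias offset 1072632447). Accuracy is roughly within a few percent, which is often acceptable for a neural net:

```c
#include <stdint.h>
#include <string.h>

/* Schraudolph's approximation: build the IEEE-754 bit pattern of
   e^x directly in the upper 32 bits of a double, letting the
   exponent field do the exponentiation. */
static double fast_exp(double x) {
    int64_t bits = (int64_t)(1512775.0 * x + 1072632447.0) << 32;
    double result;
    memcpy(&result, &bits, sizeof result); /* well-defined type pun */
    return result;
}
```

The sigmoid then becomes 1.0 / (1.0 + fast_exp(-x)).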
EDIT: According to Special Sauce this is ~3.25x faster than the CLR implementation. Thanks!
First thought: how about some stats on the value variable? Do the values typically stay in a narrow band around zero?
If not, you can probably get a boost by testing for out-of-bounds values first:
if (value < -10) return 0;
if (value > 10) return 1;
Do the same inputs come up repeatedly? If so, you can probably get some benefit from memoization (probably not, but it doesn't hurt to check....):
if (sigmoidCache.containsKey(value)) return sigmoidCache.get(value);
If neither of these can be applied, then as some others have suggested, maybe you can get away with lowering the accuracy of your sigmoid...
Try:
public static float Sigmoid(double value) {
    return 1.0f / (1.0f + (float) Math.Exp(-value));
}
EDIT: I did a quick benchmark. On my machine, the above code is about 43% faster than your method, and this mathematically-equivalent code is the teeniest bit faster (46% faster than the original):
public static float Sigmoid(double value) {
    float k = (float) Math.Exp(value);
    return k / (1.0f + k);
}
EDIT 2: I'm not sure how much overhead C# function calls have, but note that expf is C's single-precision exponential from <math.h>; C# has no float overload of Math.Exp. If you can push this into native C code (e.g. a small helper you call from C#), the float-precision exp might be a little faster:
float sigmoid(float value) {
    float k = expf(value);
    return k / (1.0f + k);
}
Also, if you're doing millions of calls, the function-call overhead might be a problem. Try making an inline function and see if that helps.
Idea: Perhaps you can make a (large) lookup table with the values pre-calculated?