Math optimization in C#

Backend · Unresolved · 2193 views
悲&欢浪女 · asked 2020-12-07 10:25

I've been profiling an application all day long and, having optimized a couple of bits of code, I'm left with this on my todo list. It's the activation function for a neural network.

25 answers
  • 2020-12-07 11:12

    I realize that it has been a year since this question popped up, but I ran across it because of the discussion of F# and C performance relative to C#. I played with some of the samples from other responders and discovered that delegates appear to execute faster than a regular method invocation, but there is no apparent performance advantage of F# over C#.

    • C: 166ms
    • C# (delegate): 275ms
    • C# (method): 431ms
    • C# (method, float counter): 2,656ms
    • F#: 404ms

    The C# with a float counter was a straight port of the C code. It is much faster to use an int in the for loop.
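    A minimal sketch of how such a micro-benchmark might look (this is my own reconstruction, not the responder's harness; the names and iteration count are assumptions):

    ```csharp
    using System;
    using System.Diagnostics;

    // Plain sigmoid as a local function (direct call) and as a delegate.
    double Sigmoid(double x) => 1.0 / (1.0 + Math.Exp(-x));
    Func<double, double> sigmoidDelegate = Sigmoid;

    const int n = 10_000_000;

    // Direct method call with an int loop counter (the fast variant above).
    var sw = Stopwatch.StartNew();
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += Sigmoid(i * 1e-6 - 5.0);
    Console.WriteLine($"method:   {sw.ElapsedMilliseconds} ms");

    // The same work routed through a delegate.
    sw.Restart();
    sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += sigmoidDelegate(i * 1e-6 - 5.0);
    Console.WriteLine($"delegate: {sw.ElapsedMilliseconds} ms");
    ```

    Note that JIT inlining can blur the method/delegate distinction, so absolute numbers will vary by runtime version.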

  • 2020-12-07 11:12

    Doing a Google search, I found an alternative implementation of the Sigmoid function.

    public double Sigmoid(double x)
    {
       return 2 / (1 + Math.Exp(-2 * x)) - 1;
    }
    

    Is that correct for your needs? Is it faster?

    http://dynamicnotions.blogspot.com/2008/09/sigmoid-function-in-c.html
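    One caveat worth checking before adopting it: `2/(1 + e^(-2x)) - 1` is the *bipolar* sigmoid, which is algebraically identical to `tanh(x)` and ranges over (-1, 1), not the (0, 1) of the standard logistic sigmoid, so it is not a drop-in replacement. A quick comparison (function names here are mine):

    ```csharp
    using System;

    // The "alternative" above is the bipolar sigmoid: 2/(1+e^(-2x)) - 1.
    // It equals tanh(x) exactly, so its range is (-1, 1).
    double BipolarSigmoid(double x) => 2.0 / (1.0 + Math.Exp(-2.0 * x)) - 1.0;
    double StandardSigmoid(double x) => 1.0 / (1.0 + Math.Exp(-x));

    foreach (var x in new[] { -3.0, -0.5, 0.0, 0.5, 3.0 })
        Console.WriteLine(
            $"x={x,5}: bipolar={BipolarSigmoid(x):F6}  tanh={Math.Tanh(x):F6}  standard={StandardSigmoid(x):F6}");
    ```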

  • 2020-12-07 11:14
    1. Remember that any change to this activation function comes at the cost of different behavior. This even includes switching to float (and thus lowering the precision) or using activation substitutes. Only experimenting with your use case will show the right way.
    2. In addition to the simple code optimizations, I would also recommend considering parallelization of the computations (i.e. leveraging multiple cores of your machine, or even machines in the Windows Azure cloud) and improving the training algorithms.

    UPDATE: Post on lookup tables for ANN activation functions

    UPDATE2: I removed the point on LUTs since I'd confused these with complete hashing. Thanks go to Henrik Gustafsson for putting me back on track. So memory is not an issue, although the search space still gets messed up with local extrema a bit.

  • 2020-12-07 11:14

    I've seen that a lot of people around here are trying to use approximation to make Sigmoid faster. However, it is important to know that Sigmoid can also be expressed using tanh, not only exp. Calculating Sigmoid this way is around 5 times faster than with the exponential, and with this method you are not approximating anything, so the original behaviour of Sigmoid is kept as-is.

        public static double Sigmoid(double value)
        {
            return 0.5d + 0.5d * Math.Tanh(value/2);
        }
    

    Of course, parallelization would be the next step to performance improvement, but as far as the raw calculation is concerned, using Math.Tanh is faster than Math.Exp.
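    The identity being used is 1/(1+e^(-x)) = (1 + tanh(x/2))/2, which holds exactly; a quick scan confirms the two forms agree to floating-point rounding (names in this sketch are mine):

    ```csharp
    using System;

    double SigmoidExp(double x) => 1.0 / (1.0 + Math.Exp(-x));
    double SigmoidTanh(double x) => 0.5 + 0.5 * Math.Tanh(x / 2.0);

    // The two forms differ only by rounding error, never by approximation.
    double maxDiff = 0.0;
    for (double x = -10.0; x <= 10.0; x += 0.001)
        maxDiff = Math.Max(maxDiff, Math.Abs(SigmoidExp(x) - SigmoidTanh(x)));
    Console.WriteLine($"max |exp form - tanh form| = {maxDiff:E3}");
    ```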

  • 2020-12-07 11:16

    Remember, Sigmoid constrains results to the range between 0 and 1. Values smaller than about -10 return a value very, very close to 0.0, and values greater than about 10 return a value very, very close to 1.

    Back in the old days, when computers couldn't handle arithmetic overflow/underflow that well, putting in if conditions to limit the calculation was common. If I were really concerned about its performance (or basically Math's performance), I would change your code to the old-fashioned way (and mind the limits) so that it does not call Math unnecessarily:

    public double Sigmoid(double value)
    {
        if (value < -45.0) return 0.0;
        if (value > 45.0) return 1.0;
        return 1.0 / (1.0 + Math.Exp(-value));
    }
    

    I realize anyone reading this answer may be involved in some sort of NN development. Be mindful of how the above affects the other aspects of your training scores.
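    For intuition on why ±45 is a safe cutoff: Math.Exp(-45) is about 2.9e-20, far below double's roughly 1e-16 relative precision, so 1/(1 + e^-45) already rounds to exactly 1.0 and the branches change nothing numerically (this check is mine, not from the answer):

    ```csharp
    using System;

    double Sigmoid(double value)
    {
        if (value < -45.0) return 0.0;
        if (value > 45.0) return 1.0;
        return 1.0 / (1.0 + Math.Exp(-value));
    }

    // At the cutoff, the exp term vanishes below double precision,
    // so the clamped and unclamped results are bit-identical.
    Console.WriteLine(Math.Exp(-45.0));                        // ~2.86e-20
    Console.WriteLine(1.0 / (1.0 + Math.Exp(-45.0)) == 1.0);   // prints True
    ```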

  • 2020-12-07 11:18

    F# has better performance than C# in .NET math algorithms, so rewriting the neural network in F# might improve the overall performance.

    If we re-implement LUT benchmarking snippet (I've been using slightly tweaked version) in F#, then the resulting code:

    • executes the sigmoid1 benchmark in 588.8 ms instead of 3899.2 ms
    • executes the sigmoid2 (LUT) benchmark in 156.6 ms instead of 411.4 ms

    More details can be found in the blog post. Here's the F# snippet just in case:

    #light
    
    let Scale = 320.0f;
    let Resolution = 2047;
    
    let Min = -single(Resolution)/Scale;
    let Max = single(Resolution)/Scale;
    
    let range step a b =
      let count = int((b-a)/step);
      seq { for i in 0 .. count -> single(i)*step + a };
    
    let lut = [| 
      for x in 0 .. Resolution ->
        single(1.0/(1.0 +  exp(-double(x)/double(Scale))))
      |]
    
    let sigmoid1 value = 1.0f/(1.0f + exp(-value));
    
    let sigmoid2 v = 
      if (v <= Min) then 0.0f;
      elif (v>= Max) then 1.0f;
      else
        let f = v * Scale;
        if (v>0.0f) then lut.[int (f + 0.5f)]
        else 1.0f - lut.[int(0.5f - f)];
    
    let getError f = 
      let test = range 0.00001f -10.0f 10.0f;
      let errors = seq { 
        for v in test -> 
          abs(sigmoid1(single(v)) - f(single(v)))
      }
      Seq.max errors;
    
    open System.Diagnostics;
    
    let test f = 
      let sw = Stopwatch.StartNew();
      let mutable m = 0.0f;
      for t in 1 .. 10 do
        for x in 1 .. 1000000 do
          m <- f(single(x)/100000.0f-5.0f);
      sw.Elapsed.TotalMilliseconds;
    
    printf "Max deviation is %f\n" (getError sigmoid2)
    printf "10^7 iterations using sigmoid1: %f ms\n" (test sigmoid1)
    printf "10^7 iterations using sigmoid2: %f ms\n" (test sigmoid2)
    
    let c = System.Console.ReadKey(true);
    

    And the output (Release compilation against F# 1.9.6.2 CTP with no debugger):

    Max deviation is 0.001664
    10^7 iterations using sigmoid1: 588.843700 ms
    10^7 iterations using sigmoid2: 156.626700 ms
    

    UPDATE: updated the benchmarking to use 10^7 iterations to make the results comparable with C.

    UPDATE2: here are the performance results of the C implementation from the same machine to compare with:

    Max deviation is 0.001664
    10^7 iterations using sigmoid1: 628 ms
    10^7 iterations using sigmoid2: 157 ms
    
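    For readers staying in C#, the same lookup-table approach can be sketched as a direct port of the F# snippet above (the port and the error-scan loop are mine; Scale and Resolution match the F# values):

    ```csharp
    using System;

    // LUT sigmoid: precompute 1/(1+e^(-i/Scale)) for i = 0..Resolution,
    // then resolve lookups by rounding, mirroring the F# sigmoid2.
    const float Scale = 320.0f;
    const int Resolution = 2047;
    const float Max = Resolution / Scale;   // ~6.4; beyond this, clamp
    const float Min = -Max;

    var lut = new float[Resolution + 1];
    for (int i = 0; i <= Resolution; i++)
        lut[i] = (float)(1.0 / (1.0 + Math.Exp(-i / (double)Scale)));

    float Sigmoid2(float v)
    {
        if (v <= Min) return 0.0f;
        if (v >= Max) return 1.0f;
        float f = v * Scale;
        // Positive side reads the table directly; negative side uses symmetry.
        return v > 0.0f ? lut[(int)(f + 0.5f)] : 1.0f - lut[(int)(0.5f - f)];
    }

    // Maximum deviation from the exact sigmoid over [-10, 10].
    double maxErr = 0.0;
    for (double x = -10.0; x <= 10.0; x += 0.0001)
        maxErr = Math.Max(maxErr, Math.Abs(1.0 / (1.0 + Math.Exp(-x)) - Sigmoid2((float)x)));
    Console.WriteLine($"max deviation: {maxErr:F6}");
    ```

    The dominant error comes from clamping at ±Max (about 0.00166 at the boundary), which matches the "Max deviation is 0.001664" reported for the F# version.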