Faster algorithm for 64-bit/32-bit division on ARM / NEON?

误落风尘 2020-12-16 23:40

I am working on code that performs a 64-bit by 32-bit fixed-point division in two places, with the result taken in 32 bits. These two places are together taking more

1 answer
  • 2020-12-17 00:14

    I did a lot of fixed-point arithmetic in the past and did a lot of research looking for fast 64/32-bit divisions myself. If you google for 'ARM division' you will find tons of great links and discussions about this issue.

    The best solution for the ARM architecture, where even a 32-bit division may not be available in hardware, is here:

    http://www.peter-teichmann.de/adiv2e.html

    This assembly code is very old, and your assembler may not understand the syntax of it. It is however worth porting the code to your toolchain. It is the fastest division code for your special case I've seen so far, and trust me: I've benchmarked them all :-)

    The last time I did that (about 5 years ago, for the Cortex-A8), this code was about 10 times faster than what the compiler generated.

    This code doesn't use NEON. A NEON port would be interesting. Not sure if it will improve the performance much though.

    Edit:

    I found the assembler code ported to GAS (GNU toolchain). This code is working and tested:

    Divide.S

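    .section ".text"

    .global udiv64

    @ uint32_t udiv64(uint32_t lo, uint32_t hi, uint32_t divisor)
    @ In:  r0 = low word of dividend, r1 = high word, r2 = divisor
    @ Out: r0 = 32-bit quotient (r1 is left holding the remainder)
    @ Requires r1 < r2 so that the quotient fits in 32 bits.
    udiv64:
        adds      r0,r0,r0          @ shift dividend left, top bit -> carry
        adc       r1,r1,r1          @ shift carry into the high word

        .rept 31
            cmp     r1,r2           @ carry set if high word >= divisor
            subcs   r1,r1,r2        @ subtract the divisor if it fits
            adcs    r0,r0,r0        @ shift quotient bit in, next dividend bit out
            adc     r1,r1,r1        @ shift that dividend bit into the high word
        .endr

        cmp     r1,r2               @ final step: last quotient bit
        subcs   r1,r1,r2
        adcs    r0,r0,r0

        bx      lr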
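
    For anyone porting the routine, a plain C model of the same shift-and-subtract loop can help when checking results. This is only a sketch: the name udiv64_ref is made up for illustration, and the two preconditions in the asserts are my reading of the assembly (the high word must be smaller than the divisor, and the divisor must not exceed 0x80000000 so the remainder shift cannot lose a bit).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical C model of udiv64: divides the 64-bit value hi:lo by d and
   returns the 32-bit quotient, one quotient bit per loop iteration. */
uint32_t udiv64_ref(uint32_t lo, uint32_t hi, uint32_t d)
{
    assert(hi < d);            /* assumption: quotient must fit in 32 bits */
    assert(d <= 0x80000000u);  /* assumption: remainder shift must not overflow */

    uint32_t rem = hi;
    uint32_t quot = 0;
    for (int i = 0; i < 32; i++) {
        /* Shift the next dividend bit into the running remainder. */
        rem = (rem << 1) | (lo >> 31);
        lo <<= 1;
        /* Conditional subtract; its outcome is the next quotient bit. */
        quot <<= 1;
        if (rem >= d) {
            rem -= d;
            quot |= 1u;
        }
    }
    return quot;
}
```

    For example, udiv64_ref(0, 1, 2) divides 2^32 by 2 and should give 0x80000000. The assembly unrolls this loop 32 times and reuses r0 for both the outgoing dividend bits and the incoming quotient bits.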
    

    C-Code:

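    #include <stdint.h>

    /* Implemented in Divide.S; extern "C" is only needed when compiling as C++. */
    #ifdef __cplusplus
    extern "C"
    #endif
    uint32_t udiv64 (uint32_t lo, uint32_t hi, uint32_t divisor);

    int32_t fixdiv24 (int32_t a, int32_t b)
    /* calculate (a<<24)/b with a 64-bit intermediate result */
    {
      int32_t q;
      int sign = (a^b) < 0; /* different signs */
      uint32_t l,h;
      /* take magnitudes in unsigned arithmetic so INT32_MIN does not overflow */
      uint32_t ua = a<0 ? 0u-(uint32_t)a : (uint32_t)a;
      uint32_t ub = b<0 ? 0u-(uint32_t)b : (uint32_t)b;
      l = (ua << 24);   /* low word of a<<24 */
      h = (ua >> 8);    /* high word of a<<24 */
      q = (int32_t) udiv64 (l,h,ub);
      if (sign) q = -q;
      return q;
    }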
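
    To test a port of fixdiv24, it may help to compare it against a slow reference that lets the compiler do the 64-bit division. The sketch below is an assumption of mine, not part of the original code: fixdiv24_ref is a hypothetical name, and it asserts that the divisor is nonzero and that the quotient fits in 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference: (a << 24) / b using the compiler's 64-bit division.
   C integer division truncates toward zero, which matches dividing the
   magnitudes and then restoring the sign. */
int32_t fixdiv24_ref(int32_t a, int32_t b)
{
    assert(b != 0);
    /* Multiply instead of shifting: left-shifting a negative value is undefined in C. */
    int64_t num = (int64_t)a * (INT64_C(1) << 24);
    int64_t q = num / b;
    /* The quotient has to fit in 32 bits for the result to be meaningful. */
    assert(q >= INT32_MIN && q <= INT32_MAX);
    return (int32_t)q;
}
```

    In 8.24 fixed point, 1.5 / 2.0 is fixdiv24_ref(3 << 23, 2 << 24), which should return 3 << 22 (0.75). Running both versions over random inputs and comparing the results is a cheap way to catch porting mistakes.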
    