How to do unsigned saturating addition in C?

孤独总比滥情好 2020-11-27 02:31

What is the best (cleanest, most efficient) way to write saturating addition in C?

The function or macro should add two unsigned inputs (need both 16- and 32-bit ver

17 answers
  • 2020-11-27 03:14

    Zero branch solution:

    uint32_t sadd32(uint32_t a, uint32_t b)
    {
        uint64_t s = (uint64_t)a+b;
        return -(s>>32) | (uint32_t)s;
    }
    

    A good compiler will optimize this to avoid doing any actual 64-bit arithmetic (s>>32 will merely be the carry flag, and -(s>>32) is the result of sbb %eax,%eax).

    In x86 asm (AT&T syntax, a and b in eax and ebx, result in eax):

    add %eax,%ebx
    sbb %eax,%eax
    or %ebx,%eax
    

    8- and 16-bit versions should be obvious. Signed version might require a bit more work.
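As a sketch of the narrower variants mentioned above, the same widen-then-mask idea can be applied to 16- and 8-bit operands. The names `sadd16` and `sadd8` are illustrative and do not come from the original answer.

```c
#include <stdint.h>

/* 16-bit variant: do the add in 32 bits so the carry lands in bit 16. */
uint16_t sadd16(uint16_t a, uint16_t b)
{
    uint32_t s = (uint32_t)a + b;
    /* s >> 16 is 0 or 1; negating it gives 0 or all ones, which saturates the low bits. */
    return (uint16_t)(-(s >> 16) | s);
}

/* 8-bit variant: the carry lands in bit 8. */
uint8_t sadd8(uint8_t a, uint8_t b)
{
    uint32_t s = (uint32_t)a + b;
    return (uint8_t)(-(s >> 8) | s);
}
```

Whether the compiler reduces these to a narrow add plus a carry-based mask depends on the target, so it is worth checking the generated code.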

  • 2020-11-27 03:15

    I'm not sure if this is faster than Skizz's solution (always profile), but here's an alternative no-branch assembly solution. Note that this requires the conditional move (CMOV) instruction, which I'm not sure is available on your target.

    
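    uint32_t sadd32(uint32_t a, uint32_t b)
    {
        /* MSVC-style inline assembly (32-bit x86, Intel syntax); the result is left in eax */
        __asm
        {
            mov eax, a
            add eax, b          ; sets the carry flag on unsigned overflow
            mov edx, 0xffffffff
            cmovc eax, edx      ; if carry is set, saturate to 0xffffffff
        }
    }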
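
If inline assembly is not an option, a similar effect can be sketched with the `__builtin_add_overflow` builtin. This assumes GCC 5 or later, or a recent Clang; MSVC does not provide it. The name `sadd32_builtin` is illustrative. Whether the compiler emits a conditional move here is an assumption to check in the generated code.

```c
#include <stdint.h>

uint32_t sadd32_builtin(uint32_t a, uint32_t b)
{
    uint32_t sum;
    /* __builtin_add_overflow returns nonzero if a + b does not fit in sum. */
    if (__builtin_add_overflow(a, b, &sum))
        return UINT32_MAX;
    return sum;
}
```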
    
  • 2020-11-27 03:16

    You probably want portable C code here, which your compiler will turn into proper ARM assembly. ARM has conditional moves, and these can be conditional on the carry flag, which is what signals unsigned overflow. The algorithm then becomes: add, then conditionally set the destination to unsigned(-1) if overflow was detected.

    uint16_t add16(uint16_t a, uint16_t b)
    {
      uint16_t c = a + b;
      if (c < a)  /* Can only happen due to overflow */
        c = -1;
      return c;
    }
    

    Note that this differs from the other algorithms in that it corrects overflow, instead of relying on another calculation to detect overflow.
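
The compiler output in this answer refers to a 32-bit function called `adds32`, whose source is not shown. Assuming it follows the same pattern as `add16`, it might look like this sketch:

```c
#include <stdint.h>

uint32_t adds32(uint32_t a, uint32_t b)
{
    uint32_t c = a + b;   /* wraps modulo 2^32 on overflow */
    if (c < a)            /* a wrapped sum is smaller than either operand */
        c = UINT32_MAX;
    return c;
}
```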

    x86-64 clang 3.7 -O3 output for the 32-bit version (adds32), significantly better than any other answer:

    add     edi, esi
    mov     eax, -1
    cmovae  eax, edi
    ret
    

    ARMv7: gcc 4.8 -O3 -mcpu=cortex-a15 -fverbose-asm output for adds32:

    adds    r0, r0, r1      @ c, a, b
    it      cs
    movcs   r0, #-1         @ conditional-move
    bx      lr
    

    16-bit: still doesn't use ARM's unsigned saturating add instruction (UQADD16)

    add     r1, r1, r0        @ tmp114, a
    movw    r3, #65535      @ tmp116,
    uxth    r1, r1  @ c, tmp114
    cmp     r0, r1    @ a, c
    ite     ls        @
    movls   r0, r1        @,, c
    movhi   r0, r3        @,, tmp116
    bx      lr  @
    
  • 2020-11-27 03:17
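    int saturating_add(int x, int y)
    {
        /* Work in unsigned arithmetic: signed overflow and 1 << (w-1) on int are undefined. */
        unsigned w = sizeof(int) << 3;              /* width of int in bits, assuming 8-bit bytes */
        unsigned msb = 1u << (w-1);                 /* sign-bit mask, the bit pattern of INT_MIN */
    
        unsigned s = (unsigned)x + (unsigned)y;     /* wraps instead of overflowing */
        unsigned sign_x = msb & (unsigned)x;
        unsigned sign_y = msb & (unsigned)y;
        unsigned sign_s = msb & s;
    
        unsigned nflow = sign_x && sign_y && !sign_s;   /* both negative, sum non-negative */
        unsigned pflow = !sign_x && !sign_y && sign_s;  /* both non-negative, sum negative */
    
        unsigned nmask = (~(unsigned)!nflow + 1);   /* all ones if no negative overflow, else 0 */
        unsigned pmask = (~(unsigned)!pflow + 1);   /* all ones if no positive overflow, else 0 */
    
        /* Converting back to int assumes the usual two's complement wrap-around. */
        return (int)((nmask & ((pmask & s) | (~pmask & ~msb))) | (~nmask & msb));
    }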
    

    This implementation uses no control flow, no comparison operators (==, !=), and no ?: operator. It relies only on bitwise and logical operators.
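
For comparison, here is a sketch of the same signed saturation rule written with ordinary comparisons. It is not part of the original answer, and the name `saturating_add_ref` is illustrative. It can serve as a reference when testing the branch-free version.

```c
#include <limits.h>

int saturating_add_ref(int x, int y)
{
    /* Check before adding, so the signed addition itself never overflows. */
    if (y > 0 && x > INT_MAX - y)
        return INT_MAX;   /* positive overflow */
    if (y < 0 && x < INT_MIN - y)
        return INT_MIN;   /* negative overflow */
    return x + y;
}
```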

  • 2020-11-27 03:20

    I suppose the best way for x86 is to use inline assembler to check the carry flag after the addition (the carry flag, not the overflow flag, signals unsigned overflow). Something like:

    add eax, ebx
    jnc @@1
    or eax, 0FFFFFFFFh
    @@1:
    .......
    

    It's not very portable, but IMHO the most efficient way.
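
A portable alternative can be sketched in plain C: detect the wrap-around by comparing the sum with one operand, then OR in an all-ones mask. The name `sadd32_mask` is illustrative. Compilers often turn this into an add followed by a flag-based sequence, but that is an assumption to verify for your target.

```c
#include <stdint.h>

uint32_t sadd32_mask(uint32_t a, uint32_t b)
{
    uint32_t c = a + b;                    /* wraps on overflow */
    uint32_t carry = (uint32_t)(c < a);    /* 1 if the add wrapped, else 0 */
    return c | ((uint32_t)0 - carry);      /* 0 - 1 wraps to all ones, saturating the result */
}
```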
