What is the best (cleanest, most efficient) way to write saturating addition in C?
The function or macro should add two unsigned inputs (need both 16- and 32-bit ver
Zero branch solution:
uint32_t sadd32(uint32_t a, uint32_t b)
{
uint64_t s = (uint64_t)a+b;
return -(s>>32) | (uint32_t)s;
}
A good compiler will optimize this to avoid doing any actual 64-bit arithmetic (s>>32
will merely be the carry flag, and -(s>>32)
is the result of sbb %eax,%eax
).
In x86 asm (AT&T syntax, a
and b
in eax
and ebx
, result in eax
):
add %eax,%ebx
sbb %eax,%eax
or %ebx,%eax
8- and 16-bit versions should be obvious. Signed version might require a bit more work.
I'm not sure if this is faster than Skizz's solution (always profile), but here's an alternative no-branch assembly solution. Note that this requires the conditional move (CMOV) instruction, which I'm not sure is available on your target.
uint32_t sadd32(uint32_t a, uint32_t b)
{
__asm
{
movl eax, a
addl eax, b
movl edx, 0xffffffff
cmovc eax, edx
}
}
You probably want portable C code here, which your compiler will turn into proper ARM assembly. ARM has conditional moves, and these can be conditional on overflow. The algorithm then becomes: add and conditionally set the destination to unsigned(-1), if overflow was detected.
uint16_t add16(uint16_t a, uint16_t b)
{
uint16_t c = a + b;
if (c < a) /* Can only happen due to overflow */
c = -1;
return c;
}
Note that this differs from the other algorithms in that it corrects overflow, instead of relying on another calculation to detect overflow.
x86-64 clang 3.7 -O3 output for adds32: significantly better than any other answer:
add edi, esi
mov eax, -1
cmovae eax, edi
ret
ARMv7: gcc 4.8 -O3 -mcpu=cortex-a15 -fverbose-asm output for adds32:
adds r0, r0, r1 @ c, a, b
it cs
movcs r0, #-1 @ conditional-move
bx lr
16bit: still doesn't use ARM's unsigned-saturating add instruction (UADD16
)
add r1, r1, r0 @ tmp114, a
movw r3, #65535 @ tmp116,
uxth r1, r1 @ c, tmp114
cmp r0, r1 @ a, c
ite ls @
movls r0, r1 @,, c
movhi r0, r3 @,, tmp116
bx lr @
int saturating_add(int x, int y)
{
int w = sizeof(int) << 3;
int msb = 1 << (w-1);
int s = x + y;
int sign_x = msb & x;
int sign_y = msb & y;
int sign_s = msb & s;
int nflow = sign_x && sign_y && !sign_s;
int pflow = !sign_x && !sign_y && sign_s;
int nmask = (~!nflow + 1);
int pmask = (~!pflow + 1);
return (nmask & ((pmask & s) | (~pmask & ~msb))) | (~nmask & msb);
}
This implementation doesn't use control flows, campare operators(==
, !=
) and the ?:
operator. It just uses bitwise operators and logical operators.
I suppose, the best way for x86 is to use inline assembler to check overflow flag after addition. Something like:
add eax, ebx
jno @@1
or eax, 0FFFFFFFFh
@@1:
.......
It's not very portable, but IMHO the most efficient way.