How to access the carry flag while adding two 64 bit numbers using asm in C


As usual, inline asm is not strictly necessary (https://gcc.gnu.org/wiki/DontUseInlineAsm). But compilers currently kinda suck at actual extended-precision addition, so you might want asm for this.

There's an Intel intrinsic for adc: _addcarry_u64. But gcc and clang may make slow code from it, unfortunately. In GNU C on a 64-bit platform, you could just use unsigned __int128.
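For illustration, here's a minimal sketch of both options (my own, not part of the original answer), assuming GNU C on x86-64; _addcarry_u64 comes from <immintrin.h> (or <x86intrin.h> on older gcc):

#include <immintrin.h>   // _addcarry_u64 (x86-64)

// 128-bit add with the intrinsic: the carry-out of the low limb
// is fed back in as the carry-in of the high limb.
static unsigned char add128_intrin(unsigned long long x[2],
                                   const unsigned long long y[2]) {
    unsigned char c = _addcarry_u64(0, x[0], y[0], &x[0]);
    c = _addcarry_u64(c, x[1], y[1], &x[1]);
    return c;                        // final carry-out
}

// Same idea with unsigned __int128: the compiler emits the add/adc pair itself,
// and an unsigned compare recovers the carry-out.
static unsigned add128_u128(unsigned __int128 *x, unsigned __int128 y) {
    unsigned __int128 sum = *x + y;
    unsigned carry_out = (sum < *x);
    *x = sum;
    return carry_out;
}

The __int128 version compiles to a plain add/adc pair, which is usually all you need at 128 bits.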


Compilers usually manage to make pretty good code when checking for carry-out from addition using the idiom that carry_out = (x+y) < x, where < is an unsigned compare. For example:

struct long_carry { unsigned long res; unsigned carry; };

struct long_carry add_carryout(unsigned long x, unsigned long y) {
    unsigned long retval = x + y;
    unsigned carry = (retval < x);
    return (struct long_carry){ retval, carry };
}

gcc7.2 -O3 emits this (and clang emits similar code):

    mov     rax, rdi        # because we need return value in a different register
    xor     edx, edx        # set up for setc
    add     rax, rsi        # generate carry
    setc    dl              # save carry.
    ret                     # return with rax=sum, edx=carry  (SysV ABI struct packing)

There's no way you can do better than this with inline asm; this function already looks optimal for modern CPUs. (Well I guess if mov wasn't zero latency, doing the add first would shorten the latency to carry being ready. But on Intel CPUs, it's supposed to be better to overwrite mov-elimination results right away, so it's better to mov first and then add.)


Clang will even use adc to feed the carry-out of one add into the next, but only for the first carry it propagates. That's probably because the adc helper below is broken: carry_out = (x+y) < x doesn't work when there's carry-in. With carry_out = (x+y+c_in) < x, y+c_in can wrap to zero and give you (x+0) < x (false) even though there was a carry.

Notice that clang's cmp/adc reg,0 exactly implements the behaviour of the C, which isn't the same as another adc there.

Anyway, gcc doesn't even use adc the first time, when it is safe. (So use unsigned __int128 for code that doesn't suck, and asm for integers even wider than that.) Here's the broken helper and an add256 built on it:

// BROKEN with carry_in=1 and y=~0UL
static
unsigned adc(unsigned long *sum, unsigned long x, unsigned long y, unsigned carry_in) {
    *sum = x + y + carry_in;
    unsigned carry = (*sum < x);    // misses the carry when y+carry_in wraps to 0
    return carry;
}

// *x += *y
void add256(unsigned long *x, unsigned long *y) {
    unsigned carry;
    carry = adc(x, x[0], y[0], 0);
    carry = adc(x+1, x[1], y[1], carry);
    carry = adc(x+2, x[2], y[2], carry);
    carry = adc(x+3, x[3], y[3], carry);
}

clang compiles this to:

    mov     rax, qword ptr [rsi]
    add     rax, qword ptr [rdi]
    mov     qword ptr [rdi], rax

    mov     rax, qword ptr [rdi + 8]
    mov     r8, qword ptr [rdi + 16]   # hoisted
    mov     rdx, qword ptr [rsi + 8]
    adc     rdx, rax                   # ok, no memory operand but still adc
    mov     qword ptr [rdi + 8], rdx

    mov     rcx, qword ptr [rsi + 16]   # r8 was loaded earlier
    add     rcx, r8
    cmp     rdx, rax                    # manually check the previous result for carry.  /facepalm
    adc     rcx, 0

    ...

This sucks, so if you want extended-precision addition, you still need asm. But for getting the carry-out into a C variable, you don't.
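If you do reach for asm, a minimal GNU C sketch (my own illustration, not from the answer above) is to chain add/adc inside a single asm statement so the carry flag stays live from one limb to the next. This assumes x86-64 and 4-limb (256-bit) operands:

// *x += *y, 256 bits.  NOTE: sketch assumes x and y don't partially overlap.
static void add256_asm(unsigned long x[4], const unsigned long y[4]) {
    asm("add  %[y0], %[x0]\n\t"     // AT&T operand order: source, destination
        "adc  %[y1], %[x1]\n\t"
        "adc  %[y2], %[x2]\n\t"
        "adc  %[y3], %[x3]"
        : [x0] "+r" (x[0]), [x1] "+r" (x[1]),
          [x2] "+r" (x[2]), [x3] "+r" (x[3])
        : [y0] "r" (y[0]), [y1] "r" (y[1]),
          [y2] "r" (y[2]), [y3] "r" (y[3])
        : "cc");
}

Putting the whole chain in one asm statement is what keeps CF alive between the adds: the compiler can't schedule flag-clobbering instructions into the middle of it. Note the operands are written in AT&T order, unlike the Intel-syntax listings above.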
