Yeah thanks that works. @PeterCordes. Also __int128
works. But one more thing as you said using the intrinsics of multiprecision arithmetic that is _addcarry_u64
in C, using the header file immintrin.h
I have the following code
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <immintrin.h>
unsigned char _addcarry_u64(unsigned char c_in, uint64_t src1, uint64_t src2,uint64_t *sum);
int main()
{
unsigned char carry;
uint64_t sum;
long long int c1=0,c2=0;
uint64_t a=0x0234BDFA12CD4379,b=0xA8DB4567ACE92B38;
carry = _addcarry_u64(0,a,b,&sum);
printf("sum is %lx and carry value is %u n",sum,carry);
return 0;
}
Can you please point me out the error? I'm getting undefined reference to _addcarry_u64
. Some quick google doesn't answer the problem , if any other header file to be used or it is not compatible with gcc and why so
Initially I had this code for adding two 64 bit numbers:
static __inline int is_digit_lessthan_ct(digit_t x, digit_t y)
{ // Is x < y?
return ( int)((x ^ ((x ^ y) | ((x - y) ^ y))) >> (RADIX-1));
}
#define ADDC(carryIn, addend1, addend2, carryOut, sumOut) \
{ digit_t tempReg = (addend1) + (int)(carryIn); \
(sumOut) = (addend2) + tempReg; \
(carryOut) = (is_digit_lessthan_ct(tempReg, (int)(carryIn)) | is_digit_lessthan_ct((sumOut), tempReg)); \
}
Now I got to know that the speed of this implementation can be improved using assembly language. So I am trying to do something similar however I cannot access or return the carry. Here is my code :
#include<stdio.h>
#include<stdlib.h>
#include<stdint.h>
uint64_t add32(uint64_t a,uint64_t b)
{
uint64_t d=0,carry=0;
__asm__("mov %1,%%rax\n\t"
"adc %2,%%rax\n\t"
"mov %%rax,%0\n\t"
:"=r"(d)
:"r"(a),"r"(b)
:"%rax"
);
return d;
}
int main()
{
uint64_t a=0xA234BDFA12CD4379,b=0xA8DB4567ACE92B38;
printf("Sum = %lx \n",add32(a,b));
return 0;
}
The result of this addition should be 14B100361BFB66EB1, where the initial 1 in msb is the carry. I want to save that carry in another register. I tried jc, but I'm getting some or the other error. Even setc gave me error, may be because I'm not sure of the syntax. So can anyone tell me how to save the carry in another register or return it by modifying this code?
As usual, inline asm is not strictly necessary. https://gcc.gnu.org/wiki/DontUseInlineAsm. But currently compilers kinda suck for actual extended-precision addition, so you might want asm for this.
There's an Intel intrinsic for adc
: _addcarry_u64
. But gcc and clang may make slow code., unfortunately. In GNU C on a 64-bit platform, you could just use unsigned __int128
.
Compilers usually manage to make pretty good code when checking for carry-out from addition using the idiom that carry_out = (x+y) < x
, where <
is an unsigned compare. For example:
struct long_carry { unsigned long res; unsigned carry; };
struct long_carry add_carryout(unsigned long x, unsigned long y) {
unsigned long retval = x + y;
unsigned carry = (retval < x);
return (struct long_carry){ retval, carry };
}
gcc7.2 -O3 emits this (and clang emits similar code):
mov rax, rdi # because we need return value in a different register
xor edx, edx # set up for setc
add rax, rsi # generate carry
setc dl # save carry.
ret # return with rax=sum, edx=carry (SysV ABI struct packing)
There's no way you can do better than this with inline asm; this function already looks optimal for modern CPUs. (Well I guess if mov
wasn't zero latency, doing the add
first would shorten the latency to carry being ready. But on Intel CPUs, it's supposed to be better to overwrite mov-elimination results right away, so it's better to mov first and then add.)
Clang will even use adc
to use the carry-out from an add as the carry-in to another add, but only for the first limb. Perhaps because: Update: this function is broken: carry_out = (x+y) < x
doesn't work when there's carry-in. With carry_out = (x+y+c_in) < x
, y+c_in
can wrap to zero and give you (x+0) < x
(false) even though there was carry.
Notice that clang's cmp
/adc reg,0
exactly implements the behaviour of the C, which isn't the same as another adc
there.
Anyway, gcc doesn't even use adc
the first time, when it is safe. (So use unsigned __int128
for code that doesn't suck, and asm for integers even wider than that).
// BROKEN with carry_in=1 and y=~0U
static
unsigned adc_buggy(unsigned long *sum, unsigned long x, unsigned long y, unsigned carry_in) {
*sum = x + y + carry_in;
unsigned carry = (*sum < x);
return carry;
}
// *x += *y
void add256(unsigned long *x, unsigned long *y) {
unsigned carry;
carry = adc(x, x[0], y[0], 0);
carry = adc(x+1, x[1], y[1], carry);
carry = adc(x+2, x[2], y[2], carry);
carry = adc(x+3, x[3], y[3], carry);
}
mov rax, qword ptr [rsi]
add rax, qword ptr [rdi]
mov qword ptr [rdi], rax
mov rax, qword ptr [rdi + 8]
mov r8, qword ptr [rdi + 16] # hoisted
mov rdx, qword ptr [rsi + 8]
adc rdx, rax # ok, no memory operand but still adc
mov qword ptr [rdi + 8], rdx
mov rcx, qword ptr [rsi + 16] # r8 was loaded earlier
add rcx, r8
cmp rdx, rax # manually check the previous result for carry. /facepalm
adc rcx, 0
...
This sucks, so if you want extended-precision addition, you still need asm. But for getting the carry-out into a C variable, you don't.
来源:https://stackoverflow.com/questions/46701364/how-to-access-the-carry-flag-while-adding-two-64-bit-numbers-using-asm-in-c