Question
I have a C program which uses GCC's __uint128_t, which is great, but now my needs have grown beyond it.
What are my options for fast arithmetic with 192 or 256 bits?
The only operation I need is addition (and I don't need the carry bit, i.e., I will be working mod 2^192 or 2^256).
Speed is important, so I don't want to move to a general multi-precision library if at all possible. (In fact my code does use multi-precision in some places, but this is in the critical loop and will run tens of billions of times. So far the multi-precision code needs to run only tens of thousands of times.)
Maybe this is simple enough to code directly, or maybe I need to find some appropriate library.
What is your advice, Oh great Stack Overflow?
Clarification: GMP is too slow for my needs. Although I actually use multi-precision in my code, it's not in the inner loop and runs fewer than 10^5 times. The hot loop runs more like 10^12 times. When I changed my code (increasing a size parameter) so that the multi-precision part ran more often vs. the single-precision, I had a 100-fold slowdown (mostly due to memory management issues, I think, rather than extra µops). I'd like to get that down to a 4-fold slowdown or better.
Answer 1:
256-bit version
__uint128_t a[2], b[2], c[2]; // c = a + b
c[0] = a[0] + b[0];
c[1] = a[1] + b[1] + (c[0] < a[0]);
If you use it many times in a loop, you should consider parallelizing it with SIMD and multithreading.
Edit: 192-bit version. This way you can eliminate the 128-bit comparison, as @harold pointed out:
struct __uint192_t {
    __uint128_t H;
    __uint64_t L;
} a, b, c; // c = a + b
c.L = a.L + b.L;
c.H = a.H + b.H + (c.L < a.L);
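For reference, here is the same 192-bit idea wrapped in a function, as a minimal sketch. The names u192 and u192_add are made up for illustration and are not from the answer. One thing to check on your target: because __uint128_t is 16-byte aligned on x86-64 GCC, a struct like this typically occupies 32 bytes rather than 24, which may matter for cache footprint in a hot loop.

```c
#include <stdint.h>

/* Hypothetical type name; same H/L layout as the snippet above. */
typedef struct {
    unsigned __int128 hi; /* upper 128 bits */
    uint64_t lo;          /* lower 64 bits  */
} u192;

/* Returns a + b mod 2^192 (the carry out of the top is discarded). */
static inline u192 u192_add(u192 a, u192 b)
{
    u192 c;
    c.lo = a.lo + b.lo;                 /* wraps mod 2^64 */
    c.hi = a.hi + b.hi + (c.lo < a.lo); /* add carry from the low word; wraps mod 2^128 */
    return c;
}
```

The only comparison is on 64-bit values, which is the point of putting the 64-bit part at the bottom.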
Answer 2:
You could test whether the "add (low < oldlow) to simulate carry" technique from this answer is fast enough. It's slightly complicated by the fact that low is an __uint128_t here, which could hurt code generation. You might try it with four uint64_t's as well; I don't know whether that'll be better or worse.
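A sketch of the four-limb variant, assuming little-endian limb order (w[0] is the least significant word). The names u256 and u256_add are hypothetical. Each limb needs two carry checks, one for the a + b step and one for adding the incoming carry; at most one of them can fire per limb, so OR-ing them is safe.

```c
#include <stdint.h>

/* Hypothetical 256-bit type: four 64-bit limbs, w[0] least significant. */
typedef struct { uint64_t w[4]; } u256;

/* Returns a + b mod 2^256 using the compare-to-detect-carry trick. */
static inline u256 u256_add(u256 a, u256 b)
{
    u256 c;
    uint64_t carry = 0;
    for (int i = 0; i < 4; i++) {
        uint64_t s  = a.w[i] + b.w[i]; /* wraps mod 2^64 */
        uint64_t c1 = s < a.w[i];      /* carry out of a + b */
        c.w[i] = s + carry;
        uint64_t c2 = c.w[i] < s;      /* carry out of adding the incoming carry */
        carry = c1 | c2;               /* both cannot be set for the same limb */
    }
    return c;
}
```

Whether this beats the two-__uint128_t version depends on the compiler, so it is worth comparing the generated assembly for both.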
If that's not good enough, drop to inline assembly, and directly use the carry flag - it doesn't get any better than that, but you'd have the usual downsides of using inline assembly.
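A middle ground worth trying before inline assembly is the __builtin_add_overflow builtin available in GCC 5+ and Clang. To be clear, this is a substitute for the inline-assembly route, not what the answer proposes. It reports the carry directly, which gives the compiler a better chance of emitting an add/adc chain, though whether it actually does so depends on the compiler and version, so check the disassembly. The function name u256_add_builtin is hypothetical.

```c
#include <stdint.h>

/* r = a + b mod 2^256; limbs are little-endian (index 0 least significant).
 * Uses the GCC/Clang __builtin_add_overflow builtin to obtain each carry. */
static inline void u256_add_builtin(uint64_t r[4],
                                    const uint64_t a[4],
                                    const uint64_t b[4])
{
    unsigned carry = 0;
    for (int i = 0; i < 4; i++) {
        uint64_t s;
        /* The builtin returns nonzero when the addition wrapped. */
        unsigned c1 = __builtin_add_overflow(a[i], b[i], &s);
        unsigned c2 = __builtin_add_overflow(s, (uint64_t)carry, &r[i]);
        carry = c1 | c2; /* at most one of the two can be set */
    }
}
```

Unlike inline assembly, this stays portable across the architectures those compilers target.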
Source: https://stackoverflow.com/questions/22126073/multiword-addition-in-c