Is it really efficient to use the Karatsuba algorithm for 64-bit x 64-bit multiplication?

春和景丽 2021-01-19 01:48

I am working with AVX2 and need to compute a 64-bit x 64-bit -> 128-bit widening multiplication and get the 64-bit high part as fast as possible. Since AVX2 has no such instruction
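A minimal scalar sketch of the usual workaround, assuming plain C with `<stdint.h>`: split each operand into 32-bit halves and combine four 32x32 -> 64 products. That partial product is what AVX2 does provide per 64-bit lane (`_mm256_mul_epu32`, i.e. `vpmuludq`). A vectorized version would, under these assumptions, replace each `*` with `_mm256_mul_epu32` and the shifts and masks with `_mm256_srli_epi64` and `_mm256_and_si256`. The function name `mulhi_u64_limbs` is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: high 64 bits of a 64x64 -> 128-bit unsigned product,
   built only from 32x32 -> 64 multiplies (schoolbook on 32-bit limbs). */
uint64_t mulhi_u64_limbs(uint64_t a, uint64_t b) {
    uint64_t a_lo = a & 0xFFFFFFFFu, a_hi = a >> 32;
    uint64_t b_lo = b & 0xFFFFFFFFu, b_hi = b >> 32;

    /* Four partial products, each fits in 64 bits. */
    uint64_t lo_lo = a_lo * b_lo;
    uint64_t hi_lo = a_hi * b_lo;
    uint64_t lo_hi = a_lo * b_hi;
    uint64_t hi_hi = a_hi * b_hi;

    /* Middle column: at most (2^32-1) + (2^32-1) + (2^32-1)^2 = 2^64-1,
       so this sum cannot overflow. */
    uint64_t cross = (lo_lo >> 32) + (hi_lo & 0xFFFFFFFFu) + lo_hi;

    /* Carry the upper halves into the top word. */
    return hi_hi + (hi_lo >> 32) + (cross >> 32);
}
```

Karatsuba would trade the two cross products for the single product (a_lo + a_hi)(b_lo + b_hi), but those sums are 33 bits wide and no longer fit the 32-bit inputs of `vpmuludq`. The saved multiply therefore tends to come back as extra carry handling, so at this size the schoolbook form is likely the simpler choice.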

3 Answers

滥情空心 (original poster)

2021-01-19 02:25
It's highly unlikely that AVX2 will beat the mulx instruction, which does 64b x 64b -> 128b in a single instruction. The one exception I'm aware of is large multiplications using a floating-point FFT.
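For reference, a scalar sketch of what mulx computes, written with the `unsigned __int128` extension of GCC and Clang (an assumption; MSVC does not support it). Compilers typically lower this to a single `mul`, or to `mulx` when BMI2 is enabled; the BMI2 intrinsic `_mulx_u64` from `<immintrin.h>` is the explicit alternative. The helper name `mul_64x64_128` is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: full 64x64 -> 128-bit unsigned multiply.
   Returns the low 64 bits and stores the high 64 bits in *hi. */
static inline uint64_t mul_64x64_128(uint64_t a, uint64_t b, uint64_t *hi) {
    unsigned __int128 p = (unsigned __int128)a * b;  /* GCC/Clang extension */
    *hi = (uint64_t)(p >> 64);
    return (uint64_t)p;
}
```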

However, if you don't need exactly 64b x 64b -> 128b, you could consider 53b x 53b -> 106b using double-double arithmetic.

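Multiplying four pairs of 53-bit numbers a and b to get four 106-bit products takes only two instructions:
```
#include <immintrin.h>  // AVX + FMA3 intrinsics; build with e.g. -mavx2 -mfma
// a, b: __m256d vectors, each lane holding an integer below 2^53
__m256d p = _mm256_mul_pd(a, b);       // high part: rounded product
__m256d e = _mm256_fmsub_pd(a, b, p);  // low part: exact error a*b - p (FMA3)
```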
This gives four 106-bit products, each held as a pair p + e, in two instructions, compared with one 128-bit product in one instruction using mulx.