gcc 4.8 AVX optimization bug: extra code insertion?


It is great that GCC 4.8 supports AVX optimization with the -Ofast option. However, I found an interesting but stupid bug: it adds additional computations which are

2 Answers

孤城傲影

2021-01-23 04:59

I think what you are seeing in the generated code is an additional iteration of Newton-Raphson to refine the initial estimate provided by vrcpps. (See: the Intel Intrinsics Guide for details of the accuracy of the initial estimate provided by vrcpps.)
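
To make the refinement concrete, here is a minimal sketch of one Newton-Raphson step applied to a reciprocal estimate. It is not the code GCC emits, only the same arithmetic written by hand. The function names `rcp_estimate`, `rcp_refined` and `rel_error` are hypothetical, the scalar intrinsic `_mm_rcp_ss` stands in for the packed vrcpps, and the non-x86 fallback that truncates the mantissa is an assumption added so the sketch builds everywhere.

```c
#include <stdint.h>
#include <string.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Coarse reciprocal estimate, good to roughly 11-12 bits. */
static float rcp_estimate(float a)
{
#if defined(__SSE__)
    /* Scalar form of the hardware approximation (rcpss / vrcpss). */
    return _mm_cvtss_f32(_mm_rcp_ss(_mm_set_ss(a)));
#else
    /* Assumption: portable stand-in that keeps only 11 mantissa bits. */
    float x = 1.0f / a;
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    bits &= 0xFFFFF000u;            /* clear the low 12 of 23 mantissa bits */
    memcpy(&x, &bits, sizeof x);
    return x;
#endif
}

/* One Newton-Raphson step for f(x) = 1/x - a:  x1 = x0 * (2 - a * x0). */
static float rcp_refined(float a)
{
    float x0 = rcp_estimate(a);
    return x0 * (2.0f - a * x0);
}

/* Relative error |a * x - 1| of an approximate reciprocal x of a. */
static double rel_error(float a, float x)
{
    double e = (double)a * (double)x - 1.0;
    return e < 0.0 ? -e : e;
}
```

Calling `rel_error(a, rcp_estimate(a))` and `rel_error(a, rcp_refined(a))` for a value such as `3.0f` should show the error dropping from the order of 1e-4 to roughly single-precision rounding error. The extra multiply and subtract in `rcp_refined` are the kind of "additional computations" that show up next to vrcpps in the generated code.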

粉色の甜心

2021-01-23 05:18

I have figured out why. The SSE/AVX approximation instructions (such as vrcpps) need at least one Newton-Raphson iteration to restore accuracy; otherwise about half of the precision is lost. A FLOAT32 has a 23-bit mantissa, and without any Newton-Raphson step only about 11 bits are accurate. That approximation is way too rough to be directly usable.
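
The jump from about 11 bits to nearly full precision can be checked with a short derivation. The symbols are my own notation, not from the question: $a$ is the input, $x_0$ the hardware estimate of $1/a$, $x_1$ the refined value, and $e_0$ the relative error of the estimate. The bound on $e_0$ is the one Intel documents for the reciprocal approximation; rounding in the refinement step itself is ignored here.

$$
e_0 = 1 - a x_0, \qquad x_1 = x_0 (2 - a x_0) = x_0 (1 + e_0)
$$

$$
1 - a x_1 = 1 - (1 - e_0)(1 + e_0) = e_0^2
$$

$$
|e_0| \le 1.5 \cdot 2^{-12} \;\Longrightarrow\; e_0^2 \le 2.25 \cdot 2^{-24} \approx 2^{-22.8}
$$

So a single iteration squares the error, which roughly doubles the number of correct bits and brings the result close to the 23-bit mantissa of a FLOAT32.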
