I am looking for a fast, low-precision (~1e-3) exponential function using SSE.
Something like this should do the job. You need to tune the 1.05
constant to get a lower maximal error -- I'm too lazy to do that:
#include <cmath>
#include <emmintrin.h>  // SSE2 intrinsics
__m128d fastexp(const __m128d &x) {
    __m128d scaled = _mm_add_pd(_mm_mul_pd(x, _mm_set1_pd(1.0/std::log(2.0)) ), _mm_set1_pd(3*1024.0-1.05));
    return _mm_castsi128_pd(_mm_slli_epi64(_mm_castpd_si128(scaled), 11));  // top mantissa bits become the exponent
}
This only gets to within a few percent (the relative error ranges from roughly -3.4% to +2.5% with the constant above), so for better precision you may need to add a second term.
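To check the error figure yourself, here is a minimal scalar sketch of the same bit trick together with a brute-force error scan. The names `fastexp_scalar` and `max_rel_error` are made up for illustration, and the scan only samples the interval, so treat the number it reports as an estimate rather than a proven bound.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Scalar version of the SSE trick (hypothetical helper, for illustration only).
// No range checking: x must stay well inside the range where exp(x) is finite.
double fastexp_scalar(double x)
{
    double scaled = x * (1.0 / std::log(2.0)) + (3 * 1024.0 - 1.05);
    std::uint64_t bits;
    std::memcpy(&bits, &scaled, sizeof bits);   // reinterpret the double as an integer
    bits <<= 11;                                // top 11 mantissa bits become the exponent
    double result;
    std::memcpy(&result, &bits, sizeof result);
    return result;
}

// Largest relative error against std::exp, sampled at n+1 evenly spaced points.
double max_rel_error(double lo, double hi, int n)
{
    double worst = 0.0;
    for (int i = 0; i <= n; ++i) {
        double x = lo + (hi - lo) * i / n;
        double err = std::fabs(fastexp_scalar(x) / std::exp(x) - 1.0);
        if (err > worst) worst = err;
    }
    return worst;
}
```

Calling something like `max_rel_error(-10.0, 10.0, 100000)` while varying the 1.05 constant is one simple way to do the tuning mentioned above.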
Also, for inputs that overflow or underflow, this will produce garbage. You can avoid this by clamping the scaled
value to the range in which the bit trick is valid before shifting.
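A sketch of the clamping, assuming you want underflow to saturate to 0 and overflow to +infinity. The name `fastexp_clamped` and the bounds 2048 and 4095 are my choices, not something you have to use: 2048.0 shifts to an all-zero bit pattern (0.0), and 4095.0 shifts to the bit pattern of +infinity. NaN inputs do not get any special treatment here.

```cpp
#include <cmath>
#include <emmintrin.h>  // SSE2 intrinsics

// Clamped variant (illustrative sketch; bounds are an assumption, see above).
__m128d fastexp_clamped(const __m128d &x)
{
    const __m128d lo = _mm_set1_pd(2048.0);  // shifts to 0.0
    const __m128d hi = _mm_set1_pd(4095.0);  // shifts to +infinity
    __m128d scaled = _mm_add_pd(_mm_mul_pd(x, _mm_set1_pd(1.0 / std::log(2.0))),
                                _mm_set1_pd(3 * 1024.0 - 1.05));
    // Keep the value in [2048, 4095] so the shift cannot produce a
    // negative number or a NaN bit pattern.
    scaled = _mm_min_pd(_mm_max_pd(scaled, lo), hi);
    return _mm_castsi128_pd(_mm_slli_epi64(_mm_castpd_si128(scaled), 11));
}
```

If you would rather saturate to the smallest and largest finite values instead, tighten the bounds slightly (for example a lower bound of 2049.0 yields the smallest normal double).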