SIMD vectorize atan2 using ARM NEON assembly

僤鯓⒐⒋嵵緔 submitted on 2019-12-01 18:51:00
artless noise

See math-neon for an existing single-valued float implementation. Since it has few or no conditionals, it should translate well to a SIMD implementation.

ARM NEON has no instruction that calculates atan2 directly, so it has to be approximated, and several techniques do better than a Taylor series. In particular, the minimax approach gives a good candidate polynomial. Minimax means minimizing the maximum error over the range; a Chebyshev approximation is usually very close to it.
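To make the difference concrete, here is a minimal sketch that compares a fitted polynomial with a Taylor series of the same degree. The function names are made up for illustration, and the coefficients are quoted from memory of Abramowitz and Stegun 4.4.47, so check them against the handbook before relying on them.

```c
#include <assert.h>

/* Degree-9 odd polynomial for atan(x), intended only for -1 <= x <= 1.
 * Coefficients: assumed to match Abramowitz and Stegun 4.4.47 (verify!). */
float atan_poly(float x)
{
    assert(x >= -1.0f && x <= 1.0f);  /* the fit is meaningless outside its range */

    const float a1 =  0.9998660f;
    const float a3 = -0.3302995f;
    const float a5 =  0.1801410f;
    const float a7 = -0.0851330f;
    const float a9 =  0.0208351f;

    float x2 = x * x;
    /* Horner evaluation in x^2, then one multiply by x to make it odd. */
    float p = a1 + x2 * (a3 + x2 * (a5 + x2 * (a7 + x2 * a9)));
    return x * p;
}

/* Taylor series of atan truncated at the same degree, for comparison only. */
float atan_taylor9(float x)
{
    float x2 = x * x;
    float p = 1.0f + x2 * (-1.0f / 3.0f + x2 * (1.0f / 5.0f
                   + x2 * (-1.0f / 7.0f + x2 * (1.0f / 9.0f))));
    return x * p;
}
```

At x = 1 the Taylor version returns roughly 0.835 against pi/4 = 0.7854, an error of about 0.05, while the fitted polynomial stays within about 1e-5. Both cost the same number of multiplies and adds.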

DSP Guru has specifics on different methods of function approximation, and there are numerous books online. You can search for optimal polynomials using MATLAB, Octave or some other toolkit. Typically, you need to fix the input range and the required precision first. Once you have a good algorithm for a single value, extending it to SIMD of any sort should be trivial.

The question "calculate atan2" has a reference to Apple's atan.c source. The coefficients in that code were most likely derived by the methods described above. The problem with this code is that it does not scale well to SIMD: its atan() approximation is piecewise, so it needs different coefficients depending on the range. For SIMD, you need the same coefficients (multipliers, divisors, equation) throughout the range.
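One way to get a single set of coefficients across the whole range is to fold every input into [0, 1] and undo the folding with selects rather than branches. The following is a portable C sketch of that idea, not NEON code and not the method of any library mentioned here. The function names are hypothetical, the polynomial coefficients are assumed to be those of Abramowitz and Stegun 4.4.47, and NaN or infinite inputs are not handled.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

#define PI_F      3.14159265f
#define HALF_PI_F 1.57079633f

/* One polynomial, one set of coefficients, valid for 0 <= t <= 1. */
static float atan_unit(float t)
{
    float t2 = t * t;
    float p = 0.9998660f + t2 * (-0.3302995f + t2 * (0.1801410f
            + t2 * (-0.0851330f + t2 * 0.0208351f)));
    return t * p;
}

/* Every element runs the same instruction sequence; each ?: below is a
 * compare plus select, which is how a SIMD lane would express it. */
void atan2_lanes(const float *y, const float *x, float *out, size_t n)
{
    assert(y != NULL && x != NULL && out != NULL);

    for (size_t i = 0; i < n; i++) {
        float ax = x[i] < 0.0f ? -x[i] : x[i];
        float ay = y[i] < 0.0f ? -y[i] : y[i];
        float mx = ax > ay ? ax : ay;
        float mn = ax > ay ? ay : ax;

        /* Adding FLT_MIN keeps (0, 0) from producing 0/0 without a branch. */
        float t = mn / (mx + FLT_MIN);
        float r = atan_unit(t);

        r = ay > ax     ? HALF_PI_F - r : r;  /* undo the swap of x and y */
        r = x[i] < 0.0f ? PI_F - r      : r;  /* left half-plane          */
        r = y[i] < 0.0f ? -r            : r;  /* lower half-plane         */
        out[i] = r;
    }
}
```

On NEON the same steps would presumably map onto intrinsics such as vabsq_f32, vminq_f32, vmaxq_f32, vcltq_f32, vcgtq_f32, vbslq_f32 and vmlaq_f32. ARMv7 NEON has no vector divide, so the quotient would need a reciprocal estimate (vrecpeq_f32) refined with vrecpsq_f32.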

Abramowitz and Stegun's Handbook of Mathematical Functions has a chapter on circular functions, with section 4.4.28 giving a logarithmic formula. This seems to be similar to the eglibc implementation.
