I know this is a recurring question, but I haven't really found a useful answer yet. I'm basically looking for a fast approximation of the function acos in C++.
As Jonas Wielicki mentions in the comments, there aren't many precision trade-offs you can make.
Your best bet is to use the processor intrinsics for these functions (if your compiler doesn't do this already) and to use some math to reduce the number of calculations necessary.
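To illustrate the "use some math" route, here is a minimal sketch of one well-known polynomial approximation (coefficients from Abramowitz and Stegun, formula 4.4.45). This is not something the comments suggested, just one possible option, and the name fast_acos is made up for this example. The absolute error is on the order of 1e-4, so it only helps if you can live with that; measure before assuming it is faster than std::acos on your platform.

```cpp
#include <cmath>

// Hypothetical helper: approximates acos(x) for x in [-1, 1].
// Polynomial coefficients from Abramowitz & Stegun, formula 4.4.45.
// Absolute error is on the order of 1e-4 (largest near x = 0).
inline float fast_acos(float x) {
    const float kPi = 3.14159265358979f;
    float ax = std::fabs(x);
    if (ax > 1.0f) ax = 1.0f;  // clamp slight overshoot caused by rounding

    // Horner evaluation of a0 + a1*ax + a2*ax^2 + a3*ax^3
    float p = -0.0187293f;
    p = p * ax + 0.0742610f;
    p = p * ax - 0.2121144f;
    p = p * ax + 1.5707288f;

    float r = std::sqrt(1.0f - ax) * p;  // approximates acos(|x|)

    // Symmetry: acos(-x) = pi - acos(x)
    return x < 0.0f ? kPi - r : r;
}
```

The polynomial is only fitted on [0, 1]; negative inputs are handled through the identity acos(-x) = pi - acos(x), which keeps the cost at one square root plus a handful of multiply-adds.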
It is also very important to keep everything in a CPU-friendly format, make sure there are few cache misses, etc.
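As a rough sketch of what "CPU-friendly" can mean in practice: keep the inputs in one contiguous buffer and process them in a simple loop, so memory is read sequentially and the compiler at least has a chance to vectorize. The function name acos_batch is hypothetical, and whether the loop actually gets vectorized depends on your compiler, flags, and math library.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: computes acos for every element of a contiguous buffer.
// Inputs are assumed to lie in [-1, 1].
inline std::vector<float> acos_batch(const std::vector<float>& in) {
    std::vector<float> out(in.size());
    const float* src = in.data();  // contiguous storage, read sequentially
    float* dst = out.data();
    const std::size_t n = in.size();
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = std::acos(src[i]);  // no branches or pointer chasing in the loop body
    }
    return out;
}
```

Calling it once on a whole array, rather than calling acos from scattered places on individually stored values, is what keeps the cache misses down.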
If you are calculating large numbers of functions like acos, perhaps moving to the GPU is an option for you?