Short version: I'd like to know whether there are implementations of the standard trigonometric functions that are faster than the ones included in math.h.
This should be pretty damn fast. If you can optimize it further, please do, and post the code on pastie.org or something similar.
Computer specifications: 512 MB RAM, Visual Studio 2010, Windows XP Professional SP3 (Version 2002), Intel(R) Pentium(R) 4 CPU at 2.8 GHz.
This is insanely accurate and will actually provide slightly better results in some situations. For example, at 90, 180, and 270 degrees the C++ cos/sin functions return a tiny non-zero value where the exact result is 0, while the table returns exactly 0.
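To illustrate the point about 90, 180, and 270 degrees, here is a minimal sketch. The helper names `cos_degrees` and `sin_degrees` and the constant `kPi` are assumptions made for this example, not part of the original code.

```cpp
#include <cassert>
#include <cmath>

// Assumed constant: M_PI is not part of standard C++, so pi is spelled out.
const double kPi = 3.14159265358979323846;

// Hypothetical helper: cosine of an angle given in degrees.
double cos_degrees(double degrees) {
    assert(std::isfinite(degrees));
    return std::cos(degrees * kPi / 180.0);
}

// Hypothetical helper: sine of an angle given in degrees.
double sin_degrees(double degrees) {
    assert(std::isfinite(degrees));
    return std::sin(degrees * kPi / 180.0);
}
```

With IEEE 754 doubles, `cos_degrees(90.0)` and `sin_degrees(180.0)` typically come out on the order of 1e-16 instead of 0, because no double is exactly pi/2 or pi.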
FULL TABLE OF 0 through 359 Degrees: https://pastee.org/dhwbj
FORMAT -> DEGREE # -> MINE_X(#) , CosX(#) , MINE_Z(#) , SinZ(#).
Below is the code used to construct the table shown above. You can probably make it even more accurate by using a larger data type. I used an unsigned short and stored each value as N/64000, so every cos(##) and sin(##) value is rounded to the nearest multiple of 1/64000. I also tried to use as little extra data as possible, so this isn't some cluttered table with 720 float values for cos and sin. That would probably give better results, but it would be a complete waste of memory. The table below is as small as I could make it. I'd like to see whether it's possible to find an equation that rounds to all these short values and use that instead. I'm not sure it would be any faster, but it would eliminate the table completely and probably cost little or no speed.
So the accuracy in comparison to the C++ cos/sin operations is 99.99998% through 100%.
Below is the table used to calculate the cos/sin values.
static const unsigned __int16 DEGREE_LOOKUP_TABLE[91] =
{
64000, 63990, 63961, 63912, 63844, 63756,
63649, 63523, 63377, 63212, 63028, 62824,
62601, 62360, 62099, 61819, 61521, 61204,
60868, 60513, 60140, 59749, 59340, 58912,
58467, 58004, 57523, 57024, 56509, 55976,
55426, 54859, 54275, 53675, 53058, 52426,
51777, 51113, 50433, 49737, 49027, 48301,
47561, 46807, 46038, 45255, 44458, 43648,
42824, 41988, 41138, 40277, 39402, 38516,
37618, 36709, 35788, 34857, 33915, 32962,
32000, 31028, 30046, 29055, 28056, 27048,
26031, 25007, 23975, 22936, 21889, 20836,
19777, 18712, 17641, 16564, 15483, 14397,
13306, 12212, 11113, 10012, 8907, 7800,
6690, 5578, 4464, 3350, 2234, 1117,
0,
};
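The entries above are consistent with round(64000 * cos(d)) for d = 0 through 90 degrees. As a sketch of how such a table could be regenerated, possibly with a different scale, the function below rebuilds it at run time. The name `build_cos_table` and the `scale` parameter are assumptions for this example.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical helper: table[d] = round(scale * cos(d degrees)), d = 0..90.
// scale must fit in 16 bits because cos(0) == 1 maps to `scale` itself.
std::vector<std::uint16_t> build_cos_table(unsigned scale) {
    assert(scale > 0 && scale <= 65535);
    const double kPi = 3.14159265358979323846;
    std::vector<std::uint16_t> table(91);
    for (int d = 0; d <= 90; ++d) {
        double c = std::cos(d * kPi / 180.0);
        // floor(x + 0.5) rounds to nearest for the non-negative values here.
        table[d] = static_cast<std::uint16_t>(std::floor(c * scale + 0.5));
    }
    return table;
}
```

Calling `build_cos_table(64000)` should reproduce entries such as 63990 for 1 degree and 32000 for 60 degrees. A scale of 65535 would use the full range of a 16-bit value.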
Below is the actual code that does the cos/sin calculations.
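// Inputs: float degrees in [0, 90) and int quadrant in 0..3.
int deg1 = (int)degrees;        // whole degrees, index used for cos
int deg2 = 90 - deg1;           // complementary index, since sin(d) = cos(90 - d)
float module = degrees - deg1;  // fractional part of the angle, in [0, 1)
// 0.000015625 == 1.0 / 64000 scales a table entry back to the range [0, 1].
double vX = DEGREE_LOOKUP_TABLE[deg1] * 0.000015625;
double vZ = DEGREE_LOOKUP_TABLE[deg2] * 0.000015625;
double mX = DEGREE_LOOKUP_TABLE[deg1 + 1] * 0.000015625;
double mZ = DEGREE_LOOKUP_TABLE[deg2 - 1] * 0.000015625;
// Linear interpolation between neighbouring table entries.
float vectorX = vX + (mX - vX) * module;
float vectorZ = vZ + (mZ - vZ) * module;
// Rotate the first-quadrant result into the requested quadrant.
if (quadrant & 1)
{
    float tmp = vectorX;
    if (quadrant == 1)
    {
        vectorX = -vectorZ;
        vectorZ = tmp;
    } else {
        vectorX = vectorZ;
        vectorZ = -tmp;
    }
} else if (quadrant == 2) {
    vectorX = -vectorX;
    vectorZ = -vectorZ;
}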
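For anyone who wants to try this, here is the same logic wrapped into a self-contained function. This is a sketch: the names `CosSin`, `lookup_cos_sin`, and `max_error_vs_std` are made up for the example, and `std::uint16_t` stands in for the MSVC-specific `unsigned __int16`.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// The table from the post: round(64000 * cos(d)) for d = 0..90 degrees.
static const std::uint16_t DEGREE_LOOKUP_TABLE[91] = {
    64000, 63990, 63961, 63912, 63844, 63756, 63649, 63523, 63377, 63212,
    63028, 62824, 62601, 62360, 62099, 61819, 61521, 61204, 60868, 60513,
    60140, 59749, 59340, 58912, 58467, 58004, 57523, 57024, 56509, 55976,
    55426, 54859, 54275, 53675, 53058, 52426, 51777, 51113, 50433, 49737,
    49027, 48301, 47561, 46807, 46038, 45255, 44458, 43648, 42824, 41988,
    41138, 40277, 39402, 38516, 37618, 36709, 35788, 34857, 33915, 32962,
    32000, 31028, 30046, 29055, 28056, 27048, 26031, 25007, 23975, 22936,
    21889, 20836, 19777, 18712, 17641, 16564, 15483, 14397, 13306, 12212,
    11113, 10012,  8907,  7800,  6690,  5578,  4464,  3350,  2234,  1117,
        0
};

struct CosSin { float cosine; float sine; };  // hypothetical result type

// degrees must be in [0, 90); quadrant in 0..3 selects 0, 90, 180 or 270
// degrees as the base angle.
CosSin lookup_cos_sin(float degrees, int quadrant) {
    assert(degrees >= 0.0f && degrees < 90.0f);
    assert(quadrant >= 0 && quadrant <= 3);
    const double scale = 0.000015625;  // 1.0 / 64000
    int deg1 = static_cast<int>(degrees);
    int deg2 = 90 - deg1;              // sin(d) = cos(90 - d)
    float fraction = degrees - deg1;
    double vX = DEGREE_LOOKUP_TABLE[deg1] * scale;
    double vZ = DEGREE_LOOKUP_TABLE[deg2] * scale;
    double mX = DEGREE_LOOKUP_TABLE[deg1 + 1] * scale;
    double mZ = DEGREE_LOOKUP_TABLE[deg2 - 1] * scale;
    float x = static_cast<float>(vX + (mX - vX) * fraction);  // cos in Q0
    float z = static_cast<float>(vZ + (mZ - vZ) * fraction);  // sin in Q0
    CosSin r;
    switch (quadrant) {
        case 1:  r.cosine = -z; r.sine =  x; break;
        case 2:  r.cosine = -x; r.sine = -z; break;
        case 3:  r.cosine =  z; r.sine = -x; break;
        default: r.cosine =  x; r.sine =  z; break;
    }
    return r;
}

// Largest absolute deviation from std::cos/std::sin over 0..359 whole degrees.
double max_error_vs_std() {
    const double kPi = 3.14159265358979323846;
    double worst = 0.0;
    for (int a = 0; a < 360; ++a) {
        CosSin r = lookup_cos_sin(static_cast<float>(a % 90), a / 90);
        double rad = a * kPi / 180.0;
        worst = std::fmax(worst, std::fabs(r.cosine - std::cos(rad)));
        worst = std::fmax(worst, std::fabs(r.sine - std::sin(rad)));
    }
    return worst;
}
```

At whole degrees the error is bounded by the table's quantization step (half of 1/64000, about 0.000008). Between whole degrees the linear interpolation adds some error on top of that.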
SPEEDS BELOW, using the computer specifications mentioned above. I was previously running it under the debugger. These numbers are still from a debug build, but run directly from the executable, which I believe is debug without debugging.
MY METHOD
1,000 Iterations -> 0.004641 MS or 4641 NanoSeconds.
100,000 Iterations -> 4.4328 MS.
100,000,000 Iterations -> 454.079 MS.
1,000,000,000 Iterations -> 4065.19 MS.
COS/SIN METHOD
1,000 Iterations -> 0.581016 MS or 581016 NanoSeconds.
100,000 Iterations -> 25.0049 MS.
100,000,000 Iterations -> 24,731.6 MS.
1,000,000,000 Iterations -> 246,096 MS.
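The timing code itself is not shown in the post. As a rough sketch of how numbers like these could be collected, the helper below times a callable over a given number of iterations. The names `time_ms` and `std_cos_sin` are hypothetical, and `std::chrono` needs a C++11 compiler, which may rule out Visual Studio 2010.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>

// Hypothetical helper: calls fn(i) `iterations` times and returns the
// elapsed wall-clock time in milliseconds.
template <typename Fn>
double time_ms(long iterations, Fn fn) {
    assert(iterations > 0);
    volatile double sink = 0.0;  // keeps the optimizer from removing the loop
    auto start = std::chrono::steady_clock::now();
    for (long i = 0; i < iterations; ++i) {
        sink = sink + fn(i);
    }
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Baseline workload: one std::cos and one std::sin call per iteration.
double std_cos_sin(long i) {
    double radians = (i % 360) * (3.14159265358979323846 / 180.0);
    return std::cos(radians) + std::sin(radians);
}
```

For example, `time_ms(100000, std_cos_sin)` gives the baseline figure. Timing a release build as well is worthwhile, since debug builds can exaggerate the gap between the two methods.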
So to summarize the above: performing both cos(###) and sin(###) with my strategy allows roughly 220,000,000 executions per second on the computer specifications given at the top. This is fairly quick and uses very little memory, so it's a great substitute for the math cos/sin functions normally found in C++. If you'd like to see the accuracy, open the link shown above, where there is a printout of degrees 0 through 359. Also, this supports degrees 0 through 89 and quadrants 0 through 3, so you'd need to either supply those directly or perform (DEGREES % 90) to get the angle within the quadrant.
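Since the routine expects an angle in 0 through 89 plus a quadrant in 0 through 3, an arbitrary angle has to be split first. A minimal sketch of that step follows. `QuadrantAngle` and `split_angle` are names invented for this example.

```cpp
#include <cassert>
#include <cmath>

struct QuadrantAngle {
    int quadrant;   // 0..3
    float degrees;  // remainder in [0, 90)
};

// Hypothetical helper: wraps any finite angle into [0, 360) and splits it
// into a quadrant index and the remaining degrees inside that quadrant.
QuadrantAngle split_angle(float angle_degrees) {
    assert(std::isfinite(angle_degrees));
    float wrapped = std::fmod(angle_degrees, 360.0f);
    if (wrapped < 0.0f) wrapped += 360.0f;    // fmod keeps the sign of the input
    if (wrapped >= 360.0f) wrapped = 0.0f;    // guard against rounding up to 360
    int quadrant = static_cast<int>(wrapped / 90.0f);
    QuadrantAngle result = { quadrant, wrapped - 90.0f * quadrant };
    return result;
}
```

For example, 135.5 degrees splits into quadrant 1 with 45.5 degrees remaining, and -90 degrees maps to quadrant 3 with 0 degrees remaining.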
A) Trying to save small percents will not be very satisfying. Finishing in 97 instead of 100 hours is still a long time.
B) You say you profiled, and that the trig functions take more time than you would like. How much? And what about all the remaining time? It's quite possible you have bigger fish to fry. Most profilers based on the gprof concepts do not tell you about mid-stack calls that you could focus on to save larger amounts of time. Here's an example.
Quake 3's source has some code for precomputed sine/cos aimed at speed over precision. It's not SSE based and thus quite portable (both across architectures and intrinsic APIs). You might also find this summary of SSE and SSE2 based functions very interesting: http://gruntthepeon.free.fr/ssemath/
If you want to use a custom implementation, look here, here and here
Also here (scroll to Universal SIMD-Mathlibrary) if you need to calculate sin/cos for large arrays
You can also try to use the C++ SSE intrinsics. Look here
Note that most modern compilers support SSE and SSE2 optimizations. For Visual Studio 2010, for example, you'll need to manually enable it. Once you do this, a different implementation will be used for most standard math functions.
One more option is to use DirectX HLSL. Look here. Note that there is a nice sincos function which returns both sin and cos.
Usually, I use IPP (which is not free). For details, look here
Here are some good slides on how to do power series approximations (NOT Taylor series though) of trig functions: Faster Math Functions.
It's geared towards game programmers, which means accuracy gets sacrificed for performance, but you should be able to add another term or two to the approximations to get some of the accuracy back.
The nice thing about this is that you should also be able to extend it to SIMD easily, so that you could compute the sin or cos of 4 values at once (2 if you're using double precision).
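As an illustration of this kind of trade-off, here is a sketch of a well-known parabola-based sine approximation. It is not taken from the slides and does not use their coefficients. The name `fast_sin` and the constants are assumptions for the example. The code is branch-free, which is what makes a later SIMD port straightforward.

```cpp
#include <cassert>
#include <cmath>

const float kPiF = 3.14159265f;

// Approximates sin(x) for x in [-pi, pi] with a parabola plus one
// correction step. Callers must reduce the angle into that range first.
float fast_sin(float x) {
    assert(x >= -kPiF && x <= kPiF);
    const float B = 4.0f / kPiF;
    const float C = -4.0f / (kPiF * kPiF);
    // Parabola through (-pi, 0), (0, 0), (pi, 0) peaking at +/-1.
    float y = B * x + C * x * std::fabs(x);
    // Blend with the squared parabola to reduce the error.
    const float P = 0.225f;
    return P * (y * std::fabs(y) - y) + y;
}
```

The absolute error of this form stays at roughly 0.001 across the range, which is often enough for graphics work but far coarser than the standard library.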
Hope that helps...
A long time ago, on slow machines, people used arrays with precomputed values. Another option is to calculate with your own precision, like this: (look for "Series definitions")
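A minimal sketch of the series idea, assuming the standard Taylor series for sine with a caller-chosen number of terms. The function names `sin_series` and `series_error` are made up for this example.

```cpp
#include <cassert>
#include <cmath>

// Sums the first `terms` terms of sin(x) = x - x^3/3! + x^5/5! - ...
// More terms give more precision; works best for x close to 0, so reduce
// the angle (e.g. into [-pi/2, pi/2]) before calling.
double sin_series(double x, int terms) {
    assert(terms >= 1);
    double term = x;  // current term, starts at x^1 / 1!
    double sum = x;
    for (int n = 1; n < terms; ++n) {
        // Each term is the previous one times -x^2 / ((2n)(2n + 1)).
        term *= -x * x / ((2.0 * n) * (2.0 * n + 1.0));
        sum += term;
    }
    return sum;
}

// Absolute difference from std::sin, for judging how many terms are needed.
double series_error(double x, int terms) {
    return std::fabs(sin_series(x, terms) - std::sin(x));
}
```

Comparing `series_error(1.0, 2)` with `series_error(1.0, 5)` shows how quickly the error shrinks as terms are added.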