Calling fsincos instruction in LLVM slower than calling libc sin/cos functions?

傲寒 2021-02-08 04:43

I am working on a language that is compiled with LLVM. Just for fun, I wanted to do some microbenchmarks. In one, I run some million sin / cos computations in a loop. In pseudoc

1 answer
  •  青春惊慌失措
    2021-02-08 05:00

    Hardware trig is slow.

    Too many documents claim that x87 instructions like fsin or fsincos are the fastest way to do trigonometric functions. Those claims are often wrong.

    The fastest way depends on your CPU. As CPUs become faster, old hardware trig instructions like fsin have not kept pace. With some CPUs, a software function, using a polynomial approximation for sine or another trig function, is now faster than a hardware instruction.
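
    To make "polynomial approximation" concrete, here is a minimal sketch. It is not the code of any particular libm: the name soft_sin_small is hypothetical, and the coefficients are plain Taylor terms, whereas real libraries typically use carefully tuned minimax coefficients. It is only meant for small arguments, roughly |x| <= pi/4.

```c
/* Hypothetical helper: degree-11 odd Taylor polynomial for sin(x).
 * Assumption: |x| is small (about pi/4 or less); the error grows
 * quickly outside that range. */
static double soft_sin_small(double x)
{
    const double x2 = x * x;
    /* Horner form of
     * x - x^3/3! + x^5/5! - x^7/7! + x^9/9! - x^11/11! */
    return x * (1.0 + x2 * (-1.0 / 6.0
              + x2 * ( 1.0 / 120.0
              + x2 * (-1.0 / 5040.0
              + x2 * ( 1.0 / 362880.0
              + x2 * (-1.0 / 39916800.0))))));
}
```

    The whole evaluation is a handful of multiplies and adds, which modern CPUs pipeline well; that is why a software routine of this kind can compete with a microcoded x87 instruction.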

    In short, fsincos is too slow.

    Hardware trig is obsolete.

    There is ample evidence that the x86-64 platform has moved away from hardware trig.

    • amd64 prefers SSE over x87 for floats. Yet, SSE has no equivalents for x87 instructions like fsin.
    • For amd64, the libm in both FreeBSD and glibc implements sin() and similar functions in software, not with x87 trig. glibc has optimized x86-64 assembly for sinf() (the single-precision sine) that uses a polynomial approximation, not x87 fsin. NetBSD and OpenBSD made the opposite choice: their libm for amd64 does use x87 instructions.
    • Steel Bank Common Lisp uses fsin in its x86 backend but not in its x86-64 backend. For x86-64, SBCL compiles code that calls sin() in libm.

    Hardware trig loses the race.

    I timed hardware and software sine on an AMD Phenom II X2 560 (3.3 GHz) from 2010. I wrote a C program with this loop:

    volatile double a, s;
    /* ... */
    for (i = 0; i < 100000000; i++)
            s = sin(a);
    

    I compiled this program twice, with two different implementations of sin(). The hard sin() uses x87 fsin. The soft sin() uses a polynomial approximation. My C compiler, gcc -O2, did not replace my sin() call with an inline fsin.
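
    For readers who want to reproduce the hard variant, a sin() built on x87 fsin could look roughly like the sketch below. This is not the source used for the timings here; the name hard_sin is hypothetical, the inline assembly assumes a GCC-compatible compiler on x86 or x86-64, and other targets simply fall back to libm.

```c
#include <math.h>

/* Hypothetical wrapper around the x87 fsin instruction. */
static double hard_sin(double x)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    double r;
    /* "t" names the top of the x87 stack, st(0). fsin replaces st(0)
     * with its sine; "0" puts the input in that same register.
     * fsin only handles |x| < 2^63; larger inputs are left unchanged. */
    __asm__ ("fsin" : "=t" (r) : "0" (x));
    return r;
#else
    return sin(x);  /* no x87 unit available: fall back to libm */
#endif
}
```

    On x86-64 the compiler has to move the argument from an SSE register to the x87 stack and back again, which adds a little overhead on top of the instruction itself.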

    Here are results for sin(0.5):

    $ time race-hard 0.5
        0m3.40s real     0m3.40s user     0m0.00s system
    $ time race-soft 0.5
        0m1.13s real     0m1.15s user     0m0.00s system
    

    Here soft sin(0.5) is so fast that this CPU could compute both soft sin(0.5) and soft cos(0.5) in less time than a single x87 fsin.

    And for sin(123):

    $ time race-hard 123
        0m3.61s real     0m3.62s user     0m0.00s system
    $ time race-soft 123
        0m3.08s real     0m3.07s user     0m0.01s system
    

    Soft sin(123) is slower than soft sin(0.5) because 123 is too large for the polynomial, so the function must subtract some multiple of 2π. If I also want cos(123), there is a chance that x87 fsincos would be faster than soft sin(123) and soft cos(123), for this CPU from 2010.
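
    The extra work for large arguments can be sketched as a range-reduction step. The helper below is a deliberately naive illustration, not what any libm actually does: the name reduce_two_pi is hypothetical, it assumes x / (2π) fits in a long, and it loses accuracy as |x| grows. Production libraries use more careful schemes that keep extra bits of π.

```c
/* Hypothetical helper: return r with sin(r) == sin(x) (up to rounding)
 * and |r| <= pi, by subtracting the nearest multiple of 2*pi.
 * Naive: the rounding error in TWO_PI is multiplied by k, so the
 * result gets less accurate for large |x|. */
static double reduce_two_pi(double x)
{
    const double TWO_PI = 6.283185307179586;
    /* k = x / (2*pi), rounded to the nearest integer */
    long k = (long)(x / TWO_PI + (x >= 0.0 ? 0.5 : -0.5));
    return x - (double)k * TWO_PI;
}
```

    For x = 123 this picks k = 20 and returns about -2.6637. That is still too far from zero for a short polynomial, so a real implementation would reduce further, for example by using the symmetries of sine and cosine to get within π/4 of zero.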
