Program runs 3 times slower when compiled with g++ 5.3.1 than with g++ 4.8.4, using the same command

栀梦 2021-02-07 06:58

Recently, I've started to use Ubuntu 16.04 with g++ 5.3.1 and noticed that my program runs 3 times slower. Before that I used Ubuntu 14.04 with g++ 4.8.4. I bu

2 answers
  • 2021-02-07 07:16

    This is a bug in glibc that affects version 2.23 (used in Ubuntu 16.04) and early versions of 2.24. Fedora and Debian, for example, already ship patched versions that are no longer affected; Ubuntu 16.10 and 17.04 do not yet.

    The slowdown stems from the SSE to AVX register transition penalty. See the glibc bug report here: https://sourceware.org/bugzilla/show_bug.cgi?id=20495

    Oleg Strikov wrote up a quite extensive analysis in his Ubuntu bug report: https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/1663280

    Without the patch, there are several possible workarounds: you can link your program statically (i.e. add -static), or you can disable lazy binding by setting the environment variable LD_BIND_NOW when running the program. Again, more details are in the bug reports above.

  • 2021-02-07 07:30

    For a really precise answer, you'll probably need a libm maintainer to look at your question. However, here is my take. Treat it as a draft; if I find something else, I'll add it to this answer.

    First, compare the asm generated by gcc 4.8.2 and gcc 5.3. There are only 4 differences:

    • at the beginning a xorpd gets transformed into a pxor, for the same registers
    • a pxor xmm1, xmm1 was added before the conversion from int to double (cvtsi2sd)
    • a movsd was moved just before the conversion
    • the addition (addsd) was moved just before a comparison (ucomisd)

    All of this is probably not enough to explain the decrease in performance. A fine-grained profiler (Intel's, for example) would allow a more conclusive answer, but I don't have access to one.

    Now, there is a dependency on sin, so let's see what changed. And the problem is first identifying what platform you use... There are 17 different subfolders in glibc's sysdeps (where sin is defined), so I went for the x86_64 one.

    First, the way processor capabilities are handled changed. For example, glibc/sysdeps/x86_64/fpu/multiarch/s_sin.c used to do the checking for FMA / AVX itself in 2.19, but in 2.23 it is done externally. There could be a bug in which the capabilities are not properly reported, resulting in FMA or AVX not being used. However, I don't find this hypothesis very plausible.

    Secondly, in .../x86_64/fpu/s_sinf.S, the only modifications (apart from a copyright update) change the stack offset, aligning it to 16 bytes; the same goes for sincos. I'm not sure that would make a huge difference.

    However, 2.23 added a lot of sources for vectorized versions of math functions, and some use AVX512, which your processor probably doesn't support because it is really new. Maybe libm tries to use such extensions and, since you don't have them, falls back on the generic version?

    EDIT: I tried compiling it with gcc 4.8.5, but for that I need to recompile glibc-2.19. For the moment I cannot link, because of this:

    /usr/lib/gcc/x86_64-linux-gnu/4.8/../../../x86_64-linux-gnu/libm.a(s_sin.o): in function « __cos »:
    (.text+0x3542): undefined reference to « _dl_x86_cpu_features »
    /usr/lib/gcc/x86_64-linux-gnu/4.8/../../../x86_64-linux-gnu/libm.a(s_sin.o): in function « __sin »:
    (.text+0x3572): undefined reference to « _dl_x86_cpu_features »
    

    I will try to resolve this, but in the meantime note that this symbol is very probably responsible for choosing the right optimized version based on the processor, which may be part of the performance hit.
