Recently, I've started to use Ubuntu 16.04 with g++ 5.3.1 and checked that my program runs 3 times slower. Before that I've used Ubuntu 14.04, g++ 4.8.4. I bu
This is a bug in glibc that affects version 2.23 (in use in Ubuntu 16.04) and early versions of 2.24. Fedora and Debian already include patched versions that are no longer affected; Ubuntu 16.10 and 17.04 do not yet.
The slowdown stems from the SSE to AVX register transition penalty. See the glibc bug report here: https://sourceware.org/bugzilla/show_bug.cgi?id=20495
Oleg Strikov wrote up a quite extensive analysis in his Ubuntu bug report: https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/1663280
Without the patch, there are various possible workarounds: you can compile your program statically (i.e. add -static) or you can disable lazy binding by setting the environment variable LD_BIND_NOW during the program's execution. Again, more details are in the above bug reports.
For a really precise answer, you'll probably need a libm maintainer to look at your question. However, here is my take - take it as a draft, if I find something else I'll add it to this answer.
First, look at the asm generated by GCC, between gcc 4.8.2 and gcc 5.3. There are only 4 differences:
- xorpd gets transformed into a pxor, for the same registers
- pxor xmm1, xmm1 was added before the conversion from int to double (cvtsi2sd)
- movsd was moved just before the conversion
- addsd was moved just before a comparison (ucomisd)

All of this is probably not sufficient to explain the decrease in performance. A fine-grained profiler (Intel's, for example) would allow a more conclusive answer, but I don't have access to one.
Now, there is a dependency on sin, so let's see what changed. And the problem is first identifying what platform you use... There are 17 different subfolders in glibc's sysdeps (where sin is defined), so I went for the x86_64 one.
First, the way processor capabilities are handled changed: for example, glibc/sysdeps/x86_64/fpu/multiarch/s_sin.c used to do the checking for FMA / AVX in 2.19, but in 2.23 it is done externally. There could be a bug in which the capabilities are not properly reported, resulting in FMA or AVX not being used. However, I don't find this hypothesis very plausible.
Secondly, in .../x86_64/fpu/s_sinf.S, the only modifications (apart from a copyright update) change the stack offset, aligning it to 16 bytes; the same goes for sincos. I'm not sure it would make a huge difference.
However, 2.23 added a lot of sources for vectorized versions of math functions, and some use AVX512, which your processor probably doesn't support because it is really new. Maybe libm tries to use such extensions and, since you don't have them, falls back on the generic version?
EDIT: I tried compiling it with gcc 4.8.5, but for it I need to recompile glibc-2.19. For the moment I cannot link, because of this:
/usr/lib/gcc/x86_64-linux-gnu/4.8/../../../x86_64-linux-gnu/libm.a(s_sin.o): in function « __cos »:
(.text+0x3542): undefined reference to « _dl_x86_cpu_features »
/usr/lib/gcc/x86_64-linux-gnu/4.8/../../../x86_64-linux-gnu/libm.a(s_sin.o): in function « __sin »:
(.text+0x3572): undefined reference to « _dl_x86_cpu_features »
I will try to resolve this, but in the meantime note that this symbol is very probably responsible for choosing the right optimized version based on the processor, which may be part of the performance hit.