问题
I stumbled upon very strange issue with GCC. The issue is 25% drop in performance. Here is the story.
I have a pice of software which is fp32 compute intensive (neural networks compiled with TVM). I compile it for ARM (rk3399 device), here is info:
gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/arm-linux-gnueabihf/5/lto-wrapper
Target: arm-linux-gnueabihf
Configured with: ../src/configure -v --with-pkgversion='Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.12' --with-bugurl=file:///usr/share/doc/gcc-5/README.Bugs --enable-languages=c,ada,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-5 --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-libitm --disable-libquadmath --enable-plugin --with-system-zlib --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-5-armhf/jre --enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-5-armhf --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-5-armhf --with-arch-directory=arm --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --enable-objc-gc --enable-multiarch --enable-multilib --disable-sjlj-exceptions --with-arch=armv7-a --with-fpu=vfpv3-d16 --with-float=hard --with-mode=thumb --disable-werror --enable-multilib --enable-checking=release --build=arm-linux-gnueabihf --host=arm-linux-gnueabihf --target=arm-linux-gnueabihf
Thread model: posix
gcc version 5.4.0 20160609 (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.12)
uname -a
Linux FriendlyELEC 4.4.143 #1 SMP Tue Nov 20 11:10:11 CST 2018 aarch64 aarch64 aarch64 GNU/Linux
lscpu
Architecture: aarch64
Byte Order: Little Endian
CPU(s): 6
On-line CPU(s) list: 0-5
Thread(s) per core: 1
Core(s) per socket: 3
Socket(s): 2
Model name: ARMv8 Processor rev 2 (v8l)
CPU max MHz: 1800.0000
CPU min MHz: 408.0000
Hypervisor vendor: horizontal
Virtualization type: full
The code was initially "slow" and cpp11, I decided to try cpp17 and cpp14. cpp17 was not supported, but cpp14 was. I switched to cpp14 and voila I got boost around 25% in performance. I really tested it to make sure the boost is in fact real and not a measuring mistake. I had this boost for a week then my device rebooted and the boost in performance was gone!
It may sound crazy, but I'm very sure in my code and measurements I had. I didn't have explicit compile flags prior to this gimmick. Now I'm trying to figure out compile flags for GCC to reclaim what was lost, but I don't have much experience with GCC. What could be the issue here? What flags can affect performance that much?
the code uses .so files, compiled with use of llvm and gcc
llvm -device=arm_cpu -target=armv8l-linux-gnueabihf -mattr=+neon,fp-armv8
回答1:
"What flags can affect performance that much?"
Turning on the optimizer -O1
, -O2
or -O3
can have a dramatic effect (default is unoptimized build -O0
).
Enabling link time optimization -flto
can also often give significant improvements.
See also the manual for more.
回答2:
It's not GCC fault. It's CPU frequency scaling problem. I had device with ARM with Linux (ubuntu) on board, strange behavior and different benchmarking results are due to strange cpu frequency governing by OS.
来源:https://stackoverflow.com/questions/60208814/gcc-arm-performance-drop