icpc slower than gcc?

Posted by 。_饼干妹妹 on 2019-12-23 03:44:10

Question


I'm trying to build an optimized, parallel version of OpenCV SURF, in particular surf.cpp, using the Intel C++ compiler.

I'm using Intel Advisor to locate inefficient and unvectorized loops. In particular, it suggests rebuilding the code with the icpc compiler (instead of gcc) and then using the -xCORE-AVX2 flag, since my hardware supports it.
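Whether the CPU really advertises AVX2 can be checked directly on Linux. A minimal sketch, assuming a Linux /proc/cpuinfo layout and a grep that supports -m and -w; cpu_has_flag is a hypothetical helper name, not something from the original post:

```shell
# Hypothetical helper: succeed if the given CPU feature flag is listed.
# The second argument (path to a cpuinfo file) is optional and defaults
# to /proc/cpuinfo, which is Linux-specific.
cpu_has_flag() (
    flag=$1
    cpuinfo=${2:-/proc/cpuinfo}
    # Look only at the first "flags" line and match the flag as a whole word,
    # so that "avx" does not match "avx2".
    grep -m1 '^flags' "$cpuinfo" | grep -qw -- "$flag"
)

# Example:
# cpu_has_flag avx2 && echo "AVX2 available"
```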

My original cmake command for building OpenCV with g++ was:

cmake -D CMAKE_BUILD_TYPE=RelWithDebInfo -D CMAKE_INSTALL_PREFIX=... -D OPENCV_EXTRA_MODULES_PATH=... -DWITH_TBB=OFF -DWITH_OPENMP=ON

I then built the application that uses SURF with g++ ... -O3 -g -fopenmp

With icpc, the cmake command becomes:

cmake -D CMAKE_BUILD_TYPE=RelWithDebInfo -D CMAKE_INSTALL_PREFIX=... -D OPENCV_EXTRA_MODULES_PATH=... -DWITH_TBB=OFF -DWITH_OPENMP=ON -DCMAKE_C_COMPILER=icc -DCMAKE_CXX_COMPILER=icpc -DCMAKE_CXX_FLAGS="-debug inline-debug-info -parallel-source-info=2 -ipo -parallel -xCORE-AVX2 -Bdynamic"

(note in particular -DCMAKE_C_COMPILER, -DCMAKE_CXX_COMPILER and -DCMAKE_CXX_FLAGS)

I compiled the SURF application with -g -O3 -ipo -parallel -qopenmp -xCORE-AVX2, and linked it with -shared-intel -parallel.

I expected the icpc build to be faster than the g++ one, but it isn't: the icpc build takes 0.15s while the g++ build takes 0.12s (I ran the experiments several times and these numbers are reliable).
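A difference of 0.03s is small enough that the way it is measured matters. Here is a minimal sketch of a repeatable measurement, assuming GNU date (for the %N nanosecond format) and a POSIX-style shell. The helper name best_of_ms and the binary names in the example are hypothetical, not from the original post:

```shell
# Hypothetical helper: run a command several times and print the fastest
# wall-clock time in milliseconds. Relies on GNU date for nanosecond output.
# Usage: best_of_ms <runs> <command> [args...]
best_of_ms() (
    runs=$1
    shift
    best=
    i=0
    while [ "$i" -lt "$runs" ]; do
        start=$(date +%s%N)
        "$@" > /dev/null 2>&1
        end=$(date +%s%N)
        # Convert the elapsed nanoseconds to whole milliseconds.
        elapsed=$(( (end - start) / 1000000 ))
        if [ -z "$best" ] || [ "$elapsed" -lt "$best" ]; then
            best=$elapsed
        fi
        i=$((i + 1))
    done
    echo "$best"
)

# Example (binary names are placeholders):
# best_of_ms 20 ./surf_gcc
# best_of_ms 20 ./surf_icpc
```

Taking the minimum over many runs reduces the influence of scheduling noise and cold caches, though it still includes process start-up time.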

Why does this happen? Am I doing something wrong with icpc?

g++ OpenCV compiling options (partially generated by cmake):

-fsigned-char -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security   -Wstrict-prototypes  -Winit-self -Wpointer-arith  -Wno-narrowing -Wno-comment -fdiagnostics-show-option -Wno-long-long -pthread -fomit-frame-pointer -msse -msse2 -mno-avx -msse3 -mno-ssse3 -mno-sse4.1 -mno-sse4.2 -ffunction-sections -fvisibility=hidden -Wno-unused-but-set-variable -Wno-missing-prototypes -Wno-missing-declarations -Wno-undef -Wno-unused -Wno-sign-compare -Wno-cast-align -Wno-shadow -Wno-maybe-uninitialized -Wno-pointer-to-int-cast -Wno-int-to-pointer-cast -Wno-unused-parameter -fPIC -O2 -g -DNDEBUG 

icpc OpenCV compiling options (partially generated by cmake):

-fsigned-char -fp-model precise -Wno-implicit-function-declaration -Wno-uninitialized -Wno-missing-prototypes -Wno-unused-but-set-parameter -Wno-missing-declarations -Wno-unused -Wno-shadow -Wno-sign-compare -Wno-unused-parameter -fPIC -O2 -g -DNDEBUG

There is one thing I noticed: the icpc flags that I specified are not included. In theory, the following cmake option:

-DCMAKE_CXX_FLAGS="-debug inline-debug-info -parallel-source-info=2 -ipo -parallel -xCORE-AVX2 -Bdynamic"

should add all these flags during make, but make VERBOSE=1 shows only the flags that I posted above under "icpc OpenCV compiling options (partially generated by cmake)". This is odd, because cmake itself finishes successfully and does report the flags. These are some lines of its report:

--     C++ Compiler:                /opt/intel/compilers_and_libraries_2017.1.132/linux/bin/intel64/icpc  (ver 17.0.1.20161005)
--     C++ flags (Release):         -debug inline-debug-info -parallel-source-info=2 -ipo -parallel -xCORE-AVX2 -Bdynamic   -fsigned-char -fp-model precise -qopenmp -O3 -DNDEBUG 
--     C++ flags (Debug):           -debug inline-debug-info -parallel-source-info=2 -ipo -parallel -xCORE-AVX2 -Bdynamic   -fsigned-char -fp-model precise -qopenmp -g 
--     C Compiler:                  /opt/intel/compilers_and_libraries_2017.1.132/linux/bin/intel64/icc
--     C flags (Release):           -fsigned-char -fp-model precise -qopenmp -O3 -DNDEBUG 
--     C flags (Debug):             -fsigned-char -fp-model precise -qopenmp -g 

As you can see, the optimization flags that I passed in -DCMAKE_CXX_FLAGS appear in the C++ flags (Release/Debug) lines of the report, but they don't show up when I run make VERBOSE=1, and I don't know why.
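One way to see which flags actually reach the compiler, without reading the full make VERBOSE=1 output, is to inspect the files CMake generates. The sketch below assumes a Makefile generator, which writes CMakeCache.txt at the top of the build directory and one flags.make file per target. The helper name show_cxx_flags is hypothetical. Note that with CMAKE_BUILD_TYPE=RelWithDebInfo, CMake combines CMAKE_CXX_FLAGS with CMAKE_CXX_FLAGS_RELWITHDEBINFO, not with the Release or Debug variants, which are the only ones the report above lists.

```shell
# Hypothetical helper: print the cached build type and C++ flag variables,
# followed by the distinct CXX_FLAGS lines CMake generated for the targets.
# Usage: show_cxx_flags <build_dir>
show_cxx_flags() (
    build_dir=$1
    # Cache entries look like NAME:TYPE=value.
    grep -E '^CMAKE_(BUILD_TYPE|CXX_FLAGS)(_[A-Z]+)?:' "$build_dir/CMakeCache.txt"
    # Each target directory holds a flags.make with a "CXX_FLAGS = ..." line.
    find "$build_dir" -name flags.make -exec grep -h '^CXX_FLAGS' {} + | sort -u
)

# Example (run from the source tree, with "build" as the build directory):
# show_cxx_flags build
```

If the flags from -DCMAKE_CXX_FLAGS are present in the cache but missing from flags.make, the difference is introduced during generation rather than by make itself.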

By the way, as far as I know, icpc should always produce faster code than g++ (given the same options, as in this case). So why is it slower here?

Source: https://stackoverflow.com/questions/42238022/icpc-slower-than-gcc
