I'm distributing a C++ program with a makefile for the Unix version, and I'm wondering what compiler options I should use to get the fastest possible code (it falls into t
Consider using -fomit-frame-pointer unless you need to debug with gdb (yuck). It gives the compiler one more register to use for variables (otherwise this register is wasted on useless frame pointers).
Also, you may use something like -march=core2, or more generally -march=native (which targets the CPU of the machine doing the compiling), to let the compiler use newer instructions and tune the code further for that architecture. For this, you must be sure your code will not be expected to run on older processors.
There is no 'fastcall' on x86-64: both the Win64 and the Linux ABIs define register-based calling ("fastcall") as the only calling convention (though Linux passes more arguments in registers).
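Please try -Ofast instead of -O3 (note the capital O; lowercase -o names the output file). -Ofast enables all -O3 optimizations plus some that disregard strict standards compliance, such as -ffast-math.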
Also, here is a list of floating-point flags you might want to enable selectively.
-ffloat-store
-fexcess-precision=style (where style is, e.g., fast or standard)
-ffast-math
-fno-rounding-math
-fno-signaling-nans
-fcx-limited-range
-fno-math-errno
-funsafe-math-optimizations
-fassociative-math
-freciprocal-math
-ffinite-math-only
-fno-signed-zeros
-fno-trapping-math
-frounding-math
-fsingle-precision-constant
-fcx-fortran-rules
A complete list of the flags, with detailed descriptions, is available here
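Several of these flags are "unsafe" because they let GCC apply algebraic rewrites that are not valid under IEEE 754. The sketch below (the function names are illustrative, not from any library) shows three such cases. Compiled with default flags, the behavior described in the comments holds; with -ffast-math or the individual flag named, GCC is allowed (but not required) to change it.

```cpp
#include <cmath>

// Floating-point addition is not associative under IEEE 754, so GCC may
// only regroup a sum like this when -fassociative-math is on (it is part
// of -funsafe-math-optimizations, which -ffast-math enables).
double sum_left(double a, double b, double c) { return (a + b) + c; }
double sum_right(double a, double b, double c) { return a + (b + c); }

// IEEE 754 distinguishes -0.0 from +0.0, and (-0.0) + 0.0 yields +0.0, so
// "x + 0.0" cannot be simplified to "x" unless -fno-signed-zeros is given.
bool adding_zero_keeps_sign(double x) {
    return std::signbit(x + 0.0) == std::signbit(x);
}

// -ffinite-math-only (also enabled by -ffast-math) lets GCC assume that no
// value is NaN, so this self-comparison may be folded to "false".
bool is_nan(double x) { return x != x; }
```

With default flags on x86-64, sum_left(0.1, 0.2, 0.3) and sum_right(0.1, 0.2, 0.3) are not equal, adding_zero_keeps_sign(-0.0) is false, and is_nan(std::nan("")) is true. If your code depends on any of these, enable the corresponding flags selectively rather than using -ffast-math wholesale.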
Apart from what others have already suggested, you should certainly try -flto. It enables link-time optimization, which in some cases can really do magic. Pass it, together with your optimization level, both when compiling and when linking.
For further information see https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
gcc -O3 is not guaranteed to be the fastest. -O2 is often a better starting point. After that, try profile-guided optimization and specific options: http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
It's a long read, but probably worth it.
Note that the equivalent of MSVC's "Link Time Code Generation", known in GCC as "Link Time Optimization" (-flto), is available in GCC 4.5+.
By the way, there is no specific "fastcall" calling convention for Win64. There is only "the" calling convention: http://msdn.microsoft.com/en-us/magazine/cc300794.aspx
I would try profile-guided optimization:
-fprofile-generate
Enable options usually used for instrumenting application to produce profile useful for later recompilation with profile feedback based optimization. You must use -fprofile-generate both when compiling and when linking your program. The following options are enabled: -fprofile-arcs, -fprofile-values, -fvpt.
Then run the instrumented program on representative input and recompile with -fprofile-use so the compiler can apply the collected profile.
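The full two-pass cycle can be sketched as below, assuming GCC and a single source file. The file name app.cpp, the input file, and the function count_below are hypothetical placeholders. The loop only illustrates the kind of data-dependent branch whose observed taken/not-taken counts the profile records; whether PGO actually speeds up a particular loop has to be measured.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical build steps (names are placeholders):
//   g++ -O2 -fprofile-generate app.cpp -o app   # 1. instrumented build
//   ./app < representative_input                # 2. run; writes *.gcda profile data
//   g++ -O2 -fprofile-use app.cpp -o app        # 3. rebuild using the profile
// With separate compile and link steps, pass -fprofile-generate to both.

// A hot loop with a data-dependent branch. After step 2, the compiler knows
// how often the branch was taken and may lay out or transform the code
// accordingly in step 3.
std::size_t count_below(const std::vector<int>& values, int threshold) {
    std::size_t n = 0;
    for (int v : values) {
        if (v < threshold) {  // branch frequency is recorded by the instrumented build
            ++n;
        }
    }
    return n;
}
```

The profile is only as good as the run that produced it, so step 2 should exercise the same paths your users will.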
You should also give the compiler hints about the architecture on which the program will run.
For example, if it will only run on a server and you can compile it on that same server, you can just use -march=native.
Otherwise, you need to determine which instruction-set features all your users' CPUs will have and pass the corresponding -march (or -m) option to GCC.
(Apparently you're targeting 64-bit, so GCC will probably already include more optimizations than for generic x86.)
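To see which instruction-set extensions a given machine supports, and whether a binary's compile-time assumptions match it, a small check like the following may help. It is a sketch assuming GCC or Clang on x86: `__builtin_cpu_init`, `__builtin_cpu_supports`, and the `__SSE4_2__`/`__AVX__`/`__AVX2__` macros are compiler features, while the function names and the choice of three features are illustrative.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Features this binary was *compiled* to assume. GCC and Clang predefine
// these macros when -march/-m options enable the corresponding instructions.
std::vector<std::string> compiled_for() {
    std::vector<std::string> f;
#ifdef __SSE4_2__
    f.push_back("sse4.2");
#endif
#ifdef __AVX__
    f.push_back("avx");
#endif
#ifdef __AVX2__
    f.push_back("avx2");
#endif
    return f;
}

// Features the CPU we are *running* on reports (x86 only; empty elsewhere).
std::vector<std::string> cpu_has() {
    std::vector<std::string> f;
#if defined(__x86_64__) || defined(__i386__)
    __builtin_cpu_init();  // harmless if already initialized
    if (__builtin_cpu_supports("sse4.2")) f.push_back("sse4.2");
    if (__builtin_cpu_supports("avx")) f.push_back("avx");
    if (__builtin_cpu_supports("avx2")) f.push_back("avx2");
#endif
    return f;
}

// True if every feature the build assumed is present on this CPU.
bool build_matches_cpu() {
    const std::vector<std::string> have = cpu_has();
    for (const std::string& feature : compiled_for()) {
        if (std::find(have.begin(), have.end(), feature) == have.end()) {
            return false;
        }
    }
    return true;
}
```

Built without any -march option, cpu_has() can be run on your users' machines to find the common subset. Keep in mind that a binary compiled for features the CPU lacks may crash with an illegal-instruction error before a check like build_matches_cpu() ever gets to run.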