gcc optimizes code when I pass it the -O2 flag, but I'm wondering how well it can actually do that if I compile all source files to object files and then link them.
It seems that you have rediscovered on your own the main limitation of the separate compilation model that C and C++ use. While it certainly eases memory requirements (which was important at the time of its creation), it does so by exposing only minimal information to the compiler: when compiling one source file, the compiler sees only the declarations of functions defined elsewhere, not their bodies. This means that some optimizations (like this one) cannot be performed.
Newer languages, with their module systems, can expose as much information as necessary, and we can hope to reap those benefits if modules get into the next version of C++...
In the meantime, the simplest thing to go for is called Link-Time Optimization. The idea is that you perform as much optimization as possible on each TU (Translation Unit) to obtain an object file, but you also enrich the traditional object file (which contains machine code) with IR (Intermediate Representation, used by compilers to optimize) for some or all of the functions.
When the linker is invoked to combine those object files, instead of just merging the machine code, it merges the IR, re-executes a number of optimization passes (constant propagation, inlining, ...) and then generates the final code itself. It means that instead of being just a linker, it is in fact a backend optimizer.
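Of course, like all optimization passes, this has a cost, so it makes for longer build times. Also, it means that both the compiler and the linker should be passed a special option to trigger this behavior. In the case of gcc, it is -flto (not -lto, which would be read as a request to link a library named "to"). Note that -O4 does not enable this in gcc; it was older versions of Clang that turned on link-time optimization at -O4.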
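To make this concrete, here is a minimal sketch of the situation, written as one listing for brevity. The file names square.c and sum.c, the function names, and the output name demo are all hypothetical; the build commands in the comment assume a gcc that was built with LTO support.

```c
/*
 * Sketch of two translation units, shown in a single listing.
 * In a real project the two parts would live in separate files
 * (square.c and sum.c are hypothetical names).
 *
 * Assumed build, passing -flto at both the compile and the link step:
 *   gcc -O2 -flto -c square.c
 *   gcc -O2 -flto -c sum.c
 *   gcc -O2 -flto square.o sum.o -o demo
 */

/* ---- square.c ---- */
int square(int x)
{
    return x * x;
}

/* ---- sum.c ---- */

/* Without LTO, this declaration is all the compiler knows about
 * square() while it compiles sum.c. */
int square(int x);

int sum_of_squares(int n)
{
    int total = 0;
    for (int i = 1; i <= n; ++i) {
        /* An opaque call when sum.c is compiled on its own; with LTO
         * the link-time optimizer sees the body of square() and is
         * free to inline it here. */
        total += square(i);
    }
    return total;
}
```

Whether the call actually gets inlined is up to the optimizer's heuristics, so if you want to check, compare the disassembly of the final binary built with and without -flto rather than assuming it happened.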