gcc optimizes code when I pass it the -O2
flag, but I\'m wondering how well it can actually do that if I compile all source files to object files and then link them
Since you're using GCC, you can use the C99 inline
function specifier mechanism. This is from ISO/IEC 9899:1999.
§ 6.7.4 Function specifiers
Syntax
¶1 function-specifier:
inline
Constraints
¶2 Function specifiers shall be used only in the declaration of an identifier for a function.
¶3 An inline definition of a function with external linkage shall not contain a definition of a modifiable object with static storage duration, and shall not contain a reference to an identifier with internal linkage.
¶4 In a hosted environment, the
inline
function specifier shall not appear in a declaration ofmain
.Semantics
¶5 A function declared with an
inline
function specifier is an inline function. The function specifier may appear more than once; the behavior is the same as if it appeared only once. Making a function an inline function suggests that calls to the function be as fast as possible.118) The extent to which such suggestions are effective is implementation-defined.119)¶6 Any function with internal linkage can be an inline function. For a function with external linkage, the following restrictions apply: If a function is declared with an
inline
function specifier, then it shall also be defined in the same translation unit. If all of the file scope declarations for a function in a translation unit include theinline
function specifier withoutextern
, then the definition in that translation unit is an inline definition. An inline definition does not provide an external definition for the function, and does not forbid an external definition in another translation unit. An inline definition provides an alternative to an external definition, which a translator may use to implement any call to the function in the same translation unit. It is unspecified whether a call to the function uses the inline definition or the external definition.120)¶7 EXAMPLE The declaration of an inline function with external linkage can result in either an external definition, or a definition available for use only within the translation unit. A file scope declaration with
extern
creates an external definition. The following example shows an entire translation unit.inline double fahr(double t) { return (9.0 * t) / 5.0 + 32.0; } inline double cels(double t) { return (5.0 * (t - 32.0)) / 9.0; } extern double fahr(double); // creates an external definition double convert(int is_fahr, double temp) { /* A translator may perform inline substitutions */ return is_fahr ? cels(temp) : fahr(temp); }
¶8 Note that the definition of
fahr
is an external definition becausefahr
is also declared with extern, but the definition of cels is an inline definition. Becausecels
has external linkage and is referenced, an external definition has to appear in another translation unit (see 6.9); the inline definition and the external definition are distinct and either may be used for the call.118) By using, for example, an alternative to the usual function call mechanism, such as "inline substitution". Inline substitution is not textual substitution, nor does it create a new function. Therefore, for example, the expansion of a macro used within the body of the function uses the definition it had at the point the function body appears, and not where the function is called; and identifiers refer to the declarations in scope where the body occurs. Likewise, the function has a single address, regardless of the number of inline definitions that occur in addition to the external definition.
119) For example, an implementation might never perform inline substitution, or might only perform inline substitutions to calls in the scope of an
inline
declaration.120) Since an inline definition is distinct from the corresponding external definition and from any other corresponding inline definitions in other translation units, all corresponding objects with static storage duration are also distinct in each of the definitions.
Note that GCC also had inline
functions in C before they were standardized. Read the GCC manual for details if you need that notation.
It seems that you have rediscovered on your own the issue about the separate compilation model that C and C++ use. While it certainly eases memory requirements (which was important at the time of its creation), it does so by exposing only minimal information to the compiler, meaning that some optimizations (like this one) cannot be performed.
Newer languages, with their module systems can expose as much information as necessary, and we can hope to rip those benefits if modules get into the next version of C++...
In the mean time, the simplest thing to go for is called Link-Time Optimization. The idea is that you will perform as much optimization as possible on each TU (Translation Unit) to obtain an object file, but you will also enrich the traditional object file (which contain assembly) with IR (Intermediate Representation, used by compilers to optimize) for part of or all functions.
When the linker will be invoked to merge those object files together, instead of just merging the files together, it will merge the IR representations, rexeecute a number of optimization passes (constant propagation, inlining, ...) and then create assembly on its own. It means that instead of being just a linker, it is in fact a backend optimizer.
Of course, like all optimization passes this has a cost, so makes for longer compilation. Also, it means that both the compiler and the linker should be passed a special option to trigger this behavior, in the case of gcc, it would be -lto
or -O4
.
You may be looking for Link-Time Optimization (LTO), aka Whole Program Optimization.