I know that there is an option "-Os" to "Optimize for size", but it has little effect, or even increases the size on some occasions :(
strip (or "-s" option) remov
Apart from the obvious (-Os -s), aligning functions to the smallest value the target allows might squeeze out a few bytes per function. -Os should already disable the extra function alignment, but the default might still be a value like 4 or 8. With GCC, -falign-functions=1 asks for no padding beyond what is required. On ARM the floor is set by the instruction set (4-byte alignment in ARM mode, 2-byte in Thumb mode), so any saving comes from dropping padding above that minimum.
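-ffast-math (or the less abrasive -fno-math-errno) will not set errno and avoids some checks, which reduces code size. If, like most people, you don't read errno after math calls anyway, that's an option. Note that -ffast-math also relaxes IEEE floating-point semantics, so -fno-math-errno is the safer of the two.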
Properly using __restrict (or restrict) and const removes redundant loads, making code both faster and smaller (and more correct). Properly marking pure functions as such eliminates redundant function calls.
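As a rough illustration of the point about restrict, const and pure functions, here is a minimal sketch. The function names table_sum and scale_add are made up for this example, and __attribute__((pure)) is the GCC/Clang spelling. Whether the optimizer really drops loads or merges calls depends on the compiler version and flags.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper. 'pure' promises that the result depends only on the
   arguments and on readable memory, and that there are no side effects,
   so the compiler may merge repeated calls with the same arguments. */
__attribute__((pure))
int table_sum(const int *table, size_t n)
{
    int sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += table[i];
    return sum;
}

/* Hypothetical helper. 'restrict' promises that dst, src and factor do not
   overlap, so *factor can be loaded once instead of being reloaded after
   every store to dst. */
void scale_add(int *restrict dst, const int *restrict src,
               const int *restrict factor, size_t n)
{
    assert(dst != src); /* documents the no-aliasing contract */
    for (size_t i = 0; i < n; i++)
        dst[i] += src[i] * *factor;
}
```

Passing overlapping pointers to scale_add would be undefined behavior, so restrict is only safe where the no-aliasing promise actually holds.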
Enabling LTO (-flto) may help, and if that is not available, compiling all source files into a binary in one go (gcc foo.c bar.c baz.c -o program instead of compiling foo.c, bar.c, and baz.c to object files first and then linking) may have a similar effect. It makes everything visible to the optimizer at one time, possibly allowing it to work better.
-fdelete-null-pointer-checks may be an option (note that this is normally enabled at any -O level, but not on some embedded targets).
Putting static globals (you hopefully don't have that many, but still) into a struct can eliminate a lot of overhead initializing them. I learned that when writing my first OpenGL loader. Having all the function pointers in a struct and initializing the struct with = {} generates one call to memset, whereas initializing the pointers the "normal way" generates a hundred kilobytes of code just to set each one to zero individually.
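The struct idea could look roughly like the sketch below. The names gl_api, gl_reset and gl_missing_count are hypothetical and not taken from any real loader, and the table has only four entries to keep it short. It is written in C, so it uses = {0} where C++ would allow = {}. Whether the reset becomes a memset call or a few inline stores depends on the struct size and the target.

```c
#include <stddef.h>

typedef void (*gl_proc)(void);

/* Hypothetical loader table: all entry points live in one object. */
struct gl_api {
    gl_proc clear;
    gl_proc draw_arrays;
    gl_proc bind_buffer;
    gl_proc use_program;
};

/* Zero-initialized as a whole; a static object like this needs no
   startup code at all, since it is placed in zeroed storage. */
static struct gl_api gl = {0};

/* Resetting the whole table is a single aggregate assignment, which the
   compiler can lower to one block clear instead of one store per member. */
void gl_reset(void)
{
    gl = (struct gl_api){0};
}

/* Counts the entries that are still unresolved (NULL). */
int gl_missing_count(void)
{
    return (gl.clear == NULL) + (gl.draw_arrays == NULL)
         + (gl.bind_buffer == NULL) + (gl.use_program == NULL);
}
```

A real loader would fill the members from the platform's symbol lookup. The point here is only that the zeroing happens once for the whole object.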
Avoid static local variables with non-trivial constructors like the devil (POD types are no problem). GCC initializes such statics in a thread-safe manner, which links in a lot of extra code (even if you don't use threads at all), unless you compile with -fno-threadsafe-statics.
Using something like libowfat instead of the normal crt can greatly reduce your binary size.