I have a legacy firmware application that requires new functionality. The application was already close to the device's limited flash capacity, and the few new funct
Generally: make use of your linker map or tools to figure out what your largest/most numerous symbols are, and then possibly take a look at them using a disassembler. You'd be surprised at what you find this way.
With a bit of Perl or the like, you can make short work of a .xMAP file or the output of "objdump" or "nm", re-sorting it in various ways to pull out the pertinent information.
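If Perl isn't handy, the same idea fits in a few lines of C++. The sketch below assumes GNU-style `nm -S` output, where each defined symbol line is "address size type name" with hex address and size. `Symbol`, `parse_nm` and `sort_by_size_desc` are hypothetical names. Other toolchains and map formats will need a different parser.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

struct Symbol {
    std::uint64_t size;  // symbol size in bytes
    char type;           // nm type letter, e.g. 'T' (text), 'D' (data), 'B' (bss)
    std::string name;
};

// Parses lines shaped like "<addr> <size> <type> <name>" (hex address and size).
// Lines that don't match, such as undefined symbols with no address or size,
// are skipped.
std::vector<Symbol> parse_nm(const std::string& text) {
    std::vector<Symbol> out;
    std::istringstream lines(text);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream fields(line);
        std::string addr, size_hex, type, name;
        if (!(fields >> addr >> size_hex >> type)) continue;
        if (type.size() != 1 ||
            size_hex.find_first_not_of("0123456789abcdefABCDEF") != std::string::npos)
            continue;
        std::getline(fields >> std::ws, name);  // rest of line; demangled names may contain spaces
        if (name.empty()) continue;
        out.push_back({static_cast<std::uint64_t>(std::stoull(size_hex, nullptr, 16)),
                       type[0], name});
    }
    return out;
}

// Largest first; ties broken by name so the output order is deterministic.
void sort_by_size_desc(std::vector<Symbol>& syms) {
    std::sort(syms.begin(), syms.end(), [](const Symbol& a, const Symbol& b) {
        return a.size != b.size ? a.size > b.size : a.name < b.name;
    });
}
```

Feed it the captured output of `nm -S` on your ELF image and print the first few dozen entries; those are usually the best places to start looking in the disassembler.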
Specific to small instruction sets: watch for literal pool usage. While switching from, e.g., the ARM instruction set (32 bits per instruction) to the Thumb instruction set (16 bits per instruction) can shrink code on some ARM processors, it also reduces the size of the "immediate" field.
Suddenly something that could have been a single instruction becomes indirect: a constant, or the address of a global/static, that no longer fits in the immediate field must first be loaded from the literal pool into a register, and only then can the code use it (for a global, by loading through that address). So you get a few extra instructions and an extra literal-pool entry for what might otherwise have been one instruction.
A strategy to fight this is to group globals and statics together into structures; this way you only store one literal (the address of your global structure) and compute offsets from that, rather than storing many different literals when you're accessing multiple statics/globals.
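Here is a minimal sketch of that grouping, with hypothetical variable and type names. Both versions behave identically; the difference is only in how many distinct addresses the compiler has to materialize. Whether it actually ends up sharing one literal-pool entry for the struct depends on your compiler and optimization settings, so confirm it in the disassembly.

```cpp
#include <cassert>
#include <cstdint>

// Before: three independent statics. Each access may need its own
// literal-pool entry holding that variable's address.
namespace before {
static std::uint32_t tick_count;
static std::uint16_t error_flags;
static std::uint8_t  mode;

void on_tick() {
    ++tick_count;
    if (tick_count > 1000) error_flags |= 1u;
    mode = 2;
}
}  // namespace before

// After: the same state grouped into one (hypothetical) structure.
namespace after {
struct AppState {
    std::uint32_t tick_count;
    std::uint16_t error_flags;
    std::uint8_t  mode;
};
static AppState g_state;  // one object: one address literal, members at small offsets

void on_tick() {
    ++g_state.tick_count;
    if (g_state.tick_count > 1000) g_state.error_flags |= 1u;
    g_state.mode = 2;
}
}  // namespace after
```

Keeping the members small and the struct compact also helps, since the offsets have to fit in the same narrow immediate fields.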
We converted our "singleton" classes from managing their own instance pointers to simply being members of one large "struct GlobalTable", and it made a noticeable difference in code size (a few percent), as well as in performance in some cases.
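A rough sketch of what that conversion might look like. `GlobalTable` comes from the description above; `Radio`, `Logger`, `g_table` and the accessor functions are hypothetical. The `static_assert` is an optional guard. It keeps the table trivially constructible, so the global instance is just zero-initialized and needs no startup code.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical former singletons. Instead of each keeping a static instance
// pointer, they are now ordinary classes with no user-written constructors.
class Radio {
public:
    void set_channel(int ch) { channel_ = ch; }
    int channel() const { return channel_; }
private:
    int channel_;
};

class Logger {
public:
    void log_line() { ++lines_; }
    unsigned lines() const { return lines_; }
private:
    unsigned lines_;
};

// Every former singleton becomes a member of one table.
struct GlobalTable {
    Radio  radio;
    Logger logger;
};

// Trivially constructible, so the namespace-scope instance below is simply
// zero-initialized; no constructor code has to run before main().
static_assert(std::is_trivially_default_constructible<GlobalTable>::value,
              "keep GlobalTable trivial to avoid startup code");

GlobalTable g_table;

// Accessors standing in for the old Radio::instance() / Logger::instance().
inline Radio&  radio()  { return g_table.radio; }
inline Logger& logger() { return g_table.logger; }
```

Call sites change from something like `Radio::instance()->set_channel(11)` to `radio().set_channel(11)`, and every access now goes through the single address of `g_table`.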
Otherwise: keep an eye out for static structures and arrays of non-trivially-constructed data. Each of these typically generates a lot of .sinit code ("invisible functions", if you will) that runs before main() to populate the arrays. If you can use only trivial data types in your statics, you'll be far better off.
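To make the distinction concrete, here is a sketch contrasting a table that normally needs startup code with equivalent tables that don't. The table names and `PinConfig` are hypothetical. Exactly how the startup code is emitted (a .sinit section, `_GLOBAL__sub_I_` functions, and so on) varies by toolchain.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <type_traits>

// Non-trivial: each element is a std::string, so the array is normally built by
// compiler-generated code that runs before main(), with matching destructor
// code registered for program exit.
static const std::string kStateNamesHeavy[] = {"idle", "run", "fault"};

// Trivial: an array of pointers to string literals is constant-initialized,
// so it can be emitted directly as (read-only) data with no startup code.
static const char* const kStateNamesLight[] = {"idle", "run", "fault"};

// The same holds for aggregates of plain members; constexpr forces the
// initialization to happen at compile time (a hypothetical pin table).
struct PinConfig {
    unsigned char port;
    unsigned char pin;
    bool active_low;
};
static constexpr PinConfig kPins[] = {{0, 5, false}, {1, 2, true}};

// Compile-time checks that document which side of the line each type is on.
static_assert(!std::is_trivially_destructible<std::string>::value,
              "std::string needs destructor calls, hence startup/exit code");
static_assert(std::is_trivial<PinConfig>::value,
              "PinConfig tables need no startup code");
```

Static asserts like these are a cheap way to stop a later edit from quietly reintroducing startup code for a table.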
This is again something that can be easily identified by using a tool over the results of "nm" or "objdump" or the like. If you have a ton of .sinit stuff, you'll want to investigate!
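A possible starting point for that search is shown below. It scans raw (mangled, i.e. without `-C`) `nm` output for names that look like compiler-generated static initializers. The patterns are assumptions. `_GLOBAL__sub_I_` is the prefix GCC and Clang use for per-file initializer functions, and `__static_initialization_and_destruction` is a GCC helper. The loose "sinit" match is a guess for toolchains that name these after the .sinit section. `find_static_init_symbols` is a hypothetical name.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Returns the symbol names (last field of each nm line) that look like
// compiler-generated static-initialization functions.
std::vector<std::string> find_static_init_symbols(const std::string& nm_output) {
    static const char* const kPatterns[] = {
        "_GLOBAL__sub_I_",                          // GCC/Clang per-file initializer
        "__static_initialization_and_destruction",  // GCC helper called from the above
        "sinit",                                    // assumed: toolchains naming them after .sinit
    };
    std::vector<std::string> hits;
    std::istringstream lines(nm_output);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream fields(line);
        std::string token, name;
        while (fields >> token) name = token;  // keep only the last field
        if (name.empty()) continue;
        for (const char* p : kPatterns) {
            if (name.find(p) != std::string::npos) {
                hits.push_back(name);
                break;
            }
        }
    }
    return hits;
}
```

Each hit usually corresponds to one source file with non-trivial statics, which tells you where to look.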
Oh, and -- if your compiler/linker supports it, don't be afraid to selectively enable optimization or smaller instruction sets for just certain files or functions!
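With GCC, for example, a per-function version might look like the sketch below. It uses GCC's `optimize("Os")` function attribute and, on 32-bit ARM, the `target("thumb")` attribute. The macros expand to nothing on other compilers. Note that GCC's manual says the `optimize` attribute is meant for debugging rather than production code, so compiling selected files with different flags (e.g. `-Os` or `-mthumb`) is the more conservative route. The macro names and `checksum` are hypothetical.

```cpp
#include <cassert>

// GCC-specific: request size optimization for individual functions.
#if defined(__GNUC__) && !defined(__clang__)
#define OPTIMIZE_FOR_SIZE __attribute__((optimize("Os")))
#else
#define OPTIMIZE_FOR_SIZE
#endif

// On 32-bit ARM targets, GCC can also compile a single function as Thumb.
#if defined(__arm__) && defined(__GNUC__) && !defined(__clang__)
#define PREFER_THUMB __attribute__((target("thumb")))
#else
#define PREFER_THUMB
#endif

// Hypothetical example of cold code (e.g. a boot-time self-test) where size
// matters more than speed.
OPTIMIZE_FOR_SIZE PREFER_THUMB
unsigned checksum(const unsigned char* data, unsigned len) {
    unsigned sum = 0;
    for (unsigned i = 0; i < len; ++i) sum += data[i];
    return sum;
}
```

Reserving these overrides for rarely executed code (initialization, self-tests, error paths) keeps the hot paths at full speed while still recovering flash.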