In the famous paper "Smashing the Stack for Fun and Profit", its author takes a C function:
void function(int a, int b, int c) {
char buffer1[5];
char buffe
Memory alignment, of which stack alignment is just one aspect, depends on the architecture. It is partly defined by the Application Binary Interface (ABI) of the language and by the Procedure Call Standard (PCS) for the architecture (sometimes both are in a single spec), i.e. the CPU, and it might even vary by platform. It also depends on the compiler/toolchain wherever those documents leave room for variation.
Those two documents (names may vary) mostly cover the external interface between functions; they may leave the internal layout to the toolchain. However, that layout still has to match the architecture. Normally the hardware requires a minimum alignment but allows a larger one for performance reasons (e.g. the minimum is byte alignment, but reading a 32-bit word from an unaligned address would take multiple bus cycles, so the compiler uses 32-bit alignment).
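As a small illustration of the minimum-versus-preferred distinction, here is a sketch in C (load_u32 and store_u32 are hypothetical helper names, not from any library, and the code assumes 8-bit bytes). Dereferencing a misaligned uint32_t pointer is undefined behaviour in C even where the hardware would tolerate it, so portable code copies the bytes with memcpy and leaves it to the compiler to pick the cheapest instruction sequence for the target.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: 8-bit bytes, so a 32-bit word occupies 4 of them. */
static_assert(sizeof(uint32_t) == 4, "a 32-bit word is 4 bytes here");

/* Read a 32-bit word from any byte address. Casting p to (const uint32_t *)
   and dereferencing it would be undefined behaviour if p is misaligned (and
   faults on some CPUs); memcpy is always well-defined. */
static uint32_t load_u32(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Write a 32-bit word to any byte address, same reasoning as above. */
static void store_u32(void *p, uint32_t v)
{
    memcpy(p, &v, sizeof v);
}
```

For example, store_u32(buf + 1, x) followed by load_u32(buf + 1) round-trips x even though buf + 1 is an odd address; on targets that allow unaligned access, compilers usually turn such a memcpy into a single load or store.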
Normally the compiler (following the PCS) uses the alignment that is optimal for the architecture, subject to the optimization settings (optimize for speed or for size). It takes into account not only the size of the object (aligned to its natural boundary), but also the widths of internal buses (e.g. a 32-bit x86 has internal 64- or 128-bit buses, and ARM CPUs have internal buses from 32 to 128 bits wide, possibly even wider), caches, etc. For local variables it may also take access patterns into account, so that two adjacent variables can be loaded into a register pair in parallel instead of with two separate loads; it may even reorder such variables.
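To see the natural alignments your own compiler and target use, here is a sketch with C11 alignof (is_aligned, SHOW and print_natural_alignment are made-up names for this example). Note that alignof only reports the minimum the ABI requires; as described above, the compiler may still place an object on a stricter boundary.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Nonzero if p is a multiple of align. align must be a power of two,
   which C11 requires of every valid alignment. */
static int is_aligned(const void *p, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Print the size and required alignment of a type. Both values are
   implementation-defined, so the output differs between targets. */
#define SHOW(T) printf("%-12s size %2zu  align %2zu\n", #T, sizeof(T), alignof(T))

static void print_natural_alignment(void)
{
    SHOW(char);
    SHOW(short);
    SHOW(int);
    SHOW(long long);
    SHOW(double);
    SHOW(long double);
    SHOW(max_align_t);
}
```

Whatever the target, each alignment printed is a power of two and divides the corresponding size; that is what lets arrays of the type stay aligned.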
The stack pointer might require a higher alignment, for instance so that the CPU can push two registers at once in an interrupt frame, or push vector registers which require a higher alignment. You could write quite a thick book about this subject (and I bet someone already has).
So, in general, there is no single one-alignment-fits-all rule. However, for struct and array packing, the C standard does define some rules for packing/alignment, mostly to guarantee consistency between e.g. sizeof(type) and the addresses of the elements of an array (required for malloc() to work correctly).
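Those struct/array rules can be checked directly. A sketch with a hypothetical struct mixed: the padding before value and at the end of the struct is what keeps sizeof(struct mixed) a multiple of its alignment, so every element of an array, or of a block returned by malloc/calloc, is correctly aligned. The exact offsets are implementation-defined (on x86-64 they are commonly 0, 8 and 16, with a size of 24).

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical struct mixing members of different alignment. */
struct mixed {
    char   tag;    /* offset 0 */
    double value;  /* preceded by padding up to alignof(double) */
    char   flag;   /* followed by tail padding */
};

/* The padding rules, checked at compile time. */
static_assert(offsetof(struct mixed, value) % alignof(double) == 0,
              "value is placed on its natural boundary");
static_assert(sizeof(struct mixed) % alignof(struct mixed) == 0,
              "sizeof is a multiple of the alignment, so arrays stay aligned");

static void print_mixed_layout(void)
{
    printf("offsetof(tag)   = %zu\n", offsetof(struct mixed, tag));
    printf("offsetof(value) = %zu\n", offsetof(struct mixed, value));
    printf("offsetof(flag)  = %zu\n", offsetof(struct mixed, flag));
    printf("sizeof = %zu, alignof = %zu\n",
           sizeof(struct mixed), alignof(struct mixed));
}

/* calloc returns memory suitably aligned for any object with fundamental
   alignment; element i then sits at base + i * sizeof(struct mixed). */
static struct mixed *alloc_mixed(size_t n)
{
    return calloc(n, sizeof(struct mixed));
}
```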
Even char arrays might be aligned for an optimal cache layout. Note that it is not only the CPU that might have caches, but also PCIe bridges, not to mention PCIe transfers themselves down to DRAM pages.
I have not tried the specific compiler version or distribution version you report. My guess would be that the 16 comes from the stack alignment requirement (i.e. all stack adjustments are x-byte aligned, and x may be 16 for your invocation).
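To make that guess concrete, here is a deliberately simplified model (an assumption for illustration, not gcc's actual frame-layout code; round_up and frame_adjustment are hypothetical names): add up the sizes of the locals and round the total up to the stack boundary. With a 4-byte boundary the question's char buffer1[5] occupies 8 bytes; with a 16-byte boundary it occupies 16.

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to the next multiple of align (align must be a power of two). */
static size_t round_up(size_t n, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/* Hypothetical model of a frame's stack adjustment: the sum of the locals'
   sizes rounded up to the stack boundary. Real compilers also account for
   saved registers, the return address, spill slots, reordering, etc. */
static size_t frame_adjustment(const size_t *sizes, size_t count, size_t boundary)
{
    size_t total = 0;
    for (size_t i = 0; i < count; i++)
        total += sizes[i];
    return round_up(total, boundary);
}
```

With sizes {5, 20}, for example, the model gives 28 bytes for a 4-byte boundary but 32 for a 16-byte one, which is the kind of jump you see when -mpreferred-stack-boundary changes.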
Note that the variable alignment you seem to have started with is slightly different from the above and is controlled in gcc by alignment attributes on the variable (__attribute__((aligned(n)))). Try using those and you should see a difference.
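A sketch of what those markings look like and how to check their effect (observed_alignment and show_local_alignment are hypothetical names): __attribute__((aligned(16))) is the gcc extension, and _Alignas(32) is the standard C11 spelling of the same kind of request. The addresses printed will differ from run to run; only the alignment guarantees are fixed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Largest power of two (capped at 64) that divides the address p. */
static unsigned observed_alignment(const void *p)
{
    uintptr_t a = (uintptr_t)p;
    unsigned r = 1;
    while (r < 64 && (a & r) == 0)
        r <<= 1;
    return r;
}

/* Locals with and without explicit alignment requests. The compiler must
   honour the requests, realigning the frame if necessary. */
static void show_local_alignment(void)
{
    char plain[5];                              /* no request */
    char a16[5] __attribute__((aligned(16)));   /* gcc/clang extension */
    _Alignas(32) char a32[5];                   /* standard C11 */

    printf("plain: %p, aligned to %u\n", (void *)plain, observed_alignment(plain));
    printf("a16:   %p, aligned to %u\n", (void *)a16, observed_alignment(a16));
    printf("a32:   %p, aligned to %u\n", (void *)a32, observed_alignment(a32));

    assert(observed_alignment(a16) >= 16);
    assert(observed_alignment(a32) >= 32);
}
```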
What has changed is SSE, which requires 16-byte alignment. This is covered in this older gcc documentation for -mpreferred-stack-boundary=num, which says:
On Pentium and PentiumPro, double and long double values should be aligned to an 8 byte boundary (see -malign-double) or suffer significant run time performance penalties. On Pentium III, the Streaming SIMD Extension (SSE) data type __m128 suffers similar penalties if it is not 16 byte aligned.
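For SSE specifically, here is a sketch (sum4 is a hypothetical helper): _mm_load_ps from <xmmintrin.h> requires a 16-byte aligned address, and a misaligned one can fault, which is why 16-byte alignment of __m128 data (including data on the stack) matters. The scalar fallback is only there so the sketch also builds on non-x86 targets.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>
#if defined(__SSE__)
#include <xmmintrin.h>
static_assert(alignof(__m128) == 16, "__m128 needs 16-byte alignment");
#endif

/* Sum four floats. v must be 16-byte aligned: _mm_load_ps on a misaligned
   address can fault, so the precondition is checked up front. */
static float sum4(const float *v)
{
    assert(((uintptr_t)v & 15u) == 0);
#if defined(__SSE__)
    __m128 x = _mm_load_ps(v);                    /* aligned 128-bit load */
    x = _mm_add_ps(x, _mm_movehl_ps(x, x));       /* lanes 0,1: v0+v2, v1+v3 */
    x = _mm_add_ss(x, _mm_shuffle_ps(x, x, 1));   /* lane 0: (v0+v2)+(v1+v3) */
    return _mm_cvtss_f32(x);
#else
    return v[0] + v[1] + v[2] + v[3];             /* portable fallback */
#endif
}
```

Callers would declare the input as alignas(16) float buf[4]; a plain float array is only guaranteed alignof(float), typically 4 bytes.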
This is also backed up by the paper "Smashing The Modern Stack For Fun And Profit", which covers this and other modern changes that break the examples in "Smashing the Stack for Fun and Profit".