I'm familiar with data alignment and performance but I'm rather new to aligning code. I started programming in x86-64 assembly recently with NASM and have been comparing
Ahhh, code alignment...
Some basics of code alignment..
Having said all that blah blah, your issue could be any one of these. It's important to look at the disassembly of not just the object, but the executable. You want to see what the final addresses are after everything is linked. Making changes in one object could affect the alignment/addresses of instructions in another object after linking.
In some cases, it's near impossible to align your code in such a way as to maximize performance, simply because so many low-level architectural behaviors are hard to control and predict (that doesn't necessarily mean this is always the case). In those cases, your best bet is to have some default alignment strategy (say, align all function entries on 16B boundaries, and outer loops the same) so that you minimize how much your performance varies from change to change. As a general strategy, aligning function entries is good. Aligning loops that are relatively small is good, as long as you're not adding nops in your execution path.
Beyond that, I'd need more info/data to pinpoint your exact problem, but thought some of this may help.. Good luck :)
The confusing nature of the effect you are seeing (the assembled code doesn't change!) is due to section alignment. The ALIGN macro in NASM actually has two separate effects:
1. Add 0 or more nop instructions so that the next instruction is aligned to the specified power-of-two boundary.
2. Issue an implicit SECTALIGN macro call which will set the section alignment directive to the alignment amount [1].
The first point is the commonly understood behavior for align. It aligns the loop relative to the start of the section in the output file.
The second part is also needed, however: imagine your loop was aligned to a 32-byte boundary in the assembled section, but then the runtime loader put your section, in memory, at an address aligned only to 8 bytes: this would make the in-file alignment quite pointless. To fix this, most executable formats allow each section to specify an alignment requirement, and the runtime loader/linker will be sure to load the section at a memory address which respects the requirement.
That's what the hidden SECTALIGN macro does - it ensures that your ALIGN macro works.
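To make the point about load addresses concrete, here is a small sketch of the arithmetic. The offsets and base addresses are made-up numbers chosen for illustration, and `is_aligned` is a hypothetical helper, not anything NASM or the loader provides.

```python
def is_aligned(address: int, boundary: int) -> bool:
    """Return True if address is a multiple of boundary (a power of two)."""
    return address & (boundary - 1) == 0

# Hypothetical numbers: the loop sits at offset 0x40 inside the section,
# which is a 32-byte boundary relative to the start of the section.
loop_offset = 0x40

# A section base that is only 8-byte aligned (no alignment requirement honored).
bad_base = 0x400008
# A section base that respects a 32-byte section alignment requirement.
good_base = 0x400020

print(is_aligned(loop_offset, 32))              # alignment within the file
print(is_aligned(bad_base + loop_offset, 32))   # alignment at run time, 8-byte base
print(is_aligned(good_base + loop_offset, 32))  # alignment at run time, 32-byte base
```

The in-file alignment only survives to run time when the base address is itself a multiple of the boundary, which is what the section alignment requirement guarantees.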
For your file, there is no difference in the assembled code between ALIGN 16 and ALIGN 32 because the next 16-byte boundary happens to also be the next 32-byte boundary (of course, every other 16-byte boundary is a 32-byte one, so that happens about half the time). The implicit SECTALIGN call is still different though, and that's the one byte difference you see in your hexdump. The 0x20 is decimal 32, and the 0x10 is decimal 16.
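The "about half the time" claim is easy to check with a few lines of arithmetic. This is only a sketch of the padding calculation: `padding` is a hypothetical helper, and the offsets are positions within the section, not anything taken from your file.

```python
def padding(offset: int, boundary: int) -> int:
    """Bytes of padding needed to move offset up to the next multiple of boundary."""
    return -offset % boundary

# Offsets within a 64-byte window where ALIGN 16 and ALIGN 32 would emit
# exactly the same number of padding bytes, i.e. identical assembled code.
same = [off for off in range(64) if padding(off, 16) == padding(off, 32)]

print(len(same), "of 64 offsets give identical padding")
print(padding(0x11, 16), padding(0x11, 32))  # same padding for both
print(padding(0x01, 16), padding(0x01, 32))  # different padding
```

For any offset in `same`, switching between the two directives leaves the instruction bytes untouched, so the section alignment field is the only thing that can differ in the object file.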
You can verify this with objdump -h <binary>. Here's an example on a binary I aligned to 32 bytes:
    objdump -h loop-test.o

    loop-test.o:     file format elf64-x86-64

    Sections:
    Idx Name          Size      VMA               LMA               File off  Algn
      0 .text         0000d18a  0000000000000000  0000000000000000  00000180  2**5
                      CONTENTS, ALLOC, LOAD, READONLY, CODE
The 2**5 in the Algn column is the 32-byte alignment. With 16-byte alignment this changes to 2**4.
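If you want to compare several object files at once, a short script can pull the Algn column out of the `objdump -h` text. This is a rough sketch that assumes the column layout shown in the output above; `section_alignments` is a hypothetical helper and `sample` is just that output pasted in.

```python
import re

def section_alignments(objdump_output: str) -> dict:
    """Map section name -> alignment in bytes, parsed from `objdump -h` text."""
    alignments = {}
    for line in objdump_output.splitlines():
        fields = line.split()
        # Section rows start with a numeric index and end with the Algn field,
        # which looks like "2**5".
        if len(fields) >= 7 and fields[0].isdigit():
            match = re.fullmatch(r"2\*\*(\d+)", fields[-1])
            if match:
                alignments[fields[1]] = 1 << int(match.group(1))
    return alignments

sample = """\
Idx Name          Size      VMA               LMA               File off  Algn
  0 .text         0000d18a  0000000000000000  0000000000000000  00000180  2**5
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
"""

print(section_alignments(sample))
```

Running it on the output for each object file before and after changing the ALIGN directive shows the alignment field changing even when the instruction bytes do not.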
Now it should be clear what happens - aligning the first function in your example changes the section alignment, but not the assembly. When you link your program together, the linker will merge the various .text sections and pick the highest alignment.
At runtime, this causes the merged code section to be aligned to a 32-byte boundary - but this doesn't affect the performance of the first function, because it isn't alignment-sensitive. Since the linker has merged your object files into one section, the larger alignment of 32 changes the alignment of every function (and instruction) in the section, including your other method, and so it changes the performance of your other function, which is alignment-sensitive.
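Here is a deliberately simplified model of that merging step. It is not how any particular linker is implemented (real layout also depends on the linker script, other sections, and so on), and the object names, sizes and load address are all made up. It only illustrates how raising one input section's alignment can shift the addresses of code from a different object file.

```python
def align_up(value: int, boundary: int) -> int:
    """Round value up to the next multiple of boundary (a power of two)."""
    return (value + boundary - 1) & -boundary

def layout(sections, load_address):
    """Place input sections back to back in one merged output section.

    sections: list of (name, size, alignment) tuples, one per object file.
    load_address: hypothetical lowest address the output section may use.
    Returns (merged alignment, {name: start address}).
    """
    merged_alignment = max(alignment for _, _, alignment in sections)
    cursor = align_up(load_address, merged_alignment)
    starts = {}
    for name, size, alignment in sections:
        cursor = align_up(cursor, alignment)  # honor each input's own alignment
        starts[name] = cursor
        cursor += size
    return merged_alignment, starts

# Same code sizes in both runs; only a.o's section alignment differs.
_, with_16 = layout([("a.o", 0x30, 16), ("b.o", 0x50, 16)], 0x401010)
_, with_32 = layout([("a.o", 0x30, 32), ("b.o", 0x50, 16)], 0x401010)

print(hex(with_16["b.o"]), with_16["b.o"] % 32)
print(hex(with_32["b.o"]), with_32["b.o"] % 32)
```

In this toy layout the code from b.o is byte-for-byte identical in both runs, yet its position relative to 32-byte boundaries differs, which is the kind of shift that can change the performance of an alignment-sensitive loop.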
[1] To be precise, SECTALIGN only changes the section alignment if the current section alignment is less than the specified amount - so the final section alignment will be the same as the largest SECTALIGN directive in the section.
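The rule in this footnote amounts to taking a running maximum. A minimal sketch, where `final_section_alignment` and its `initial` parameter are hypothetical names and the starting alignment is an assumption rather than NASM's documented default:

```python
def final_section_alignment(requests, initial=1):
    """Apply SECTALIGN-style requests in order; only larger values take effect."""
    current = initial
    for requested in requests:
        if current < requested:
            current = requested
    return current

# Three ALIGN directives in one section: 16, then 32, then 16 again.
print(final_section_alignment([16, 32, 16]))
# A request smaller than the current alignment changes nothing.
print(final_section_alignment([8], initial=16))
```

So a single ALIGN 32 anywhere in the section is enough to raise the alignment of the whole section, no matter how many smaller ALIGN directives come before or after it.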