When to use a certain calling convention

Question

Are there any guidelines in x86-64 for when a function should abide by the System V guidelines and when it doesn't matter? This is in response to an answer here which mentions using other calling conventions for simplifying an internal/local function.

# gcc 32-bit regparm calling convention
is_even:          # input in EAX, bool return value in AL
    not   %eax             # 2 bytes
    and   $1, %al          # 2 bytes
    ret

# custom calling convention:
is_even:   # input in RDI
           # returns in ZF.  ZF=1 means even
    test  $1, %dil         # 4 bytes.  Would be 2 for AL, 3 for DL or CL (or BL)
    ret

Please see that answer for context.

For example, should it be used:

Only needed when called by an external higher-level C function.
Only needed when that label/function is globl.

Or what's the best guideline as to when to use registers "as I please" and then when to use them according to the System V convention?

Answer 1:

It depends what kind of thing you're writing in asm. If you're writing a small self-contained program purely in asm, such as a 16-bit bootloader, definitely go ahead and make up custom calling conventions for everything (if you make any functions at all, instead of just inlining). For an interesting example, have a look at the disp_ax_hex function in @ecm's legacy BIOS bootloader, and see the discussion in comments about letting disp_al clobber more registers.

I'd say generally do follow the standard calling convention in most other code (anything that's part of a larger program that includes compiler-generated code); x86-64 System V is quite well designed. Only consider using a custom convention for "private" helper functions, especially ones that are only called from different parts of one other function. Typically these have their callers all in one file, so they don't need to be global.

Functions that can usefully return 2 separate values can definitely benefit from a custom calling convention, for the benefit of asm callers.
e.g. C memcmp doesn't return the position of the first difference, only - / 0 / +. This is really stupid and useless, depriving us of a good way to take advantage of the existing hand-optimized asm to find the mismatch position. In asm we can easily just return both, like a pointer to the position in RDI and a cmp result in FLAGS.

In that case, you could write a memcmp function that was 100% compatible with the x86-64 System V calling convention (so you'd need to zero-extend both bytes and do a dword sub, instead of just doing a byte cmp), with the RDI output as a bonus for asm callers.
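As a rough C-level illustration of "return both values", here is a minimal sketch. It is not the custom RDI/FLAGS convention described above, which C cannot express; it swaps in a small struct return instead. The names `struct mismatch` and `memcmp_pos` are hypothetical, not libc functions. Under x86-64 System V, a 16-byte struct of two integer fields like this one is returned in RAX and RDX, so C callers can also get both results without a memory round-trip.

```c
#include <stddef.h>

/* Hypothetical result type: the two values a memcmp-like loop naturally has. */
struct mismatch {
    int    diff;  /* negative, 0, or positive, like memcmp */
    size_t pos;   /* index of the first differing byte, or n if none */
};

/* Hypothetical helper, not part of libc. */
struct mismatch memcmp_pos(const void *a, const void *b, size_t n)
{
    const unsigned char *p = a;
    const unsigned char *q = b;

    for (size_t i = 0; i < n; i++) {
        if (p[i] != q[i]) {
            /* Zero-extend both bytes, then subtract as int. */
            struct mismatch r = { (int)p[i] - (int)q[i], i };
            return r;
        }
    }

    struct mismatch equal = { 0, n };
    return equal;
}
```

A hand-written asm version could keep this C-compatible return and still leave extra information in other registers or FLAGS as a bonus for asm callers.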

The part of my answer you linked was kind of a random thought I decided to mention. It's not something you normally do (although neither is writing asm by hand in the first place), and you'd never want to actually put test in a function by itself except as a solution to a code-golf exercise. That was the real idea behind it: most of the "cost" of that function is just because you made it a function, and in real life you'd always inline something that simple.

Usually you don't write tiny functions in the first place. You just implement the logic in a couple instructions in the middle of a larger function, just like a compiler would inline a small helper function. Then it's not costly to follow the platform ABI (x86-64 System V in this case) for all your functions.

Optimizing the logic to return a 0 / 1 int (not just an 8-bit bool) while sticking to the standard calling convention could be a fun exercise, but it's often not useful unless it turns out your actual use-case wants to do something like even_count += is_even(x);. But in that case, you should do odds += x&1; and calculate the even count once at the end when you need it, as even = total - odds. Besides removing the call/return overhead, inlining also lets you optimize the logic of a tiny function as part of the actual use-case.
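The counting idea can be sketched in C as follows. The function name `count_even` and the choice of a `uint32_t` array are assumptions made for illustration only; the point is that no is_even call is needed at all.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: count even values without calling any is_even(). */
size_t count_even(const uint32_t *v, size_t total)
{
    size_t odds = 0;

    for (size_t i = 0; i < total; i++)
        odds += v[i] & 1;   /* low bit is 1 exactly when the value is odd */

    return total - odds;    /* derive the even count once, at the end */
}
```

The loop body is a single AND plus an ADD per element, which is the kind of inlined logic a compiler (or a hand-written asm loop) would use instead of a call.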

There is a use-case for private helper functions:

Sometimes you want to reuse a block of several instructions as a private "helper" function for a larger function, used like mov eax, 1 / call func / do something else / mov eax, 123 / call func. Then you can think of the "function" more like a loop body or something inside a larger function, and the caller more like custom iteration.

Sometimes it makes sense to repeat a block of code using a macro, but if the sequence is somewhat long that will bloat your code. (Macros expand every time you use them, unlike a 5-byte call rel32.)

Just to be clear, is_even is so simple that it would never make sense to put it in its own function. Calling a function instead of just running test $1, %reg / jz or jnz for some register would be completely insane and obfuscated, as well as larger and slower. Or and $1, %eax to get a 0/1 integer from the reg being odd, which you could use with add to count odd numbers. (total-odd at the end to count even). Most programmers would not wrap it in a macro either; understanding binary is standard for assembly language, and a simple comment on the test or jcc instruction to describe the semantic meaning (# if odd) is all that would be needed.

In theory, for a purely hand-written program, you can just use whatever calling convention is most convenient on a case-by-case basis for every function, documenting it with comments. But normally the benefit is small vs. following a standard calling convention, and keeping track of which function clobbers which registers and wants its args where would quickly become a maintenance nightmare for general-purpose functions that have multiple different callers with little in common other than the function being called.

Of course, for the same reason, we write applications in high-level languages and only rarely actually write any asm by hand. The fact that you're proposing to write functions by hand in asm means it's worth considering whether "thinking like a compiler" is too constraining. That's the point of my codegolf answer: if it's worth squeezing every last byte or cycle out of a function, the whole program (or at least its caller) is probably written similarly.

The only good reason for writing whole programs in asm these days is to optimize the crap out of their machine-code size, e.g. the demoscene (https://en.wikipedia.org/wiki/Demoscene). (Or if the "program" is really a bootloader that runs without / before an OS.)

At that point, don't let ABIs and calling conventions constrain your optimization. And your program will generally be small enough that it's possible to keep track of the different functions and their calling conventions, especially if they make some logical sense (or mostly match the registers where their callers happen to be keeping the right variables anyway).

Source: https://stackoverflow.com/questions/64149600/when-to-use-a-certain-calling-convention

Tags: assembly, x86, x86-64, calling-convention, micro-optimization