Why does Windows64 use a different calling convention from all other OSes on x86-64?

前端 未结 4 1814
予麋鹿
予麋鹿 2020-11-22 04:12

AMD has an ABI specification that describes the calling convention to use on x86-64. All OSes follow it, except for Windows which has it\'s own x86-64 calling convention. Wh

4条回答
  •  旧时难觅i
    2020-11-22 04:21

    Choosing four argument registers on x64 - common to UN*X / Win64

    One of the things to keep in mind about x86 is that the register name to "reg number" encoding is not obvious; in terms of instruction encoding (the MOD R/M byte, see http://www.c-jump.com/CIS77/CPU/x86/X77_0060_mod_reg_r_m_byte.htm), register numbers 0...7 are - in that order - ?AX, ?CX, ?DX, ?BX, ?SP, ?BP, ?SI, ?DI.

    Hence choosing A/C/D (regs 0..2) for return value and the first two arguments (which is the "classical" 32bit __fastcall convention) is a logical choice. As far as going to 64bit is concerned, the "higher" regs are ordered, and both Microsoft and UN*X/Linux went for R8 / R9 as the first ones.

    Keeping that in mind, Microsoft's choice of RAX (return value) and RCX, RDX, R8, R9 (arg[0..3]) are an understandable selection if you choose four registers for arguments.

    I don't know why the AMD64 UN*X ABI chose RDX before RCX.

    Choosing six argument registers on x64 - UN*X specific

    UN*X, on RISC architectures, has traditionally done argument passing in registers - specifically, for the first six arguments (that's so on PPC, SPARC, MIPS at least). Which might be one of the major reasons why the AMD64 (UN*X) ABI designers chose to use six registers on that architecture as well.

    So if you want six registers to pass arguments in, and it's logical to choose RCX, RDX, R8 and R9 for four of them, which other two should you pick ?

    The "higher" regs require an additional instruction prefix byte to select them and therefore have a bigger instruction size footprint, so you wouldn't want to choose any of those if you have options. Of the classical registers, due to the implicit meaning of RBP and RSP these aren't available, and RBX traditionally has a special use on UN*X (global offset table) which seemingly the AMD64 ABI designers didn't want to needlessly become incompatible with.
    Ergo, the only choice were RSI / RDI.

    So if you have to take RSI / RDI as argument registers, which arguments should they be ?

    Making them arg[0] and arg[1] has some advantages. See cHao's comment.
    ?SI and ?DI are string instruction source / destination operands, and as cHao mentioned, their use as argument registers means that with the AMD64 UN*X calling conventions, the simplest possible strcpy() function, for example, only consists of the two CPU instructions repz movsb; ret because the source/target addresses have been put into the correct registers by the caller. There is, particularly in low-level and compiler-generated "glue" code (think, for example, some C++ heap allocators zero-filling objects on construction, or the kernel zero-filling heap pages on sbrk(), or copy-on-write pagefaults) an enormous amount of block copy/fill, hence it'll be useful for code so frequently used to save the two or three CPU instructions that'd otherwise load such source/target address arguments into the "correct" registers.

    So in a way, UN*X and Win64 are only different in that UN*X "prepends" two additional arguments, in purposefully chosen RSI/RDI registers, to the natural choice of four arguments in RCX, RDX, R8 and R9.

    Beyond that ...

    There are more differences between the UN*X and Windows x64 ABIs than just the mapping of arguments to specific registers. For the overview on Win64, check:

    http://msdn.microsoft.com/en-us/library/7kcdt6fy.aspx

    Win64 and AMD64 UN*X also strikingly differ in the way stackspace is used; on Win64, for example, the caller must allocate stackspace for function arguments even though args 0...3 are passed in registers. On UN*X on the other hand, a leaf function (i.e. one that doesn't call other functions) is not even required to allocate stackspace at all if it needs no more than 128 Bytes of it (yes, you own and can use a certain amount of stack without allocating it ... well, unless you're kernel code, a source of nifty bugs). All these are particular optimization choices, most of the rationale for those is explained in the full ABI references that the original poster's wikipedia reference points to.

提交回复
热议问题