I'm programming a JIT compiler and I've been surprised to discover that so many of the x86-64 registers are nonvolatile (callee-preserved) in the Win64 calling convention.
If registers are caller-saved, the caller has to save and reload them around every function call across which they hold live values. If registers are callee-saved, the callee only has to save the registers that it actually uses, and only on paths where they're going to be used (i.e. maybe not at all in an early-exit scenario). The disadvantage of this convention is that the callee has no knowledge of the caller, so it might be saving registers that are dead anyway, but I guess that's seen as a smaller concern.
You're right: the Windows x86-64 calling convention, with only 6 call-clobbered xmm registers, is not a very good design. Most SIMD (and many scalar FP) loops don't contain any function calls, so they gain nothing from having their data in call-preserved registers. The save/restore is pure downside because it's rare that any of their callers are making use of this non-volatile state.
In x86-64 System V, all the vector registers are call-clobbered, which is maybe too far the other way. Having 1 or 2 call-preserved would be nice in many cases, especially for code that makes some math library function calls. (Use gcc -fno-math-errno to let simple ones inline better; sometimes the only reason they don't is that they need to set errno on NaN.)
Related: how the x86-64 SysV calling convention was chosen: looking at code size and instruction count for gcc compiling SPECint/SPECfp.
For integer regs, having some of each is definitely good, and all "normal" calling conventions (for all architectures, not just x86) do in fact have a mix. This reduces the total amount of work done spilling/restoring in callers and callees combined.
Forcing the caller to spill/reload everything around every function call is not good for code-size or performance. Saving / restoring some call-preserved regs at the start/end of the function lets non-leaf functions keep some things live in registers across calls.
Consider some code that calculates a couple things and then does cout << "result: " << a << "foo" << b*c << '\n';
That's 5 function calls to std::ostream operator<<, and they generally don't inline. Keeping the address of cout and the locals you just computed in non-volatile registers means you only need some cheap mov reg,reg instructions to set up the args for the next call. (Or push in a stack-args calling convention.)
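To make that concrete, here is a minimal sketch of the cout example as a compilable function. The names print_result and render are made up for illustration, and the comments describe what a compiler would typically want to do, not guaranteed codegen.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical example function: each << below is a separate, usually
// non-inlined call to an operator<< overload.
void print_result(std::ostream& os, int a, int b, int c) {
    // The address of `os`, plus `a`, `b` and `c`, are all live across the
    // first call. A compiler will typically want them in call-preserved
    // registers (or spill slots) so they survive each call.
    os << "result: "   // call 1: needs os
       << a            // call 2: needs the returned stream and a
       << "foo"        // call 3
       << b * c        // call 4: b and c had to survive calls 1 to 3
       << '\n';        // call 5
}

// Small helper (also hypothetical) so the output can be inspected as a string.
std::string render(int a, int b, int c) {
    std::ostringstream out;
    print_result(out, a, b, c);
    return out.str();
}
```

Compiling print_result with optimization and looking at the asm should show the prologue saving a few call-preserved registers once, followed by plain register moves before each call, though the exact registers depend on the compiler and the ABI.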
But having some call-clobbered registers that can be used without saving is also very important. Functions that don't need all the architectural registers can just use the call-clobbered registers as temporaries. This avoids introducing a spill/reload into the critical path for the caller's dependency chains (for very small callees), as well as saving instructions.
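The trade-off so far can be illustrated with a toy count of stack stores and loads. This is only a sketch under loud assumptions: the Convention struct and memory_ops function are invented for this answer, every spilled value is stored before and reloaded after each call, every callee needs the same number of scratch registers, and nothing models latency or code size.

```cpp
#include <algorithm>

// Hypothetical description of a calling convention: how many registers
// fall into each class.
struct Convention {
    int preserved;  // call-preserved (non-volatile) registers
    int clobbered;  // call-clobbered (volatile) registers
};

// Counts stores + loads for a caller that keeps `live` values alive across
// `calls` calls, where each callee needs `callee_regs` scratch registers.
// Assumes callee_regs <= preserved + clobbered.
int memory_ops(Convention cc, int live, int calls, int callee_regs) {
    // Caller: values that fit in call-preserved registers stay there. The
    // caller pays one save + one restore per such register, once, in its
    // own prologue/epilogue.
    int in_preserved = std::min(live, cc.preserved);
    int caller_once = 2 * in_preserved;

    // Caller: everything else is spilled and reloaded around every call.
    int caller_per_call = 2 * (live - in_preserved);

    // Callee: call-clobbered registers are free; each extra register it
    // needs is a call-preserved one that it must save and restore.
    int callee_per_call = 2 * std::max(0, callee_regs - cc.clobbered);

    return caller_once + calls * (caller_per_call + callee_per_call);
}
```

With 3 live values, 4 calls and callees needing 4 registers each, memory_ops({0, 15}, 3, 4, 4) gives 24 (everything call-clobbered), memory_ops({15, 0}, 3, 4, 4) gives 38 (everything call-preserved), and memory_ops({6, 9}, 3, 4, 4) gives 6. The 6/9 split matches the integer-register counts in x86-64 System V; the workload numbers are arbitrary.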
Sometimes a complex function will save/restore some call-preserved registers just to get more total registers (like you're seeing with XMM for number crunching). This is generally worth it; saving/restoring the caller's non-volatile registers once is usually better than spilling/reloading your own local variables to the stack, especially if you would otherwise have to do that inside a loop.
Another reason for call-clobbered registers is that usually some of your values are "dead" after a function call: you only needed them as args to the function. Computing them in call-clobbered registers means you don't have to save/restore anything to free up those registers, and your callee can freely use them too. This is even better in calling conventions that pass args in registers: you can compute your inputs directly in the arg-passing registers. (And copy any to call-preserved regs or spill them to stack memory if you also need them after the function.)
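A small sketch of that distinction, with made-up functions helper and caller. The register names in the comments are what the ABIs use for the first integer argument; which registers the compiler actually picks for everything else is up to it.

```cpp
// Hypothetical callee. In real code it would live in another translation
// unit, so the compiler could not inline it into caller().
long helper(long x, long y) { return x - y; }

long caller(long p, long q) {
    // x is dead after the call: it can be computed directly in the first
    // arg-passing register (rdi on x86-64 System V, rcx on Win64), with
    // nothing to save or restore.
    long x = p + 1;

    // y is needed after the call: besides going into the second arg
    // register, a copy has to sit in a call-preserved register or a stack
    // slot so it survives helper().
    long y = q * 2;

    long r = helper(x, y);
    return r + y;  // y is used again here, x is not
}
```

For example, caller(4, 5) computes x = 5 and y = 10, gets -5 back from helper, and returns 5.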
(I like the terms call-preserved vs. call-clobbered, rather than caller-saved vs. callee-saved. The latter terms imply that someone must save the registers, instead of just letting dead values die. volatile / non-volatile is not bad, but those terms also have other technical meanings as C keywords, or in terms of flash vs. DRAM.)
The advantage of having nonvolatile registers is performance.
The less data is moved, the more efficient a CPU is.
The more volatile registers there are, the more energy the CPU needs.