I am interested in encapsulating a transactional xbegin and xend inside XBEGIN( ) and XEND( ) functions, in a static assembler lib. However I am unclear how (or if) the sta
related: See also David Kanter's TSX writeup for some theory on how it works under the hood and how software can benefit from it, and this blog post for some experimental performance numbers on HSW (before the TSX bug was discovered and microcode updates disabled TSX on that hardware.)
The Intel insn ref manual entry for xbegin is pretty clear. (See the x86 tag wiki for links to Intel's official PDF, and other stuff.)
On an RTM abort, the logical processor discards all architectural register and memory updates performed during the RTM execution and restores architectural state to that corresponding to the outermost XBEGIN instruction. The fallback address following an abort is computed from the outermost XBEGIN instruction.
So the instruction works like a conditional branch, where the branch condition is "did an abort happen before XEND?" e.g.:
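; NASM syntax, I assume MASM is similar
ALIGN 16
retry:
    ; eax holds abort info, all other architectural state + memory is unchanged
    inc   dword [retry_count]   ; NASM needs an explicit operand size here. Or whatever other debug instrumentation you want to add
    ; falls through into the wrapper to start the transaction again
global xbegin_wrapper_with_retry
xbegin_wrapper_with_retry:
    xbegin retry                ; on abort, execution resumes at retry
    ret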
If an abort happens, it's as if all the code that ran after xbegin didn't run at all, just a jump to the fallback address with eax modified.
You might want to do something other than just infinite retries on an abort, of course. This isn't meant to be a real example. (This article does have a real example of the kind of logic you might want to use, using intrinsics. It looks like they just test eax instead of using the xbegin as the jump in an if, unless the compiler optimizes that check. IDK if it's the most efficient way.)
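As a rough illustration of "something other than infinite retries", here is a minimal sketch in C of a policy that inspects the abort status left in eax. The bit positions follow Intel's documented RTM abort status layout; the macro names, MAX_ATTEMPTS, and should_retry() are hypothetical names made up for this sketch, and the policy itself (give up on capacity aborts, otherwise trust the retry hint) is just one plausible choice, not a recommendation from the linked article.

```c
#include <stdbool.h>
#include <stdint.h>

/* Abort status bits as left in EAX after an RTM abort (Intel's documented layout).
   The macro names are made up for this sketch. */
#define ABORT_EXPLICIT  (1u << 0)  /* aborted by an XABORT instruction */
#define ABORT_RETRY     (1u << 1)  /* hint: the transaction may succeed on retry */
#define ABORT_CONFLICT  (1u << 2)  /* another logical processor touched our data */
#define ABORT_CAPACITY  (1u << 3)  /* an internal buffer overflowed */
#define ABORT_DEBUG     (1u << 4)  /* a debug breakpoint was hit */
#define ABORT_NESTED    (1u << 5)  /* abort occurred in a nested transaction */
#define ABORT_CODE(s)   (((s) >> 24) & 0xFFu)  /* XABORT imm8, meaningful only if bit 0 is set */

#define MAX_ATTEMPTS 3  /* hypothetical tuning knob */

/* Hypothetical policy: returns true if the caller should run xbegin again,
   false if it should take a non-transactional fallback (e.g. a real lock). */
static bool should_retry(uint32_t status, int attempts_so_far)
{
    if (attempts_so_far >= MAX_ATTEMPTS)
        return false;               /* bounded, unlike the asm example above */
    if (status & ABORT_CAPACITY)
        return false;               /* working set too big: likely to abort again */
    return (status & ABORT_RETRY) != 0;  /* status 0 (e.g. an interrupt) gives up too */
}
```

With the asm wrapper above, the code at the retry label would have to hand eax to something like this before deciding whether to fall through into xbegin again or jump to a fallback path.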
What do you mean "interrupts effects"? In current implementations, anything that changes privilege level (like a syscall or interrupt) causes a transaction abort. So ring-level changes never need to be rolled back. The CPU will just abort the transaction when it encounters anything it can't roll back. This means the kind of bug you can have is putting something inside the transaction that always causes an abort; you can't end up having done something that doesn't get rolled back.
You might want to try to get the compiler to emit the three-byte XEND instruction without a function call, so pushing the return address onto the stack isn't part of the transaction. e.g.
// no idea if this is safe, or if it might get reordered by the optimizer
#define xend_MSVC __asm _emit 0x0F __asm _emit 0x01 __asm _emit 0xD5
I think this only works in 32-bit builds, though: MSVC doesn't support inline asm (including __asm _emit) when targeting x64. It looks like IACA's header file uses __asm _emit in the same way.
It'll be safer to put XEND in its own wrapper function, too, I guess. You just need a stop-gap until you can upgrade to a compiler with intrinsics, so it doesn't have to be perfect as long as the extra reads/writes from the ret and call don't cause too many aborts.