I am trying to write a simple compare-and-swap function in inline assembly. Here is my code:
Sorry for not answering your question directly, but my question is: why not use C11's <stdatomic.h> or C++11's <atomic>? It's a lot less error-prone than writing your own functions and has the advantage that you're not targeting a specific hardware architecture or compiler.
In your case you should be using either atomic_compare_exchange_weak() or atomic_compare_exchange_strong().
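A minimal sketch of an 8-byte compare-and-swap written with <stdatomic.h>, assuming a C11 compiler; the name cas64 is hypothetical. Like the inline-asm version further down this thread, atomic_compare_exchange_strong writes the value it actually found into *expected when the comparison fails, so the caller can retry with it.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: if *obj == *expected, store desired and return true.
   Otherwise copy the current value of *obj into *expected and return false. */
bool cas64(_Atomic uint64_t *obj, uint64_t *expected, uint64_t desired)
{
    return atomic_compare_exchange_strong(obj, expected, desired);
}
```

The weak form is allowed to fail spuriously even when the values match, so it is normally used inside a retry loop; the strong form fails only on a real mismatch.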
Let me start by saying "Using inline asm is a bad idea." And let me repeat that "Using inline asm is a bad idea." You could write an entire wiki entry about why using inline asm is a bad idea. Please consider using builtins (like gcc's __sync_bool_compare_and_swap) or libraries like <atomic> instead.
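For comparison, here is a hedged sketch of the builtin route, assuming gcc (or a compiler that implements gcc's __sync builtins); the wrapper names cas_sync and cas_sync_val are hypothetical. On 32-bit x86 targeting i586 or later, gcc typically compiles the 8-byte case to lock cmpxchg8b for you.

```c
#include <stdbool.h>

/* Returns true if *ptr held oldval and was replaced by newval. */
bool cas_sync(unsigned long long *ptr, unsigned long long oldval,
              unsigned long long newval)
{
    return __sync_bool_compare_and_swap(ptr, oldval, newval);
}

/* Returns the value *ptr held before the operation; the swap happened
   exactly when that return value equals oldval. */
unsigned long long cas_sync_val(unsigned long long *ptr,
                                unsigned long long oldval,
                                unsigned long long newval)
{
    return __sync_val_compare_and_swap(ptr, oldval, newval);
}
```

The bool form only tells you whether the swap happened; the val form also hands back the current value, which is what you need to retry.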
If you are writing production software, the risks from using inline asm are almost certainly greater than any benefit. If you are writing for educational purposes, then read on.
(For a further illustration of why you shouldn't use inline asm, wait for Michael or Peter to show up and point out all the things wrong with this code. It's really hard, even for people who know this stuff, to get it right.)
Here is some code showing how to use cmpxchg8b. It is simple, but should be sufficient to give a general idea.
#include <stdio.h>
// Simple union to break up the 8 byte value into 32bit chunks
// (x86 is little-endian, so 'lower' comes first).
typedef union {
    struct {
        unsigned int lower;
        unsigned int upper;
    };
    unsigned long long int f;
} moo;
unsigned char cas(moo *ptr, moo *oldval, const moo *newval)
{
    unsigned char result;
#ifndef __GCC_ASM_FLAG_OUTPUTS__
    asm ("lock cmpxchg8b %[ptr]\n\t"
         "setz %[result]"
         : [result] "=q" (result), [ptr] "+m" (*ptr),
           "+d" (oldval->upper), "+a" (oldval->lower)
         : "c" (newval->upper), "b" (newval->lower)
         : "cc", "memory");
#else
    asm ("lock cmpxchg8b %[ptr]"
         : [result] "=@ccz" (result), [ptr] "+m" (*ptr),
           "+d" (oldval->upper), "+a" (oldval->lower)
         : "c" (newval->upper), "b" (newval->lower)
         : "memory");
#endif
    return result;
}
int main()
{
    moo oldval, newval, curval;
    unsigned char ret;
    // Will not change 'curval' since 'oldval' doesn't match.
    curval.f = -1;
    oldval.f = 0;
    newval.f = 1;
    printf("If curval(%u:%u) == oldval(%u:%u) "
           "then write newval(%u:%u)\n",
           curval.upper, curval.lower,
           oldval.upper, oldval.lower,
           newval.upper, newval.lower);
    ret = cas(&curval, &oldval, &newval);
    if (ret)
        printf("Replace succeeded: curval(%u:%u)\n",
               curval.upper, curval.lower);
    else
        printf("Replace failed because curval(%u:%u) "
               "needed to be (%u:%u) (which cas has placed in oldval).\n",
               curval.upper, curval.lower,
               oldval.upper, oldval.lower);
    printf("\n");
    // Now that 'curval' equals 'oldval', newval will get written.
    curval.lower = 1234; curval.upper = 4321;
    oldval.lower = 1234; oldval.upper = 4321;
    newval.f = 1;
    printf("If curval(%u:%u) == oldval(%u:%u) "
           "then write newval(%u:%u)\n",
           curval.upper, curval.lower,
           oldval.upper, oldval.lower,
           newval.upper, newval.lower);
    ret = cas(&curval, &oldval, &newval);
    if (ret)
        printf("Replace succeeded: curval(%u:%u)\n",
               curval.upper, curval.lower);
    else
        printf("Replace failed because curval(%u:%u) "
               "needed to be (%u:%u) (which cas has placed in oldval).\n",
               curval.upper, curval.lower,
               oldval.upper, oldval.lower);
}
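Because cas() leaves the current memory value in oldval when it fails, the usual way to use it is a retry loop. The sketch below shows that loop, but swaps in gcc's __atomic_compare_exchange_n builtin for the inline asm (it has the same "refresh expected on failure" behaviour); the function name atomic_add_u64 is hypothetical.

```c
#include <stdbool.h>

/* Hypothetical helper: atomically add delta to *ptr and return the new value. */
unsigned long long atomic_add_u64(unsigned long long *ptr,
                                  unsigned long long delta)
{
    unsigned long long expected = __atomic_load_n(ptr, __ATOMIC_RELAXED);
    /* If another thread changes *ptr between the load and the CAS, the CAS
       fails, 'expected' is refreshed with the value now in memory, and we
       simply try again with it. */
    while (!__atomic_compare_exchange_n(ptr, &expected, expected + delta,
                                        true, /* weak: may fail spuriously */
                                        __ATOMIC_SEQ_CST, __ATOMIC_RELAXED))
        ;
    /* On success 'expected' still holds the old value. */
    return expected + delta;
}
```

As with cas(), a retry can itself fail if another thread wins the race again, which is why this is a loop rather than a single second attempt.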
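A few points:
- If cas fails because the values didn't match, it returns 0 and places the current contents of *ptr in oldval, which makes trying again simple. Note that when several threads use lock cmpxchg8b on the same location, a second attempt could conceivably fail as well, since the 'other' thread could have beaten you to the write again.
- The __GCC_ASM_FLAG_OUTPUTS__ define is available on newer builds of gcc (6.x+). It allows you to skip the setz and use the flags directly. See the gcc docs for details.
As for how it works: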
When we call cmpxchg8b, we pass it a pointer to memory. It compares the 8-byte value at that memory location with the 8 bytes in edx:eax. If they match, it writes the 8 bytes in ecx:ebx to the memory location and sets the zero flag. If they don't match, it loads the current value into edx:eax and clears the zero flag.
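The rules above can be written as a plain C model. This is for illustration only: regs_model and cmpxchg8b_model are hypothetical names, and the model is not atomic, whereas lock cmpxchg8b performs the whole compare-and-write as one indivisible step.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register file for the model. */
typedef struct {
    uint32_t eax, edx; /* compare value: edx:eax */
    uint32_t ebx, ecx; /* replacement value: ecx:ebx */
    bool zf;           /* zero flag */
} regs_model;

/* Non-atomic model of cmpxchg8b on the 8 bytes at *mem. */
void cmpxchg8b_model(uint64_t *mem, regs_model *r)
{
    uint64_t expected = ((uint64_t)r->edx << 32) | r->eax;
    if (*mem == expected) {
        *mem = ((uint64_t)r->ecx << 32) | r->ebx; /* write ecx:ebx */
        r->zf = true;
    } else {
        r->eax = (uint32_t)*mem;         /* load current value */
        r->edx = (uint32_t)(*mem >> 32); /* into edx:eax */
        r->zf = false;
    }
}
```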
So, compare that with the code:
asm ("lock cmpxchg8b %[ptr]"
Here we are passing the pointer to the 8 bytes to cmpxchg8b.
"setz %[result]"
Here we are storing the zero flag set by cmpxchg8b into (result): setz writes 1 if the flag is set and 0 if it is clear.
: [result] "=q" (result), [ptr] "+m" (*ptr),
Specify that (result) is an output (=) and that it must go in a register whose low byte is addressable (q), since setz writes a single byte. Also, the memory operand (*ptr) is in+out (+), since we will be both reading it and writing to it.
"+d" (oldval->upper), "+a"(oldval->lower)
The + signs again indicate that these values are in+out. This is necessary since if the comparison fails, edx:eax will be overwritten with the current value from ptr.
: "c" (newval->upper), "b"(newval->lower)
These values are input-only. cmpxchg8b isn't going to change them, so we put them after the second colon.
: "cc", "memory");
Since we are changing the flags, we need to inform the compiler via the "cc" clobber. The "memory" clobber may not be necessary, depending on exactly what cas is being used for. For example, thread 1 might be using it to notify thread 2 that something is ready for processing. In that case, you want to make absolutely sure gcc doesn't have any values in registers that it is planning to write to memory later: it must flush them all to memory before executing the cmpxchg8b.
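To make the thread 1 / thread 2 scenario concrete, here is a hedged sketch of a one-shot handoff using C11 atomics rather than inline asm; payload, ready, publish and consume are hypothetical names. The release ordering on the CAS plays a role similar to the "memory" clobber: the compiler may not move the store to payload past the CAS.

```c
#include <stdatomic.h>
#include <stdbool.h>

static int payload;       /* data handed from thread 1 to thread 2 */
static atomic_int ready;  /* 0 = not yet published, 1 = published */

/* Thread 1: fill payload, then flip ready from 0 to 1. One-shot only:
   a second call would overwrite payload even though its CAS fails. */
bool publish(int value)
{
    payload = value; /* plain store; must be visible before ready flips */
    int expected = 0;
    /* Release ordering keeps the payload store before the CAS. On x86 a
       locked instruction is already a full barrier in hardware; the
       ordering constraint here (like the "memory" clobber) restrains
       the compiler. */
    return atomic_compare_exchange_strong_explicit(
        &ready, &expected, 1, memory_order_release, memory_order_relaxed);
}

/* Thread 2: wait until ready is set, then read payload. */
int consume(void)
{
    while (atomic_load_explicit(&ready, memory_order_acquire) == 0)
        ; /* spin until published */
    return payload;
}
```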
The gcc docs describe in detail the workings of the extended asm statement. If parts of this explanation are still unclear, some reading might help.
BTW in case I forgot to mention, writing inline asm is a bad idea...