Not getting expected output using cmpxchg8b for unsigned long

Submitted by 拥有回忆 on 2019-11-29 18:16:06

Let me start by saying "Using inline asm is a bad idea." And let me repeat that "Using inline asm is a bad idea." You could write an entire wiki entry about why using inline asm is a bad idea. Please consider using builtins (like gcc's __sync_bool_compare_and_swap) or libraries like <atomic> instead.
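
For comparison, here is roughly what the builtin route looks like. This is a minimal sketch that assumes GCC or Clang, and `cas_sync` is a hypothetical helper name. On x86-64 the compiler typically emits `lock cmpxchg` for this. On 32-bit x86 it typically emits `lock cmpxchg8b` when targeting i586 or later, and you never have to touch a register constraint.

```c
#include <stdint.h>

/* Hypothetical helper: atomically replace *ptr with newval if *ptr == oldval.
   Returns nonzero on success, 0 if *ptr held something else.
   The builtin handles the lock prefix, the register allocation and the
   compiler/memory barrier that the inline asm version has to spell out. */
static int cas_sync(uint64_t *ptr, uint64_t oldval, uint64_t newval)
{
    return __sync_bool_compare_and_swap(ptr, oldval, newval);
}
```

Unlike the `cas()` function below, this version doesn't hand back the current value on failure. If you need that, `__sync_val_compare_and_swap` returns the value `*ptr` held before the operation.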

If you are writing production software, the risks from using inline asm are almost certainly greater than any benefit. If you are writing for educational purposes, then read on.

(For a further illustration of why you shouldn't use inline asm, wait for Michael or Peter to show up and point out all the things wrong with this code. It's really hard, even for people who know this stuff, to get it right.)

Here is some code showing how to use cmpxchg8b. It is simple, but should be sufficient to give a general idea.

#include <stdio.h>

// Simple union to split the 8-byte value into two 32-bit halves
// (lower comes first because x86 is little-endian).
typedef union {
  struct {
     unsigned int lower;
     unsigned int upper;
  };
  unsigned long long int f;
} moo;

unsigned char cas(moo *ptr, moo *oldval, const moo *newval)
{
   unsigned char result;

#ifndef __GCC_ASM_FLAG_OUTPUTS__

   asm ("lock cmpxchg8b %[ptr]\n\t"
        "setz %[result]"
        : [result] "=q" (result), [ptr] "+m" (*ptr),
          "+d" (oldval->upper), "+a" (oldval->lower)
        : "c" (newval->upper), "b" (newval->lower)
        : "cc", "memory");

#else

   asm ("lock cmpxchg8b %[ptr]"
        : [result] "=@ccz" (result), [ptr] "+m" (*ptr),
          "+d" (oldval->upper), "+a" (oldval->lower)
        : "c" (newval->upper), "b" (newval->lower)
        : "memory");

#endif

   return result;
}

int main()
{
   moo oldval, newval, curval;
   unsigned char ret;

   // Will not change 'curval' since 'oldval' doesn't match.
   curval.f = -1;
   oldval.f = 0;
   newval.f = 1;

   printf("If curval(%u:%u) == oldval(%u:%u) "
          "then write newval(%u:%u)\n",
          curval.upper, curval.lower,
          oldval.upper, oldval.lower,
          newval.upper, newval.lower);

   ret = cas(&curval, &oldval, &newval);

   if (ret)
      printf("Replace succeeded: curval(%u:%u)\n",
             curval.upper, curval.lower);
   else
      printf("Replace failed because curval(%u:%u) "
             "needed to be (%u:%u) (which cas has placed in oldval).\n",
             curval.upper, curval.lower,
             oldval.upper, oldval.lower);

   printf("\n");

   // Now that 'curval' equals 'oldval', newval will get written.
   curval.lower = 1234; curval.upper = 4321;
   oldval.lower = 1234; oldval.upper = 4321;
   newval.f = 1;

   printf("If curval(%u:%u) == oldval(%u:%u) "
          "then write newval(%u:%u)\n",
          curval.upper, curval.lower,
          oldval.upper, oldval.lower,
          newval.upper, newval.lower);

   ret = cas(&curval, &oldval, &newval);

   if (ret)
      printf("Replace succeeded: curval(%u:%u)\n",
             curval.upper, curval.lower);
   else
      printf("Replace failed because curval(%u:%u) "
             "needed to be (%u:%u) (which cas has placed in oldval).\n",
             curval.upper, curval.lower,
             oldval.upper, oldval.lower);

}

A few points:

  • If the cas fails (because the values don't match), the function returns 0 and the value currently in memory is returned in oldval. This makes trying again simple. Note that if you are running multi-threaded (which you must be, or you wouldn't be using lock cmpxchg8b), a second attempt could conceivably fail as well, since another thread could have beaten you to the write again.
  • The __GCC_ASM_FLAG_OUTPUTS__ macro is defined by newer versions of gcc (6.x and later). It lets you skip the setz and read the zero flag directly as an output (the "=@ccz" constraint). See the gcc docs for details.
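
To make the retry from the first point concrete, here is a sketch of the usual compare-and-swap loop, using an atomic add as the example. It swaps the inline asm for GCC's `__atomic_compare_exchange_n` builtin, which behaves like `cas()` above: on failure it writes the value it found back into `expected`. The name `add_u64` is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: atomically add delta to *ptr and return the new
   value, built from a compare-and-swap retry loop. */
static uint64_t add_u64(uint64_t *ptr, uint64_t delta)
{
    /* Initial guess at the current value; the CAS below checks it. */
    uint64_t expected = __atomic_load_n(ptr, __ATOMIC_RELAXED);
    uint64_t desired;

    do {
        desired = expected + delta;
        /* On failure, expected is refreshed with the current *ptr,
           just like cas() refreshing oldval, so we simply loop. */
    } while (!__atomic_compare_exchange_n(ptr, &expected, desired,
                                          0 /* strong */,
                                          __ATOMIC_SEQ_CST,
                                          __ATOMIC_SEQ_CST));
    return desired;
}
```

The loop only goes around again if some other thread changed `*ptr` between the read and the CAS.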

As for how it works:

When we execute cmpxchg8b, we give it an 8-byte memory operand. It compares the value at that memory location with the 8 bytes in edx:eax. If they match, it writes the 8 bytes in ecx:ebx to the memory location and sets the zero flag. If they don't match, it loads the current value from memory into edx:eax and clears the zero flag.
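
The same description written as plain C may help. This is only a non-atomic reference model (it provides none of the guarantees of the lock prefix), and the function and parameter names are hypothetical stand-ins for the instruction and its registers.

```c
#include <stdint.h>

/* Non-atomic model of "cmpxchg8b m64". Returns the resulting ZF (1 = swapped). */
static int cmpxchg8b_model(uint64_t *m64,
                           uint32_t *edx, uint32_t *eax,
                           uint32_t ecx, uint32_t ebx)
{
    uint64_t expected = ((uint64_t)*edx << 32) | *eax;   /* EDX:EAX */
    if (*m64 == expected) {
        *m64 = ((uint64_t)ecx << 32) | ebx;              /* store ECX:EBX */
        return 1;                                        /* ZF = 1 */
    }
    *edx = (uint32_t)(*m64 >> 32);                       /* current value */
    *eax = (uint32_t)*m64;                               /* into EDX:EAX */
    return 0;                                            /* ZF = 0 */
}
```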

So, compare that with the code:

   asm ("lock cmpxchg8b %[ptr]"

Here we pass the 8-byte memory operand (*ptr) to cmpxchg8b. The lock prefix makes the compare-and-write atomic with respect to other processors.

        "setz %[result]"

Here we are storing the contents of the zero flag set by cmpxchg8b into (result).

        : [result] "=q" (result), [ptr] "+m" (*ptr),

Specify that (result) is an output (=), and that it must be a byte register (q). Also, the memory operand (*ptr) is in+out (+), since we will be both reading it and writing to it.

          "+d" (oldval->upper), "+a"(oldval->lower)

The + signs again indicate that these values are in+out. This is necessary since if the comparison fails, edx:eax will be overwritten with the current value from ptr.

        : "c" (newval->upper), "b"(newval->lower)

These values are input-only. The cmpxchg8b isn't going to change their values so we put them after the second colon.

        : "cc", "memory");

Since we are changing the flags, we need to inform the compiler via "cc". The "memory" clobber may not be necessary, depending on exactly what cas is being used for. Suppose thread 1 is using it to notify thread 2 that something is ready for processing. In that case, you want to make absolutely sure gcc doesn't have any values in registers that it is planning to write to memory later: it absolutely must flush them all to memory before executing the cmpxchg8b. The clobber also stops gcc from reusing values it read from memory before the asm, since those may have changed.
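
The thread 1 / thread 2 scenario is the classic publish pattern. Here is a sketch of it using GCC's `__atomic` builtins instead of asm, where explicit memory orders state what the "memory" clobber plus the lock prefix provide implicitly. The struct and function names are hypothetical, the mailbox is assumed to be one-shot with a single producer, and in real use `publish` and `try_consume` would run on different threads.

```c
#include <stdint.h>

/* Hypothetical one-shot mailbox: payload is plain data, ready is the flag we CAS. */
struct mailbox {
    uint64_t payload;
    uint64_t ready;     /* 0 = empty, 1 = full */
};

/* Producer (assumed to be the only one, calling this once): write the
   payload, then flip ready 0 -> 1. Release ordering keeps the payload store
   from moving after the CAS, the same job the "memory" clobber does for
   the compiler in the asm version. */
static int publish(struct mailbox *mb, uint64_t value)
{
    uint64_t expected = 0;
    mb->payload = value;
    return __atomic_compare_exchange_n(&mb->ready, &expected, 1, 0,
                                       __ATOMIC_RELEASE, __ATOMIC_RELAXED);
}

/* Consumer: if ready, read the payload. Acquire pairs with the release above. */
static int try_consume(struct mailbox *mb, uint64_t *out)
{
    if (__atomic_load_n(&mb->ready, __ATOMIC_ACQUIRE) != 1)
        return 0;
    *out = mb->payload;
    return 1;
}
```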

The gcc docs describe in detail the workings of the extended asm statement. If parts of this explanation are still unclear, some reading might help.

BTW in case I forgot to mention, writing inline asm is a bad idea...

Sorry for not answering your question directly, but my question is: why not use C11's <stdatomic.h> or C++11's <atomic>? It's a lot less error-prone than writing your own functions and has the advantage that you're not targeting a specific hardware architecture or compiler.

In your case you should either be using atomic_compare_exchange_weak() or atomic_compare_exchange_strong().
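
A minimal C11 sketch of that, assuming a compiler that provides `<stdatomic.h>` (it defines `__STDC_NO_ATOMICS__` if not). It mirrors the `cas()` function above, including writing the current value back into `*oldval` on failure. The names `cas_c11` and `double_c11` are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Strong CAS: fails only if *ptr really differs from *oldval.
   On failure the current value is stored into *oldval, as cas() does. */
static bool cas_c11(_Atomic uint64_t *ptr, uint64_t *oldval, uint64_t newval)
{
    return atomic_compare_exchange_strong(ptr, oldval, newval);
}

/* Weak CAS may fail spuriously, so it belongs in a loop.
   Example: atomically double the value and return the new value. */
static uint64_t double_c11(_Atomic uint64_t *ptr)
{
    uint64_t old = atomic_load(ptr);
    while (!atomic_compare_exchange_weak(ptr, &old, old * 2))
        ;   /* old now holds the fresh value; retry */
    return old * 2;
}
```

Use the strong form for a one-off check, and the weak form inside a retry loop, where a spurious failure just costs one extra iteration.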
