Impossible constraint with cmpxchg16b in extended assembly

粉色の甜心 2021-01-29 11:05

I am trying to write inline assembly with my C code to perform compare and swap operation. My code is:

typedef struct node {
    int data;
    struct no         

1 answer
  • 2021-01-29 12:07

    So, let's take a crack at this.

    A few points before we get started:

    1. Using inline asm is a bad idea. It is hard to write, it is hard to write correctly, it is hard to maintain, it isn't portable to other compilers or platforms, etc. Unless this is an assignment requirement, don't do it.
    2. When performing cmpxchg operations, the fields to be compared/exchanged must be contiguous. So if you want to operate on next, flag and mark in a single operation, they must be next to each other in the structure.
    3. When performing cmpxchg operations, the fields must be aligned on an appropriately sized boundary. For example, if you are planning to operate on 16 bytes, the data must be aligned on a 16-byte boundary. gcc provides a variety of ways to do this, from the aligned attribute to _mm_malloc.
    4. When using __sync_bool_compare_and_swap (a better choice than inline asm), you must cast the data types to an appropriately-sized integer.
    5. I'm assuming your platform is x64.

    Points 2 and 3 required some changes to the field order of your structures. Note that I did not try to change searchfrom or return_tryFlag, since I'm not sure what they are used for.
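
    As a rough sketch of what points 2 and 3 mean in practice, the layout can be checked at compile time. The names demo_node, demo_node_new and is_16_aligned are made up for this illustration and are not part of the original code, and the offsets assume x64 (8-byte pointers, 4-byte int):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node: the three fields that are swapped together come
   first, with nothing in between, and the type is 16-byte aligned. */
typedef struct demo_node {
    struct demo_node *next;      /* bytes 0..7   */
    int mark;                    /* bytes 8..11  */
    int flag;                    /* bytes 12..15 */
    struct demo_node *backlink;  /* not part of the 16-byte unit */
    int data;
} __attribute__((aligned(16))) demo_node;

/* Fail the build if the layout assumptions do not hold. */
_Static_assert(offsetof(demo_node, next) == 0, "next must start the struct");
_Static_assert(offsetof(demo_node, mark) == 8, "mark must follow next");
_Static_assert(offsetof(demo_node, flag) == 12, "flag must follow mark");
_Static_assert(_Alignof(demo_node) == 16, "node must be 16-byte aligned");

/* Heap nodes need an aligned allocator; plain malloc only promises the
   platform's default alignment. aligned_alloc is C11 and wants a size
   that is a multiple of the alignment, which sizeof gives us here. */
static demo_node *demo_node_new(void)
{
    demo_node *n = aligned_alloc(_Alignof(demo_node), sizeof(demo_node));
    if (n != NULL)
        memset(n, 0, sizeof *n);
    return n;
}

/* Runtime check that a pointer sits on a 16-byte boundary. */
static int is_16_aligned(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}
```

    With the attribute on the type, stack and static instances are aligned by the compiler; only heap allocations need the explicit aligned allocator.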

    So, with those things in mind, here's what I came up with:

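    /* Build with gcc -mcx16 so the 16-byte compare-and-swap is available. */
    #include <stdio.h>
    #include <string.h>   /* memset, memcpy */
    #include <stdbool.h>  /* bool */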
    
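    /* next, mark and flag are swapped as one 16-byte unit, so they come
       first and the struct is aligned on a 16-byte boundary. */
    typedef struct node {
        struct node * next;
        int mark;
        int flag;
    
        struct node * backlink;
        int data;
    } __attribute__((aligned(16))) node_lf;
    
    /* Same layout as the first 16 bytes of node_lf. */
    typedef struct csArg {
        node_lf * node;
        int mark;
        int flag;
    } __attribute__((aligned(16))) cs_arg;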
    
    bool cs3(node_lf * address, cs_arg *old_val, cs_arg *new_val) { 
    
        return __sync_bool_compare_and_swap((unsigned __int128 *)address,
                                            *(unsigned __int128 *)old_val,
                                            *(unsigned __int128 *)new_val);
    }
    
    /* Print 16 bytes as two 64-bit hex words. */
    void ShowIt(void *v)
    {
       unsigned long long *ull = (unsigned long long *)v;
       printf("%016llx:%016llx", *ull, *(ull + 1));
    }
    
    int main(void)
    {
       cs_arg oldval, newval;
       node_lf n;   /* use the typedef; plain "node" is only a struct tag in C */
    
       memset(&oldval, 0, sizeof(oldval));
       memset(&newval, 0, sizeof(newval));
       memset(&n, 0, sizeof(n));
    
       n.mark = 3;
       newval.mark = 4;
    
       bool b;
    
       do {
          printf("If "); ShowIt(&n); printf(" is "); ShowIt(&oldval); printf(" change to "); ShowIt(&newval);
          b = cs3(&n, &oldval, &newval);
          printf(". Result %d\n", b);
    
          if (b)
             break;
          memcpy(&oldval, &n, sizeof(cs_arg));
       } while (1);  
    }
    

    When you exit the loop, oldval holds what was there before (it has to, or the CAS would have failed and we would have looped again) and newval holds what was actually written. Note that if this were truly multi-threaded, there is no guarantee that newval still matches the current contents of n, since another thread could already have come along and changed it again.

    For output we get:

    If 0000000000000000:0000000000000003 is 0000000000000000:0000000000000000 change to 0000000000000000:0000000000000004. Result 0
    If 0000000000000000:0000000000000003 is 0000000000000000:0000000000000003 change to 0000000000000000:0000000000000004. Result 1
    

    Note that the CAS (correctly!) fails on the first attempt, since the 'old' value doesn't match the 'current' value.
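
    In real lock-free code the new value usually depends on the value that was observed, so it has to be recomputed on every retry. Here is a minimal sketch of that pattern using __sync_val_compare_and_swap, which returns the contents it found and so saves a separate re-read. It is shown on an 8-byte word so that it builds without -mcx16; the function set_mark and the packing (mark in the low 32 bits, flag in the high 32 bits) are assumptions made for this example only:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: replace the low 32 bits (mark) of *word while
   leaving the high 32 bits (flag) untouched. Returns the value that
   was in *word immediately before our swap succeeded. */
static uint64_t set_mark(volatile uint64_t *word, uint32_t new_mark)
{
    uint64_t seen = *word;   /* first guess at the current contents */

    for (;;) {
        /* Recompute the desired value from what we last observed. */
        uint64_t want = (seen & 0xFFFFFFFF00000000ull) | new_mark;

        /* Returns the old contents whether or not the swap happened. */
        uint64_t prev = __sync_val_compare_and_swap(word, seen, want);

        if (prev == seen)
            return prev;     /* nobody changed it in between: done */

        seen = prev;         /* lost a race: retry with the fresh value */
    }
}
```

    The 16-byte case has the same shape, with unsigned __int128 in place of uint64_t.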

    Hand-written assembler might save you an instruction or two, but the gain in readability, maintainability, and portability from using the builtin is almost certainly worth that cost.

    If for some reason you must use inline asm, you will still need to re-order your structs, and the point about alignment still stands. You can also look at https://stackoverflow.com/a/37825052/2189500. It only uses 8 bytes, but the concepts are the same.
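
    For completeness, here is one way the 16-byte version could look with extended asm. Treat it as a sketch under assumptions rather than a drop-in replacement: the type pair128 and the function cas128_asm are invented for this example, and it only targets x64 with gcc or clang. cmpxchg16b hard-codes its registers (rdx:rax for the expected value, rcx:rbx for the new value), so each operand is pinned with the matching single-register constraint:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#if !defined(__x86_64__)
#error "this sketch assumes an x86-64 target"
#endif

/* Hypothetical 16-byte unit; the attribute gives the alignment that
   cmpxchg16b requires for its memory operand. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} __attribute__((aligned(16))) pair128;

/* If *dst equals *expected, store desired into *dst and return true.
   Otherwise copy the current *dst into *expected and return false. */
static inline bool cas128_asm(volatile pair128 *dst,
                              pair128 *expected,
                              pair128 desired)
{
    bool ok;

    __asm__ __volatile__(
        "lock cmpxchg16b %[mem]\n\t"
        "sete %[ok]"                /* ZF is set when the swap happened */
        : [mem] "+m" (*dst),
          [ok]  "=q" (ok),
          "+a" (expected->lo),      /* rax: low half of expected value  */
          "+d" (expected->hi)       /* rdx: high half of expected value */
        : "b" (desired.lo),         /* rbx: low half of new value       */
          "c" (desired.hi)          /* rcx: high half of new value      */
        : "cc", "memory");

    return ok;
}
```

    On failure the instruction loads the current memory contents into rdx:rax, which is why the expected value is declared as an input/output operand and comes back updated.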
