CMPXCHG16B correct?

粉色の甜心 2021-02-08 17:34

This doesn't seem to be quite right, although I am unsure why. Advice would be great, as the documentation for CMPXCHG16B is pretty minimal (I don't own any Intel manuals...).

5 answers
  •  北荒 (OP)
    2021-02-08 17:52

    I noticed a few issues:

    (1) The main problem is the constraints: "rax" doesn't do what it looks like. GCC reads a constraint string one letter at a time, so the leading "r" lets GCC use any general-purpose register.

    (2) I'm not sure how you're storing types::uint128_t, but assuming the standard little-endian layout on x86, the high and low 64-bit halves are also swapped around.

    (3) Taking the address of something and casting it to a pointer of another type can break aliasing rules. Whether this is an issue depends on how your types::uint128_t is defined (it is fine if it is a struct of two uint64_t members). GCC with -O2 optimizes on the assumption that aliasing rules are not violated.

    (4) *src should really be marked as an output rather than relying on a memory clobber, but this is more of a performance issue than a correctness issue. Similarly, rbx and rcx do not need to be specified as clobbered.
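
    Points (2) and (3) can be illustrated with a small sketch. It assumes GCC or Clang on a little-endian 64-bit target (it relies on the unsigned __int128 extension), and the names u128_parts, to_int128 and check_layout are hypothetical, not taken from the question. It copies bytes with memcpy instead of casting pointers, and checks that the first struct member ends up as the low 64 bits.

    ```cpp
    #include <cassert>
    #include <cstring>
    #include <stdint.h>

    // Hypothetical stand-in for types::uint128_t: low half first, as on little-endian x86.
    struct u128_parts
    {
        uint64_t lo;
        uint64_t hi;
    };

    // Reinterpret the struct as a 128-bit integer by copying its bytes.
    // memcpy is the aliasing-safe alternative to casting &p to another pointer type.
    inline unsigned __int128 to_int128( const u128_parts & p )
    {
        static_assert( sizeof( u128_parts ) == sizeof( unsigned __int128 ), "expected 16 bytes" );
        unsigned __int128 v;
        std::memcpy( &v, &p, sizeof v );
        return v;
    }

    // Checks the layout claim at run time; this only holds on little-endian targets.
    inline void check_layout()
    {
        u128_parts p = { 0x1111, 0x2222 };
        unsigned __int128 v = to_int128( p );
        assert( static_cast< uint64_t >( v ) == 0x1111 );        // lo is the low 64 bits
        assert( static_cast< uint64_t >( v >> 64 ) == 0x2222 );  // hi is the high 64 bits
    }
    ```

    With this layout, the member at the lower address (lo) belongs in rax/rbx and the one at the higher address (hi) in rdx/rcx, which is the order CMPXCHG16B expects.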

    Here is a version that works:

    #include <stdint.h>
    
    namespace types
    {
        // alternative: union with  unsigned __int128
        struct uint128_t
        {
            uint64_t lo;
            uint64_t hi;
        }
        __attribute__ (( __aligned__( 16 ) ));
    }
    
    template< class T > inline bool cas( volatile T * src, T cmp, T with );
    
    template<> inline bool cas( volatile types::uint128_t * src, types::uint128_t cmp, types::uint128_t with )
    {
        // cmp can be by reference so the caller's value is updated on failure.
    
        // suggestion: use __sync_bool_compare_and_swap and compile with -mcx16 instead of inline asm
        bool result;
        __asm__ __volatile__
        (
            "lock cmpxchg16b %1\n\t"
            "setz %0"       // on gcc6 and later, use a flag output constraint instead
            : "=q" ( result )
            , "+m" ( *src )
            , "+d" ( cmp.hi )
            , "+a" ( cmp.lo )
            : "c" ( with.hi )
            , "b" ( with.lo )
            : "cc", "memory" // compile-time memory barrier.  Omit if you want memory_order_relaxed compile-time ordering.
        );
        return result;
    }
    
    int main()
    {
        using namespace types;
        uint128_t test = { 0xdecafbad, 0xfeedbeef };
        uint128_t cmp = test;
        uint128_t with = { 0x55555555, 0xaaaaaaaa };
        return ! cas( & test, cmp, with );
    }
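
    The comments in the code above mention two variations: taking cmp by reference, and using a flag output constraint instead of setz. A minimal sketch of both, assuming x86-64 and a compiler that supports "=@cc" flag outputs (GCC 6 or later); the names u128 and cas128 are hypothetical and not part of the original answer. The alignment assert is an added assumption check, since CMPXCHG16B requires a 16-byte aligned memory operand.

    ```cpp
    #include <cassert>
    #include <stdint.h>

    // Hypothetical 16-byte aligned pair, mirroring types::uint128_t above.
    struct u128
    {
        uint64_t lo;
        uint64_t hi;
    }
    __attribute__ (( __aligned__( 16 ) ));

    // Returns true if *dst equalled expected and was replaced by desired.
    // On failure, expected is overwritten with the value found in *dst.
    inline bool cas128( volatile u128 * dst, u128 & expected, u128 desired )
    {
        // CMPXCHG16B faults on a memory operand that is not 16-byte aligned.
        assert( ( reinterpret_cast< uintptr_t >( dst ) & 15 ) == 0 );

        bool ok;
        __asm__ __volatile__
        (
            "lock cmpxchg16b %[mem]"
            : "=@ccz" ( ok )           // read ZF directly, no setz needed
            , [mem] "+m" ( *dst )
            , "+d" ( expected.hi )     // rdx:rax = value to compare against
            , "+a" ( expected.lo )
            : "c" ( desired.hi )       // rcx:rbx = replacement value
            , "b" ( desired.lo )
            : "memory"                 // compile-time memory barrier
        );
        return ok;
    }
    ```

    Because expected is updated on failure, a retry loop can call cas128 again without reloading *dst by hand. The other suggestion in the comments, __sync_bool_compare_and_swap compiled with -mcx16, avoids inline asm entirely and is probably the simpler choice when that flag is available.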
    
