lock-free

Using Boost.Lockfree queue is slower than using mutexes

和自甴很熟 submitted on 2019-12-02 14:22:50
Until now I was using std::queue in my project. I measured the average time that a specific operation on this queue requires. The times were measured on two machines: my local Ubuntu VM and a remote server. Using std::queue, the average was almost the same on both machines: ~750 microseconds. Then I "upgraded" the std::queue to boost::lockfree::spsc_queue, so I could get rid of the mutexes protecting the queue. On my local VM I saw a huge performance gain: the average is now around 200 microseconds. On the remote machine, however, the average went up to 800 microseconds, which is slower than
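For reference, a minimal sketch of the kind of swap described above (not the asker's actual code): a fixed-capacity boost::lockfree::spsc_queue shared between exactly one producer and one consumer thread, with no mutex involved. The capacity of 1024 and the element type int are arbitrary choices for illustration.

#include <boost/lockfree/spsc_queue.hpp>
#include <thread>

// Single-producer/single-consumer queue; capacity is fixed at compile time.
boost::lockfree::spsc_queue<int, boost::lockfree::capacity<1024>> queue;

void producer() {
    for (int i = 0; i < 100000; ++i)
        while (!queue.push(i)) { /* queue full: spin or back off */ }
}

void consumer() {
    int value;
    for (int received = 0; received < 100000; )
        if (queue.pop(value))
            ++received;              // busy-wait until the producer catches up
}

int main() {
    std::thread p(producer), c(consumer);
    p.join();
    c.join();
}

Note that spsc_queue is only valid with a single producer and a single consumer, and whether it beats a mutex-protected std::queue depends heavily on the machine, which is exactly the discrepancy the question observes.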

Impossible constraint with cmpxchg16b in extended assembly

ⅰ亾dé卋堺 submitted on 2019-12-02 14:07:57
I am trying to write inline assembly in my C code to perform a compare-and-swap operation. My code is: typedef struct node { int data; struct node * next; struct node * backlink; int flag; int mark; } node_lf; typedef struct searchfrom { node_lf * current; node_lf * next; } return_sf; typedef struct csArg { node_lf * node; int mark; int flag; } cs_arg; typedef struct return_tryFlag { node_lf * node; int result; } return_tf; static inline node_lf cs(node_lf * address, cs_arg *old_val, cs_arg *new_val) { node_lf value = *address; __asm__ __volatile__("lock; cmpxchg16b %0; setz %1;" :"=m"(*
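For context, here is a minimal sketch (my own, not the poster's code) of how a 16-byte compare-and-swap is commonly wrapped in extended asm. It assumes x86-64, GCC/Clang syntax, and 16-byte-aligned storage; cmpxchg16b expects the old value in RDX:RAX and the replacement in RCX:RBX, with ZF reporting success.

#include <cstdint>

struct alignas(16) dword_pair {
    std::uint64_t lo;
    std::uint64_t hi;
};

// Returns true if *addr equaled `expected` and was replaced by `desired`.
static inline bool cas16(dword_pair *addr,
                         dword_pair expected, dword_pair desired)
{
    unsigned char ok;
    __asm__ __volatile__(
        "lock cmpxchg16b %1\n\t"
        "setz %0"
        : "=q"(ok), "+m"(*addr),
          "+a"(expected.lo), "+d"(expected.hi)   // expected value in RDX:RAX
        : "b"(desired.lo), "c"(desired.hi)       // replacement in RCX:RBX
        : "cc", "memory");
    return ok;
}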

Tagged Pointers for lock-free list in C

时光怂恿深爱的人放手 submitted on 2019-12-02 03:27:29
I am trying to use tagged pointers for handling the lock-free operations on a list, in order to block the compare-and-swap (CAS) from going through if some other thread operated on the list during this transaction. My node struct and CAS look like this: struct node { unsigned long key; unsigned long val; struct node * next; }; static inline bool CAS(std::atomic<node*> node, struct node* oldNode, struct node* newNode) { node.compare_exchange_strong(oldNode, newNode, std::memory_order_seq_cst); } I found some methods for setting and checking these pointers, but it is unclear to me how they work,
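One common approach (a sketch under my own assumptions, not the poster's code) is to pack the tag into the low bits of the pointer itself: heap-allocated nodes are at least 8-byte aligned, so the bottom three bits are always zero and can carry a mark. The CAS then operates on the packed word, so it fails if either the pointer or the tag has changed since the read.

#include <atomic>
#include <cstdint>

struct node {
    unsigned long key;
    unsigned long val;
    node *next;
};

constexpr std::uintptr_t kTagMask = 0x7;   // low bits are free because nodes are 8-byte aligned

inline std::uintptr_t pack(node *n, std::uintptr_t tag) {
    return reinterpret_cast<std::uintptr_t>(n) | (tag & kTagMask);
}
inline node *ptr_of(std::uintptr_t v)          { return reinterpret_cast<node *>(v & ~kTagMask); }
inline std::uintptr_t tag_of(std::uintptr_t v) { return v & kTagMask; }

// CAS on the packed word: fails if either the pointer or the tag changed.
inline bool cas_tagged(std::atomic<std::uintptr_t> &slot,
                       std::uintptr_t expected, std::uintptr_t desired) {
    return slot.compare_exchange_strong(expected, desired,
                                        std::memory_order_seq_cst);
}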

Is std::vector thread-safe and concurrent by default? Why or why not?

邮差的信 submitted on 2019-12-01 15:05:50
Question: What does it mean to make a dynamic array thread-safe and concurrent? Say, for example, std::vector. Two threads may want to insert at the same position. No synchronization is needed, as it will be resolved by thread scheduling. One thread erasing while another accesses the same element? This is not a data-structure issue, I believe; it is a usage problem. So is there anything that needs to be done on top of std::vector to make it thread-safe and concurrent, or is it thread-safe and concurrent by
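For illustration, a minimal sketch of the usual answer (my own example; the class name is made up): std::vector performs no internal locking, so if two threads mutate the same vector, the caller has to serialize the accesses, for instance with a mutex around every operation.

#include <cstddef>
#include <mutex>
#include <vector>

class synced_vector {
public:
    void push_back(int v) {
        std::lock_guard<std::mutex> lock(m_);
        data_.push_back(v);
    }
    int at(std::size_t i) const {
        std::lock_guard<std::mutex> lock(m_);
        return data_.at(i);
    }
private:
    mutable std::mutex m_;       // guards data_ against concurrent access
    std::vector<int> data_;
};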

What happens when different CPU cores write to the same RAM address without synchronization?

只愿长相守 submitted on 2019-12-01 10:59:48
Let's assume that two cores are trying to write different values to the same RAM address (1 byte) at the same moment in time (plus-minus eta), without using any interlocked instructions or memory barriers. What happens in this case, and what value will end up in main RAM? Does the first one win? The last one? Is the behavior undetermined? x86 (like every other mainstream SMP CPU architecture) has coherent data caches. It's impossible for two different caches (e.g. the L1d caches of two different cores) to hold conflicting data for the same cache line. The hardware imposes an order (by some
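A small illustration of that ordering (my own example, not from the answer): two threads store different values to the same byte. Using std::atomic avoids the data race at the language level, and cache coherency guarantees the final value is exactly one of the two stores, never a mixture.

#include <atomic>
#include <cstdint>
#include <iostream>
#include <thread>

int main() {
    std::atomic<std::uint8_t> byte{0};
    std::thread a([&] { byte.store(1, std::memory_order_relaxed); });
    std::thread b([&] { byte.store(2, std::memory_order_relaxed); });
    a.join();
    b.join();
    std::cout << "final value: " << int(byte.load()) << "\n";   // prints 1 or 2, never anything else
}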

Does exchange or compare_and_exchange read the last value in modification order?

末鹿安然 submitted on 2019-12-01 06:51:54
Question: I am reading C++ Concurrency in Action by Anthony Williams. In the section "Understanding Relaxed Ordering" it says: There are a few additional things you can tell the man in the cubicle, such as “write down this number, and tell me what was at the bottom of the list” (exchange) and “write down this number if the number on the bottom of the list is that; otherwise tell me what I should have guessed” (compare_exchange_strong), but that doesn’t affect the general principle. Does it mean that such
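In code, the two operations the passage describes look roughly like this (my own sketch): both are atomic read-modify-write operations, so each acts on the latest value in the variable's modification order and hands the previous value back to the caller.

#include <atomic>

std::atomic<int> x{0};

void demo() {
    // "write down this number, and tell me what was at the bottom of the list"
    int previous = x.exchange(42, std::memory_order_relaxed);

    // "write 99 if the current value is 7; otherwise tell me what I should have guessed"
    int expected = 7;
    bool swapped = x.compare_exchange_strong(expected, 99, std::memory_order_relaxed);
    // on failure, `expected` now holds the value actually found

    (void)previous; (void)swapped;
}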

Genuinely test whether std::atomic is lock-free or not

↘锁芯ラ submitted on 2019-12-01 04:57:35
Since std::atomic::is_lock_free() may not genuinely reflect reality [ref], I'm considering writing a genuine runtime test instead. However, when I got down to it, I found that it's not the trivial task I thought it to be. I'm wondering whether there is some clever idea that could do it. Other than performance, the standard doesn't guarantee any way you can tell; that's more or less the point. If you are willing to introduce some platform-specific UB, you could do something like cast an atomic<int64_t>* to a volatile int64_t* and see if you observe "tearing" when another thread reads the
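A rough sketch of that tearing test (my own code, with the platform-specific UB the answer warns about): one thread flips the atomic between two bit patterns while the main thread reads the same storage through a plain volatile pointer, bypassing whatever lock the implementation might be using. Seeing a mixed value would indicate the type is not genuinely lock-free; not seeing one proves nothing.

#include <atomic>
#include <cstdint>
#include <iostream>
#include <thread>

int main() {
    std::atomic<std::int64_t> a{0};
    std::atomic<bool> stop{false};
    volatile std::int64_t *raw = reinterpret_cast<volatile std::int64_t *>(&a);  // UB, on purpose

    std::thread writer([&] {
        while (!stop.load(std::memory_order_relaxed)) {
            a.store(0, std::memory_order_relaxed);
            a.store(-1, std::memory_order_relaxed);
        }
    });

    bool torn = false;
    for (long i = 0; i < 100000000 && !torn; ++i) {
        std::int64_t v = *raw;               // reads the raw storage, not through the atomic
        torn = (v != 0 && v != -1);
    }
    stop.store(true, std::memory_order_relaxed);
    writer.join();
    std::cout << (torn ? "tearing observed\n" : "no tearing observed\n");
}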

Do the ARM instructions ldrex/strex have to operate on cache aligned data?

大城市里の小女人 submitted on 2019-11-30 23:42:04
On Intel, the arguments to CMPXCHG must be cache-line aligned (since Intel uses MESI to implement CAS). On ARM, ldrex and strex operate on exclusive reservation granules. To be clear, does this then mean that on ARM the data being operated upon does not have to be cache-line aligned? It says so right in the ARM Architecture Reference Manual, A.3.2.1 "Unaligned data access": LDREX and STREX require word alignment. Which makes sense, because an unaligned data access can span exclusive reservation granules. Exclusive access restrictions: The following restrictions apply to exclusive accesses: • The
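To make the alignment requirement concrete, here is a minimal sketch (ARMv7, GCC/Clang inline-asm syntax, my own example) of the usual LDREX/STREX retry loop; the pointer passed in is assumed to be word (4-byte) aligned, which is exactly the restriction quoted from the manual.

static inline int atomic_increment(int *p)
{
    int old_val, tmp, fail;
    __asm__ __volatile__(
        "1: ldrex   %0, [%3]        \n\t"   // exclusive load of *p
        "   add     %1, %0, #1      \n\t"
        "   strex   %2, %1, [%3]    \n\t"   // try to store; %2 == 0 on success
        "   cmp     %2, #0          \n\t"
        "   bne     1b              \n\t"   // reservation was broken: retry
        : "=&r"(old_val), "=&r"(tmp), "=&r"(fail)
        : "r"(p)
        : "cc", "memory");
    return old_val;
}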