memory-fences

Clarifications on full memory barriers implied by pthread mutexes

£可爱£侵袭症+ submitted on 2019-12-04 03:43:01
Question: I have heard that when dealing with mutexes, the necessary memory barriers are handled by the pthread API itself. I would like more details on this matter. Are these claims true, at least on the most common architectures around? Does the compiler recognize this implicit barrier and avoid reordering operations or reading from local registers when generating the code? When is the memory barrier applied: after successfully acquiring a mutex AND after releasing it? Answer 1: The POSIX …
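To make the acquire/release pairing concrete, here is a minimal sketch (hypothetical function and variable names; standard pthread calls) in which the lock acts as an acquire barrier and the unlock as a release barrier:

    #include <pthread.h>
    #include <cstdio>

    static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
    static int shared_value = 0; // always accessed under m

    void writer() {
        pthread_mutex_lock(&m);   // acquire: later accesses cannot move above the lock
        shared_value = 42;
        pthread_mutex_unlock(&m); // release: earlier accesses cannot move below the unlock
    }

    void reader() {
        pthread_mutex_lock(&m);
        std::printf("%d\n", shared_value); // sees 42 if writer ran first
        pthread_mutex_unlock(&m);
    }

Because pthread_mutex_lock and pthread_mutex_unlock are opaque external calls, the compiler must also assume they may touch any memory, which prevents it from caching shared_value in a register across them.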

C# volatile variable: Memory fences VS. caching

ぃ、小莉子 submitted on 2019-12-03 13:34:32
Question: So I have researched the topic for quite some time now, and I think I understand the most important concepts, like the release and acquire memory fences. However, I haven't found a satisfactory explanation of the relation between volatile and the caching of main memory. So, I understand that every read and write to/from a volatile field enforces strict ordering of the read and write operations that precede and follow it (read-acquire and write-release). But that only guarantees the ordering of the operations. It doesn't say anything about the time these changes are visible to other …
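For comparison, a minimal analogous sketch in C++ rather than C# (using std::atomic; an illustration assumed here, not the question's code): the release store and acquire load provide exactly the ordering guarantee described, but promise nothing about how quickly the write becomes visible to other threads:

    #include <atomic>

    std::atomic<bool> ready{false};
    int payload = 0;

    void producer() {
        payload = 1;                                  // ordinary write
        ready.store(true, std::memory_order_release); // write-release: payload cannot sink below
    }

    void consumer() {
        if (ready.load(std::memory_order_acquire)) {  // read-acquire: reads cannot hoist above
            int x = payload; // guaranteed to be 1 here; *when* ready becomes visible is unspecified
            (void)x;
        }
    }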

pthreads v. SSE weak memory ordering

♀尐吖头ヾ submitted on 2019-12-03 12:28:40
Do the Linux glibc pthread functions on x86_64 act as fences for weakly-ordered memory accesses? (pthread_mutex_lock/unlock are the exact functions I'm interested in.) SSE2 provides some instructions with weak memory ordering (non-temporal stores such as movntps in particular). If you are using these instructions and want to guarantee that another thread/core sees an ordering, then I understand you need an explicit fence for this, e.g., an sfence instruction. Normally you do expect the pthread API to act as a fence appropriately. However, I suspect normal C code on x86 will not generate weakly …
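A sketch of the pattern at issue, assuming SSE2 intrinsics from <emmintrin.h> (hypothetical buffer and function names): the explicit _mm_sfence conservatively orders the non-temporal store before the unlock that publishes it:

    #include <emmintrin.h> // SSE2 intrinsics: _mm_stream_si128, _mm_sfence
    #include <pthread.h>

    static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
    alignas(16) static int buf[4];

    void publish() {
        pthread_mutex_lock(&m);
        __m128i v = _mm_set1_epi32(42);
        _mm_stream_si128(reinterpret_cast<__m128i*>(buf), v); // weakly-ordered NT store
        _mm_sfence(); // make the NT store globally visible before releasing the lock,
                      // in case the mutex release alone does not fence NT stores
        pthread_mutex_unlock(&m);
    }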

Cost of using final fields

谁说胖子不能爱 submitted on 2019-12-03 09:56:22
We know that making fields final is usually a good idea: we gain thread-safety and immutability, which makes the code easier to reason about. I'm curious whether there's an associated performance cost. The Java Memory Model guarantees the following final field semantics: a thread that can only see a reference to an object after that object has been completely initialized is guaranteed to see the correctly initialized values for that object's final fields. This means that for a class like this

    class X {
        X(int a) { this.a = a; }
        final int a;
        static X instance;
    }

whenever Thread 1 creates an instance like …

Is atomic decrementing more expensive than incrementing?

喜欢而已 submitted on 2019-12-03 05:52:09
In his blog, Herb Sutter writes: "[...] because incrementing the smart pointer reference count can usually be optimized to be the same as an ordinary increment in an optimized shared_ptr implementation — just an ordinary increment instruction, and no fences, in the generated code. However, the decrement must be an atomic decrement or equivalent, which generates special processor memory instructions that are more expensive in themselves, and that on top of that induce memory fence restrictions on optimizing the surrounding code." The text is about the implementation of shared_ptr, and I am not sure …
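Sutter's asymmetry can be sketched with std::atomic in a simplified stand-in for a shared_ptr control block (an illustration, not a real implementation): the increment needs no ordering, while the decrement must order prior writes before a possible destruction:

    #include <atomic>

    struct control_block {
        std::atomic<long> refs{1};

        void add_ref() {
            // atomicity without ordering: imposes no fence on surrounding code
            refs.fetch_add(1, std::memory_order_relaxed);
        }

        void release() {
            // acq_rel: earlier writes through the pointer must happen-before destruction
            if (refs.fetch_sub(1, std::memory_order_acq_rel) == 1) {
                delete this; // last owner: safe to destroy
            }
        }
    };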

The cost of atomic counters and spinlocks on x86(_64)

↘锁芯ラ submitted on 2019-12-03 04:28:39
Question: Preface: I recently came across some synchronization problems, which led me to spinlocks and atomic counters. Then I searched a bit more into how these work and found std::memory_order and memory barriers (mfence, lfence and sfence). So now it seems that I should use acquire/release for the spinlocks and relaxed for the counters. Some references: x86 MFENCE - Memory Fence; x86 LOCK - Assert LOCK# Signal. Question: What is the machine code (edit: see below) for those three operations (lock = …
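A sketch of both conclusions, using std::atomic: acquire/release for the spinlock, relaxed for the counter. On x86(_64), the read-modify-write operations typically compile to a single lock-prefixed instruction and the release clear to a plain store; no separate mfence is needed:

    #include <atomic>

    std::atomic_flag lock_flag = ATOMIC_FLAG_INIT;
    std::atomic<long> counter{0};

    void lock()   { while (lock_flag.test_and_set(std::memory_order_acquire)) { /* spin */ } }
    void unlock() { lock_flag.clear(std::memory_order_release); }

    void count_event() {
        // a plain statistics counter needs atomicity, not ordering
        counter.fetch_add(1, std::memory_order_relaxed);
    }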

Fences in C++0x, guarantees just on atomics or memory in general

偶尔善良 submitted on 2019-12-02 20:56:57
The C++0x draft has a notion of fences that seems very distinct from the CPU/chip-level notion of fences, or from what the Linux kernel developers expect of fences. The question is whether the draft really implies an extremely restricted model, or whether the wording is just poor and actually implies true fences. For example, under 29.8 Fences it states things like: "A release fence A synchronizes with an acquire fence B if there exist atomic operations X and Y, both operating on some atomic object M, such that A is sequenced before X, X modifies M, Y is sequenced before B, and Y reads the value written by …"
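The quoted wording corresponds to the following pairing, sketched with std::atomic_thread_fence: the fences synchronize only through the atomic operations X and Y on M, yet the resulting ordering extends to ordinary memory:

    #include <atomic>

    std::atomic<int> M{0}; // the atomic object the wording requires
    int data = 0;          // ordinary, non-atomic memory

    void thread_a() {
        data = 1;
        std::atomic_thread_fence(std::memory_order_release); // fence A
        M.store(1, std::memory_order_relaxed);               // X: A is sequenced before X
    }

    void thread_b() {
        if (M.load(std::memory_order_relaxed) == 1) {            // Y reads the value X wrote
            std::atomic_thread_fence(std::memory_order_acquire); // fence B: Y is sequenced before B
            int r = data; // A synchronizes with B, so this read sees 1
            (void)r;
        }
    }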

Out of Order Execution and Memory Fences

放肆的年华 submitted on 2019-12-02 20:50:24
I know that modern CPUs can execute out of order; however, they always retire the results in order, as described by Wikipedia: "Out-of-order processors fill these 'slots' in time with other instructions that are ready, then re-order the results at the end to make it appear that the instructions were processed as normal." Now memory fences are said to be required on multicore platforms, because owing to out-of-order execution, the wrong value of x can be printed here:

    Processor #1:
        while (f == 0);
        print x; // x might not be 42 here

    Processor #2:
        x = 42;
        // memory fence required here
        f = 1;

Now …
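In C++ terms (a sketch with std::atomic, not the question's original pseudocode), making f an atomic with release/acquire semantics supplies the fence that the comment asks for:

    #include <atomic>
    #include <cstdio>

    int x = 0;
    std::atomic<int> f{0};

    void processor2() {
        x = 42;
        f.store(1, std::memory_order_release); // the "memory fence" before f = 1
    }

    void processor1() {
        while (f.load(std::memory_order_acquire) == 0) { /* spin */ }
        std::printf("%d\n", x); // guaranteed to print 42
    }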

Is there an implicit memory barrier with synchronized-with relationship on thread::join?

♀尐吖头ヾ submitted on 2019-12-02 00:53:11
I have code at work that starts multiple threads doing some operations, and if any of them fails, it sets a shared variable to false. Then the main thread joins all the worker threads. A simulation of this looks roughly like the following (I commented out the possible fix, which I don't know whether it's needed):

    #include <thread>
    #include <atomic>
    #include <vector>
    #include <iostream>
    #include <cassert>

    using namespace std;

    //atomic_bool success = true;
    bool success = true;

    int main() {
        vector<thread> v;
        for (int i = 0; i < 10; ++i) {
            v.emplace_back([=] {
                if (i == 5 || i == 6) {
                    //success.store(false, memory_order_relaxed);
                    success = false;
                }
            });
        }
        for (auto& t : v) t.join();
        assert(!success);
    }