memory-fences

Memory ordering issues

醉酒当歌 posted on 2019-12-10 02:47:10
Question: I'm experimenting with C++0x support and there is a problem that I guess shouldn't be there. Either I don't understand the subject or gcc has a bug. I have the following code; initially x and y are equal. Thread 1 always increments x first and then increments y. Both are atomic integer values, so there is no problem with the increment itself. Thread 2 checks whether x is less than y and displays an error message if so. This code fails sometimes, but why? The issue here is probably

If we marked memory as WC(Write Combined), then do we have any consistency automatically?

混江龙づ霸主 posted on 2019-12-07 20:11:13
Question: As we know, the x86 architecture provides acquire-release consistency automatically, i.e. all operations are automatically ordered without any fences, except for a store followed by a load. (As Herb Sutter says on page 34: https://onedrive.live.com/view.aspx?resid=4E86B0CF20EF15AD!24884&app=WordPdf&authkey=!AMtj_EflYn2507c ) If we put an MFENCE (LFENCE+SFENCE) between them, then the store can't be reordered and the load can't be reordered, i.e. we get sequential consistency. But if we

Possible to use C11 fences to reason about writes from other threads?

天涯浪子 posted on 2019-12-06 14:22:58
Adve and Gharachorloo's report , in Figure 4b, provides the following example of a program that exhibits unexpected behavior in the absence of sequential consistency: My question is whether it is possible, using only C11 fences and memory_order_relaxed loads and stores, to ensure that register1, if written, will be written with the value 1. The reason this might be hard to guarantee in the abstract is that P1, P2, and P3 could be at different points in a pathological NUMA network with the property that P2 sees P1's write before P3 does, yet somehow P3 sees P2's write very quickly. The reason
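As a rough illustration of the three-processor pattern in the question, here is a minimal sketch using the C++ equivalents of the C11 fences. The names A, B, run_once and all_trials_ok are assumptions for illustration and do not come from the report or the question. The idea, as far as I read the standard, is that a release fence in P2 (before it writes B) paired with an acquire fence in P3 (after it reads B) makes P2's read of A happen before P3's read of A, and read-read coherence then prevents P3 from seeing the older value.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: runs the three "processors" once.
// Returns -1 if P3 never wrote register1, otherwise the value it wrote.
int run_once() {
    std::atomic<int> A{0}, B{0};
    int register1 = -1;  // only P3 writes it; read after join()

    // P1: A = 1
    std::thread p1([&] { A.store(1, std::memory_order_relaxed); });

    // P2: if (A == 1) B = 1
    std::thread p2([&] {
        if (A.load(std::memory_order_relaxed) == 1) {
            // Release fence: orders the load of A before the store to B.
            std::atomic_thread_fence(std::memory_order_release);
            B.store(1, std::memory_order_relaxed);
        }
    });

    // P3: if (B == 1) register1 = A
    std::thread p3([&] {
        if (B.load(std::memory_order_relaxed) == 1) {
            // Acquire fence: pairs with P2's release fence via B.
            std::atomic_thread_fence(std::memory_order_acquire);
            register1 = A.load(std::memory_order_relaxed);
        }
    });

    p1.join();
    p2.join();
    p3.join();
    return register1;
}

// Hypothetical helper: true if register1 was either unwritten or 1 in every trial.
bool all_trials_ok(int trials) {
    assert(trials >= 0);
    for (int i = 0; i < trials; ++i) {
        int r = run_once();
        if (r != -1 && r != 1) return false;
    }
    return true;
}
```

Calling all_trials_ok(200) should return true. A passing run proves nothing about the memory model by itself, so the argument rests on the fence-to-fence synchronization rule rather than on the test.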

If we marked memory as WC(Write Combined), then do we have any consistency automatically?

三世轮回 posted on 2019-12-06 11:29:32
As we know, the x86 architecture provides acquire-release consistency automatically, i.e. all operations are automatically ordered without any fences, except for a store followed by a load. (As Herb Sutter says on page 34: https://onedrive.live.com/view.aspx?resid=4E86B0CF20EF15AD!24884&app=WordPdf&authkey=!AMtj_EflYn2507c ) If we put an MFENCE (LFENCE+SFENCE) between them, then the store can't be reordered and the load can't be reordered, i.e. we get sequential consistency. But if we mark memory as WC (Write Combined), do we then have any consistency automatically without any fences, may

Memory ordering behavior of std::atomic::load

旧巷老猫 posted on 2019-12-06 03:44:21
Question: Am I wrong to assume that atomic::load should also act as a memory barrier, ensuring that all previous non-atomic writes become visible to other threads? To illustrate:

    volatile bool arm1 = false;
    std::atomic_bool arm2 = false;
    bool triggered = false;

    Thread1:
    arm1 = true;
    //std::atomic_thread_fence(std::memory_order_seq_cst); // this would do the trick
    if (arm2.load()) triggered = true;

    Thread2:
    arm2.store(true);
    if (arm1) triggered = true;

I expected that after executing both

Memory ordering issues

Deadly posted on 2019-12-05 02:37:11
I'm experimenting with C++0x support and there is a problem, that I guess shouldn't be there. Either I don't understand the subject or gcc has a bug. I have the following code, initially x and y are equal. Thread 1 always increments x first and then increments y . Both are atomic integer values, so there is no problem with the increment at all. Thread 2 is checking whether the x is less than y and displays an error message if so. This code fails sometimes, but why? The issue here is probably memory reordering, but all atomic operations are sequentially consistent by default and I didn't
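The excerpt is cut off before the asker's code, so the following is only a sketch under an assumption: that the check in thread 2 reads x first and y second. In that case thread 1 can complete several increments between the two reads, so thread 2 compares an old x with a newer y even though every individual operation is sequentially consistent. Reading y first avoids this, because x can only have grown by the time it is read. The function name count_violations and the loop structure are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: returns how many times the reader observed x < y.
long count_violations(int increments) {
    assert(increments >= 0);
    std::atomic<int> x{0}, y{0};
    std::atomic<bool> done{false};
    long violations = 0;  // written only by the reader, read after join()

    std::thread writer([&] {
        for (int i = 0; i < increments; ++i) {
            ++x;  // x is always incremented first ...
            ++y;  // ... so x >= y holds at every instant
        }
        done.store(true);
    });

    std::thread reader([&] {
        while (!done.load()) {
            int y_seen = y.load();  // read the trailing counter first
            int x_seen = x.load();  // x can only have grown since then
            if (x_seen < y_seen) ++violations;
        }
    });

    writer.join();
    reader.join();
    return violations;
}
```

With this read order, count_violations(100000) should return 0. Swapping the two loads in the reader brings back the sporadic failure described in the question.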

In OpenCL, what does mem_fence() do, as opposed to barrier()?

邮差的信 posted on 2019-12-05 00:50:14
Unlike barrier() (which I think I understand), mem_fence() does not affect all items in the work group. The OpenCL spec says (section 6.11.10), for mem_fence() : Orders loads and stores of a work-item executing a kernel. (so it applies to a single work item). But, at the same time, in section 3.3.1, it says that: Within a work-item memory has load / store consistency. so within a work item the memory is consistent. So what kind of thing is mem_fence() useful for? It doesn't work across items, yet isn't needed within an item... Note that I haven't used atomic operations (section 9.5 etc). Is

pthreads v. SSE weak memory ordering

☆樱花仙子☆ posted on 2019-12-04 18:50:59
Question: Do the Linux glibc pthread functions on x86_64 act as fences for weakly-ordered memory accesses? (pthread_mutex_lock/unlock are the exact functions I'm interested in.) SSE2 provides some instructions with weak memory ordering (non-temporal stores such as movntps in particular). If you are using these instructions and want to guarantee that another thread/core sees an ordering, then I understand you need an explicit fence for this, e.g., an sfence instruction. Normally you do expect the pthread
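To make the explicit-fence approach mentioned in the question concrete, here is a small sketch. It deliberately does not answer whether the mutex alone is enough; it shows the conservative pattern of issuing sfence after the non-temporal stores and before handing the data over. std::mutex stands in for pthread_mutex_lock/unlock, the function name publish_and_sum is hypothetical, and the non-SSE2 branch is an assumption added only so the sketch builds on other targets.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>
#if defined(__SSE2__)
#include <emmintrin.h>  // _mm_stream_si32, _mm_sfence
#endif

// Hypothetical helper: the producer fills a buffer with 1..n using
// non-temporal stores, the consumer sums it once it is published.
long publish_and_sum(int n) {
    assert(n >= 0);
    std::vector<int> buf(n, 0);
    std::mutex m;
    bool ready = false;  // protected by m
    long sum = 0;        // written only by the consumer, read after join()

    std::thread producer([&] {
        for (int i = 0; i < n; ++i) {
#if defined(__SSE2__)
            _mm_stream_si32(&buf[i], i + 1);  // weakly-ordered (movnti) store
#else
            buf[i] = i + 1;                   // plain store on other targets
#endif
        }
#if defined(__SSE2__)
        _mm_sfence();  // order the streaming stores before the hand-over
#endif
        std::lock_guard<std::mutex> lock(m);
        ready = true;
    });

    std::thread consumer([&] {
        for (;;) {
            {
                std::lock_guard<std::mutex> lock(m);
                if (ready) {
                    for (int v : buf) sum += v;
                    return;
                }
            }
            std::this_thread::yield();  // let the producer make progress
        }
    });

    producer.join();
    consumer.join();
    return sum;
}
```

For example, publish_and_sum(1000) should return 500500, the sum of 1 through 1000.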

Cost of using final fields

 ̄綄美尐妖づ posted on 2019-12-04 15:54:29
Question: We know that making fields final is usually a good idea, as we gain thread-safety and immutability, which makes the code easier to reason about. I'm curious whether there's an associated performance cost. The Java Memory Model guarantees this final Field Semantics: A thread that can only see a reference to an object after that object has been completely initialized is guaranteed to see the correctly initialized values for that object's final fields. This means that for a class like this class X { X

Memory ordering behavior of std::atomic::load

浪尽此生 posted on 2019-12-04 07:02:31
Am I wrong to assume that atomic::load should also act as a memory barrier, ensuring that all previous non-atomic writes become visible to other threads? To illustrate:

    volatile bool arm1 = false;
    std::atomic_bool arm2 = false;
    bool triggered = false;

    Thread1:
    arm1 = true;
    //std::atomic_thread_fence(std::memory_order_seq_cst); // this would do the trick
    if (arm2.load()) triggered = true;

    Thread2:
    arm2.store(true);
    if (arm1) triggered = true;

I expected that after executing both, 'triggered' would be true. Please don't suggest making arm1 atomic; the point is to explore the
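For comparison, here is a sketch of a variant whose outcome the C++ memory model does guarantee. It departs from the asker's constraint: arm1 and triggered are made atomic, because concurrent access to a plain or volatile bool is a data race and so undefined behavior. All accesses are relaxed and the ordering comes only from a seq_cst fence in each thread, placed between that thread's store and its load. Whichever fence comes second in the single total order of seq_cst operations, the load after it must see the other thread's store, so at least one thread sets triggered. The names run_trial and always_triggered are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: runs both threads once and reports whether
// at least one of them observed the other's flag.
bool run_trial() {
    std::atomic<bool> arm1{false}, arm2{false}, triggered{false};

    std::thread t1([&] {
        arm1.store(true, std::memory_order_relaxed);
        // Full fence between this thread's store and its load.
        std::atomic_thread_fence(std::memory_order_seq_cst);
        if (arm2.load(std::memory_order_relaxed))
            triggered.store(true, std::memory_order_relaxed);
    });

    std::thread t2([&] {
        arm2.store(true, std::memory_order_relaxed);
        // The matching fence is needed here as well.
        std::atomic_thread_fence(std::memory_order_seq_cst);
        if (arm1.load(std::memory_order_relaxed))
            triggered.store(true, std::memory_order_relaxed);
    });

    t1.join();
    t2.join();
    return triggered.load();
}

// Hypothetical helper: true if every trial ended with triggered == true.
bool always_triggered(int trials) {
    assert(trials >= 0);
    for (int i = 0; i < trials; ++i) {
        if (!run_trial()) return false;
    }
    return true;
}
```

Calling always_triggered(200) should return true. Removing either fence takes away the guarantee, even if the failure is hard to reproduce on a given machine.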