Question
I have an OpenMP parallel loop in C++ in which all the threads access a shared array of doubles.
- Each thread writes only in its own partition of the array. Two threads cannot write on the same array entry.
- Each thread reads partitions written by the other threads. It does not matter whether the data has already been updated by the thread that owns the partition, as long as the double is either the old or the updated value (not an invalid value resulting from reading a half-written double).
Do I need an atomic write to ensure that the data read is valid (either old or updated), or is an atomic write needed only when several threads are trying to write to the same location?
It seems to work with and without the atomic write, but of course it is faster without it.
Answer 1:
You should use atomic writes and reads for a correct portable program. As specified by the standard:
2.13.6 atomic Construct
[...] To avoid race conditions, all accesses of the locations designated by x that could potentially occur in parallel must be protected with an atomic construct.
In more detail:
1.4.1 Structure of the OpenMP Memory Model
[...]
A single access to a variable may be implemented with multiple load or store instructions, and hence is not guaranteed to be atomic with respect to other accesses to the same variable.
[...]
if at least one thread reads from a memory unit and at least one thread writes without synchronization to that same memory unit, including cases due to atomicity considerations as described above, then a data race occurs. If a data race occurs then the result of the program is unspecified.
In addition to atomicity, you should also consider the visibility:
1.4.3 The Flush Operation
The memory model has relaxed-consistency because a thread’s temporary view of memory is not required to be consistent with memory at all times. A value written to a variable can remain in the thread’s temporary view until it is forced to memory at a later time. Likewise, a read from a variable may retrieve the value from the thread’s temporary view, unless it is forced to read from memory. The OpenMP flush operation enforces consistency between the temporary view and memory.
That means that without an explicit or implicit memory flush, there is no guarantee that you will ever see the updated values.
However, the atomic version is by no means necessarily slower. The compiler, which implements the atomic operation, is aware of the particular memory model of the architecture and is free to exploit it. In fact, neither gcc nor clang generates expensive locks for atomically writing or reading a double on x86, although they do so for atomic increments or long double operations. Unfortunately, the atomic may still hinder certain optimizations, but those may very well lead to unspecified results if you omit the atomic. Do not underestimate compiler optimizations: it is very easy to get undefined behavior with apparently sensible programs that are, strictly speaking, not standard-conforming.
As to the performance impact of the memory flushes, it depends on how frequently your actual algorithm needs to flush memory.
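The atomic read and write recommended in this answer could look roughly like the following. This is a minimal sketch, not the asker's actual loop: the function name count_invalid_reads, its parameters, and the choice of reading the entry half an array away are all assumptions made for illustration. Each iteration reads an entry that is typically owned by another thread, checks that it is either the old or the new value, and then overwrites its own entry.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not from the question): every iteration overwrites its
// own entry with new_value and reads an entry that, under a static schedule,
// usually belongs to a different thread. Returns how many reads observed
// something other than old_value or new_value.
long count_invalid_reads(std::vector<double>& shared,
                         double old_value, double new_value) {
    assert(!shared.empty());  // the index arithmetic below needs n > 0
    const long n = static_cast<long>(shared.size());
    double* data = shared.data();  // plain pointer keeps the atomic statements simple
    long invalid = 0;

    #pragma omp parallel for schedule(static) reduction(+ : invalid)
    for (long i = 0; i < n; ++i) {
        const long j = (i + n / 2) % n;  // assumed access pattern for the sketch

        double seen;
        #pragma omp atomic read
        seen = data[j];  // never a half-written double
        if (seen != old_value && seen != new_value) {
            ++invalid;
        }

        #pragma omp atomic write
        data[i] = new_value;  // only the owner of entry i writes it
    }
    return invalid;
}
```

Calling count_invalid_reads on a vector filled with 0.0, with old_value 0.0 and new_value 1.0, should report no invalid reads and leave every entry at 1.0. With gcc or clang the pragmas take effect when compiling with -fopenmp; without that flag they are ignored and the loop runs serially.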
Answer 2:
I think it is safe in your case. The one issue I could see is a writer thread finishing only half of the write of a double before the reader reads it, leading to a corrupted value. For example (pretend you were using 64-bit integers), if the writer wrote the value -1 where there was previously a zero, but only managed to write the first 4 bytes before the reader read it, then the reader would read roughly 4 billion, which is way off, since -1 is all ones in two's complement representation.
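The arithmetic of that torn write can be checked with a small sketch. It does not involve threads at all: the hypothetical function torn_value simply computes what a reader would see if only the low 32 bits of the new value had been stored, assuming those are the "first 4 bytes" (as on a little-endian machine).

```cpp
#include <cstdint>

// Hypothetical helper: the value observed if a 64-bit store were split into
// two 32-bit halves and only the low half of new_value had landed so far.
constexpr std::int64_t torn_value(std::int64_t old_value,
                                  std::int64_t new_value) {
    return static_cast<std::int64_t>(
        (static_cast<std::uint64_t>(old_value) & 0xFFFFFFFF00000000ull) |  // stale high half
        (static_cast<std::uint64_t>(new_value) & 0x00000000FFFFFFFFull));  // fresh low half
}
```

For an old value of 0 and a new value of -1, torn_value returns 4294967295 (0x00000000FFFFFFFF), which is the "4 billion something" described above.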
Practically speaking, on most modern architectures I do not think this is possible. If you have a 64-bit CPU, reads and writes to memory should occur in 64-bit chunks, which means that the above corruption is not possible. If you were reading or writing a large struct, however, you may need to look into synchronization.
Source: https://stackoverflow.com/questions/41612143/is-openmp-atomic-write-needed-if-other-threads-read-only-the-shared-data