Question
This is with Vulkan semantics, if it makes any difference.
Assume the following:
layout(...) coherent buffer B
{
    uint field;
} b;
Say the field is being modified by other invocations of the same shader (or a derived shader) through atomic*() functions.
If a shader invocation wants to perform an atomic read from this field (with the same semantics as atomicCounter() in GLES, had this been an atomic_uint instead), is there any difference between the following two (other than, obviously, that one of them does a write as well as a read)?
uint read_value = b.field;
uint read_value2 = atomicAdd(b.field, 0);
Answer 1:
To directly answer the question, those two lines of code generate different instructions, with differing performance characteristics and hardware pipeline usage.
uint read_value = b.field; // generates a load instruction
uint read_value2 = atomicAdd(b.field, 0); // generates an atomic instruction
- AMD disassembly can be seen in the online Shader Playground: buffer_load_dword versus buffer_atomic_add
- Dissecting the NVIDIA Volta GPU Architecture via Microbenchmarking: LDG versus ATOM
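To reproduce the comparison, the two reads can be dropped into a small compute shader and fed to a disassembler. The following is a minimal sketch, not the shader used for the listings above: the set/binding numbers, the workgroup size, and the Out buffer are assumptions added only so that both results are stored and neither read is optimized away.

```glsl
#version 450

// Hypothetical workgroup size; any size works for inspecting the disassembly.
layout(local_size_x = 64) in;

// The buffer from the question. Set/binding numbers are assumptions.
layout(set = 0, binding = 0, std430) coherent buffer B
{
    uint field;
} b;

// Hypothetical output buffer, present only to keep both reads live.
layout(set = 0, binding = 1, std430) writeonly buffer Out
{
    uvec2 results[];
} outData;

void main()
{
    // Plain read of the coherent field: expected to compile to a load.
    uint read_value = b.field;

    // Read-modify-write that adds zero: expected to compile to an atomic.
    uint read_value2 = atomicAdd(b.field, 0u);

    outData.results[gl_GlobalInvocationID.x] = uvec2(read_value, read_value2);
}
```

The exact instructions emitted depend on the compiler and the target GPU, so the output is worth checking per driver rather than assuming it matches the mnemonics listed above.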
The GLSL spec, section 4.10 Memory Qualifiers, makes the point that coherent is only about visibility of reads and writes across invocations (shader threads). It also comments on the implied performance:
When accessing memory using variables not declared as coherent, the memory accessed by a shader may be cached by the implementation to service future accesses to the same address. Memory stores may be cached in such a way that the values written might not be visible to other shader invocations accessing the same memory. The implementation may cache the values fetched by memory reads and return the same values to any shader invocation accessing the same memory, even if the underlying memory has been modified since the first memory read. While variables not declared as coherent might not be useful for communicating between shader invocations, using non-coherent accesses may result in higher performance.
The point of coherence in GPU memory systems is usually the last-level cache (L2 cache), meaning all coherent accesses must be performed by the L2 cache. This also means coherent buffers cannot be cached in L1 or other caches closer to the shader processors. Modern GPUs also have dedicated atomic hardware in the L2 caches; a plain load will not use it, but an atomicAdd(..., 0) will go through it. The atomic hardware usually has lower bandwidth than the full L2 cache.
Answer 2:
SPIR-V has an OpAtomicLoad instruction. Presumably, there is at least one piece of hardware in which non-atomic loads cannot replace an atomic load no matter what qualifier the buffer descriptor has.
Unfortunately, there is no Vulkan GLSL construct that can translate to OpAtomicLoad that I'm aware of.
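One candidate worth checking is the GL_KHR_memory_scope_semantics extension, which defines an atomicLoad() built-in taking explicit scope and semantics arguments. The following is a hedged sketch, assuming your GLSL compiler and target driver support that extension; the set/binding numbers, the workgroup size, the Out buffer, and the choice of device scope with acquire semantics are illustrative assumptions, and whether the result is actually emitted as OpAtomicLoad should be confirmed by disassembling the generated SPIR-V.

```glsl
#version 450
#extension GL_KHR_memory_scope_semantics : require

// Hypothetical workgroup size.
layout(local_size_x = 64) in;

// The buffer from the question. Set/binding numbers are assumptions.
layout(set = 0, binding = 0, std430) coherent buffer B
{
    uint field;
} b;

// Hypothetical output buffer so the loaded value is observable.
layout(set = 0, binding = 1, std430) writeonly buffer Out
{
    uint results[];
} outData;

void main()
{
    // Atomic read without a write. Scope and semantics are assumptions:
    // device scope, acquire ordering applied to buffer memory.
    uint read_value = atomicLoad(b.field,
                                 gl_ScopeDevice,
                                 gl_StorageSemanticsBuffer,
                                 gl_SemanticsAcquire);

    outData.results[gl_GlobalInvocationID.x] = read_value;
}
```

Depending on the implementation, device-level features related to the Vulkan memory model may also need to be enabled before a shader like this is accepted.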
Source: https://stackoverflow.com/questions/57114620/if-a-buffer-is-coherent-is-there-any-difference-between-reading-a-field-or