I'm trying to figure out whether there is a bug in the answer (now deleted) about the implementation of a CUDA-like atomicCAS for bools. The code from the answe
Many thanks to @RobertCrovella: the first code sample does contain a bug; the second fixes it but is not thread-safe (see the question comments for details). The thread-safe fix:
static __inline__ __device__ bool atomicCAS(bool *address, bool compare, bool val)
{
    unsigned long long addr = (unsigned long long)address;
    unsigned pos = addr & 3;             // byte position within the int
    int *int_addr = (int *)(addr - pos); // int-aligned address
    int old = *int_addr, assumed, ival;
    bool current_value;

    do
    {
        // Extract the bool stored in byte `pos` of the containing int
        current_value = (bool)(old & ((0xFFU) << (8 * pos)));

        if(current_value != compare) // If we expected that bool to be different, then
            break;                   // stop trying to update it and just return its current value

        assumed = old;
        if(val)
            ival = old | (1 << (8 * pos));          // set the low bit of the byte (true)
        else
            ival = old & (~((0xFFU) << (8 * pos))); // clear the whole byte (false)

        // Swap the whole int; retry if another thread changed it in the meantime
        old = atomicCAS(int_addr, assumed, ival);
    } while(assumed != old);

    return current_value;
}
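To make the masking arithmetic easier to check by hand, here is a minimal sketch of just the bit manipulation, pulled out of the CAS loop. The helper names (`byte_mask`, `read_bool`, `write_bool`) and the `HD` macro are hypothetical and not part of the code above. I also use `unsigned` instead of `int` for the word to sidestep sign questions, and I assume little-endian byte order, so that byte `pos` of the word is the byte at `int_addr + pos`.

```cpp
// Hypothetical helpers mirroring the masking steps of the bool atomicCAS above.
// HD lets the same code compile as plain C++ or as CUDA host/device code.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Mask selecting byte `pos` (0..3) of a 32-bit word: (0xFFU) << (8 * pos).
HD constexpr unsigned byte_mask(unsigned pos)
{
    return 0xFFu << (8 * pos);
}

// Read the bool stored in byte `pos`; any nonzero byte counts as true.
HD constexpr bool read_bool(unsigned word, unsigned pos)
{
    return (word & byte_mask(pos)) != 0;
}

// Word the CAS would try to install: set the low bit of the byte for true,
// clear the whole byte for false. The other three bytes are left untouched.
HD constexpr unsigned write_bool(unsigned word, unsigned pos, bool val)
{
    return val ? (word | (1u << (8 * pos))) : (word & ~byte_mask(pos));
}

// Worked example: the word 0x00010001 holds the bools {true, false, true, false}
// in bytes 0..3 (byte 0 is the least significant byte).
static_assert(read_bool(0x00010001u, 2), "byte 2 holds true");
static_assert(!read_bool(0x00010001u, 1), "byte 1 holds false");
static_assert(write_bool(0x00010001u, 1, true) == 0x00010101u, "set byte 1");
static_assert(write_bool(0x00010001u, 2, false) == 0x00000001u, "clear byte 2");
```

The point of the example is that `write_bool` only ever changes the target byte, which is why the loop above has to retry when `atomicCAS` on the full int reports that a neighbouring byte changed in the meantime.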