I made a very simple spinlock using the Interlocked functions in Windows and tested it on a dual-core CPU (two threads that increment a variable);
The program seems
Parallel Inspector's documentation for data races suggests using a critical section or a mutex to fix races on Windows. Nothing in it suggests that Parallel Inspector knows how to recognise any other locking mechanism you might invent.
Tools for analysing novel locking mechanisms tend to be static tools that examine every possible path through the code, whereas Parallel Inspector's documentation implies that it executes the code once.
If you want to experiment with novel locking mechanisms, the most common tool I've seen used in academic literature is the Spin model checker. There's also ESP, which might reduce the state space, though I don't know whether it's been applied to concurrency problems, and the Mobility Workbench, which would give an analysis if you can couch your problem in the pi-calculus. Intel Parallel Inspector doesn't seem anything like as sophisticated as these tools; it looks designed to check for commonly occurring issues using heuristics.
For other poor folks in a similar situation to me: Intel DOES provide a set of includes and libraries for doing exactly this sort of thing. Look in the Inspector installation directory, where you'll find \include, \lib32 and \lib64. Documentation on how to use them (link valid as of June 2018, though Intel cares nothing about keeping links consistent):
https://software.intel.com/en-us/inspector-user-guide-windows-apis-for-custom-synchronization
There are 3 functions:
void __itt_sync_acquired(void *addr)
void __itt_sync_releasing(void *addr)
void __itt_sync_destroy(void *addr)
I'm pretty sure it should be implemented as follows:
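#include <windows.h>

class SpinLock
{
public:
    // 0 = unlocked, 1 = locked
    explicit SpinLock(long value) : lockValue(value) { }

    void Lock() {
        // Atomically swap 0 -> 1; a non-zero return means another thread holds the lock.
        while (InterlockedCompareExchange(&lockValue, 1, 0) != 0) {
            WaitABit();  // back-off helper, not shown here
        }
    }

    // Atomically store 0 to release the lock.
    void Unlock() { InterlockedExchange(&lockValue, 0); }

private:
    volatile long lockValue;
};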
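To make it concrete where the three calls listed above would go, here is a rough, portable sketch rather than a tested Inspector setup. It assumes the usual placement: report the acquire right after the lock is taken, report the release right before the lock is given up, and report the destroy when the lock object goes away. The `itt_sync_*_stub` functions, `AnnotatedSpinLock` and `run_counter_demo` are names I made up for illustration; the stubs only count calls so the example builds without Intel's headers, and `std::atomic` stands in for the Interlocked functions.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical stand-ins for __itt_sync_acquired / __itt_sync_releasing /
// __itt_sync_destroy. In a real build, include Intel's ittnotify.h, link the
// library from the Inspector installation, and call the real functions instead.
static std::atomic<long> g_acquired{0}, g_releasing{0};
static void itt_sync_acquired_stub(void*)  { ++g_acquired; }
static void itt_sync_releasing_stub(void*) { ++g_releasing; }
static void itt_sync_destroy_stub(void*)   {}

class AnnotatedSpinLock {
public:
    AnnotatedSpinLock() = default;
    ~AnnotatedSpinLock() { itt_sync_destroy_stub(this); }

    void Lock() {
        long expected = 0;
        // Portable counterpart of InterlockedCompareExchange(&lockValue, 1, 0).
        while (!lockValue.compare_exchange_weak(expected, 1,
                                                std::memory_order_acquire)) {
            expected = 0;               // a failed exchange overwrites 'expected'
            std::this_thread::yield();  // plays the role of WaitABit()
        }
        itt_sync_acquired_stub(this);   // report only once the lock is held
    }

    void Unlock() {
        itt_sync_releasing_stub(this);  // report just before giving the lock up
        lockValue.store(0, std::memory_order_release);
    }

private:
    std::atomic<long> lockValue{0};
};

// Each thread increments a shared counter 'iterations' times under the lock.
long run_counter_demo(int threads, long iterations) {
    assert(threads > 0 && iterations >= 0);
    AnnotatedSpinLock lock;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (long i = 0; i < iterations; ++i) {
                lock.Lock();
                ++counter;
                lock.Unlock();
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}
```

Calling `run_counter_demo(2, 100000)` mirrors the two-thread increment test from the question. The address passed to the annotation calls is just a handle identifying the lock, so passing `this` (or the address of the lock word) consistently in all three places should be enough.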