I read somewhere (can't find the page anymore) that lock free data structures are more efficient "for certain workloads" which seems to imply that sometimes they're actually
Efficiency depends on the metric. Lock-free and wait-free algorithms matter in systems where preempting a thread that holds a lock can cause deadlock or missed scheduling deadlines. In those cases, raw processing speed is less important than correctness.
The OP frames the question as a choice between locking primitives. Some algorithms need no lock at all to access a shared data structure. In these cases, producer and consumer can access the same data structure concurrently without regard for each other. One example is a shared queue that lets a single reader and a single writer act on the same instance simultaneously. This meets the common need of a device driver writing data that a consumer process can read on demand.
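To make the single-reader, single-writer case concrete, here is a minimal sketch of such a queue as a fixed-size ring buffer, assuming C++17. The names `SpscQueue`, `try_push`, `try_pop`, and `spsc_demo_sum` are made up for illustration and are not from any particular driver or library. It is only correct with exactly one producer thread and one consumer thread.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <optional>
#include <thread>

// Hypothetical single-producer/single-consumer ring buffer.
// Assumption: exactly one thread calls try_push and exactly one calls try_pop.
template <typename T, std::size_t Capacity>
class SpscQueue {
    static_assert(Capacity >= 2, "one slot stays empty to tell full from empty");
public:
    // Producer side only. Returns false instead of blocking when full.
    bool try_push(const T& value) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t next = (tail + 1) % Capacity;
        if (next == head_.load(std::memory_order_acquire)) {
            return false;  // full
        }
        buffer_[tail] = value;
        tail_.store(next, std::memory_order_release);  // publish the slot
        return true;
    }

    // Consumer side only. Returns std::nullopt instead of blocking when empty.
    std::optional<T> try_pop() {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        if (head == tail_.load(std::memory_order_acquire)) {
            return std::nullopt;  // empty
        }
        T value = buffer_[head];
        head_.store((head + 1) % Capacity, std::memory_order_release);  // free the slot
        return value;
    }

private:
    T buffer_[Capacity]{};
    std::atomic<std::size_t> head_{0};  // written only by the consumer
    std::atomic<std::size_t> tail_{0};  // written only by the producer
};

// Demo: one thread pushes 1..count, the other pops and sums the values.
long long spsc_demo_sum(int count) {
    SpscQueue<int, 64> queue;
    long long sum = 0;
    std::thread producer([&] {
        for (int i = 1; i <= count; ++i) {
            while (!queue.try_push(i)) { std::this_thread::yield(); }
        }
    });
    std::thread consumer([&] {
        for (int received = 0; received < count; ) {
            if (auto v = queue.try_pop()) { sum += *v; ++received; }
            else { std::this_thread::yield(); }
        }
    });
    producer.join();
    consumer.join();
    return sum;  // safe to read: both threads have been joined
}
```

Neither `try_push` nor `try_pop` takes a lock or loops waiting for the other side, so a preempted producer cannot stall the consumer, or the reverse. The cost is the restriction to one thread per side and a fixed capacity, with one slot left unused.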
More complex relationships between processes can be permitted, with varying levels of hardware support (see Herlihy (1991) for an analysis). He concludes: "Wait-free synchronization represents a qualitative break with the traditional locking-based techniques for implementing concurrent objects."
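As an illustration of what hardware support buys, here is a rough sketch using compare-and-swap, one of the primitives Herlihy analyzes, so that any number of threads can update a shared value without a lock. The names `atomic_store_max` and `max_demo` are hypothetical. Note that this loop is lock-free rather than wait-free: some thread always makes progress, but an individual thread may have to retry.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: raise `target` to at least `candidate` without a lock.
void atomic_store_max(std::atomic<int>& target, int candidate) {
    int observed = target.load(std::memory_order_relaxed);
    // On failure, compare_exchange_weak reloads `observed` with the current
    // value, so the loop re-checks whether an update is still needed.
    while (observed < candidate &&
           !target.compare_exchange_weak(observed, candidate,
                                         std::memory_order_relaxed)) {
    }
}

// Demo: several threads race to record the largest value any of them has seen.
int max_demo(int threads, int per_thread) {
    std::atomic<int> high_water{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&, t] {
            for (int i = 0; i < per_thread; ++i) {
                atomic_store_max(high_water, t * per_thread + i);
            }
        });
    }
    for (auto& w : workers) { w.join(); }
    return high_water.load();
}
```

A preempted thread here never blocks the others, because no thread ever holds anything the others need.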
What this means is that a trade-off remains, but it is not simply a choice between mutexes and spinlocks.
A good rule of thumb is still to focus on correctness before performance. Performance can usually be bought by throwing money at the problem, while meeting requirements is usually harder.