For something simple like a counter, where multiple threads will be increasing the number, I read that mutex locks can decrease efficiency since the threads have to wait. So, to
Most processors support atomic reads and writes, and often an atomic cmp&swap (compare-and-swap). This means that the processor itself reads or writes the latest value in a single indivisible operation. A few cycles might be lost compared to a normal integer access, especially as the compiler can't optimise around atomic operations nearly as well as around normal ones.
A mutex, on the other hand, takes a number of lines of code to enter and leave, and for that whole time any other processor that wants the same lock is stalled, which is clearly a big overhead for them. In unoptimised high-level code, both the mutex enter/exit and the atomic operation will be function calls. With the mutex, though, a competing processor is locked out from the moment your enter function takes the lock, while it returns, through your protected code, and until your exit function has started and released the lock. With the atomic, competitors are locked out only for the duration of the actual operation. Optimisation should reduce that cost, but not remove all of it.
If you are trying to increment, then your modern processor probably supports atomic increment/decrement, which will be great.
If it does not, then the atomic increment is implemented either with the processor's atomic cmp&swap or with a mutex.
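As a rough illustration of the atomic increment case, here is a minimal C++ sketch, assuming std::atomic from the standard library. The names atomic_counter, increment_atomic and run_atomic_demo are made up for this example. Whether fetch_add ends up as a single hardware instruction, a cmp&swap loop or a lock depends on the compiler and the target.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical shared counter; the name is illustrative only.
std::atomic<long> atomic_counter{0};

// One indivisible read-modify-write; no explicit lock is taken here.
void increment_atomic() {
    atomic_counter.fetch_add(1);
}

// Starts `threads` workers that each increment `per_thread` times,
// waits for all of them, and returns the final counter value.
long run_atomic_demo(int threads, int per_thread) {
    atomic_counter.store(0);
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                increment_atomic();
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return atomic_counter.load();
}
```

Calling run_atomic_demo(4, 10000) should give 40000, since no increment can be lost.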
Mutex:
get the lock
read
increment
write
release the lock
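The mutex steps above could look roughly like this in C++, assuming std::mutex and std::lock_guard. The names locked_counter, counter_mutex, increment_locked and run_locked_demo are made up for this sketch.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical plain (non-atomic) counter protected by a mutex.
long locked_counter = 0;
std::mutex counter_mutex;

void increment_locked() {
    std::lock_guard<std::mutex> guard(counter_mutex);  // get the lock
    long value = locked_counter;                       // read
    value = value + 1;                                 // increment
    locked_counter = value;                            // write
}   // release the lock (done by the guard's destructor)

// Starts `threads` workers that each increment `per_thread` times,
// waits for all of them, and returns the final counter value.
long run_locked_demo(int threads, int per_thread) {
    locked_counter = 0;  // no other thread is running yet
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                increment_locked();
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return locked_counter;  // safe to read: all workers have been joined
}
```

Every competing thread has to wait for the whole body of increment_locked, not just for the write.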
Atomic cmp&swap:
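atomic read the value
calc the incremented value
do{
    atomic cmp&swap: if the location still holds the value we read, store the incremented value
    otherwise take the value it holds now and recalc the incremented value
}while the cmp&swap did not see the expected value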
So this second version has a loop, in case another processor increments the value between our atomic operations, so that the value no longer matches and our increment would be wrong. That loop can get long if there are many competitors, but it should generally still be quicker than the mutex version. The mutex version, on the other hand, may allow a waiting processor to task switch instead of spinning.
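The cmp&swap loop can be sketched in C++ as follows, assuming std::atomic's compare_exchange_weak. The names cas_counter, increment_with_cas and run_cas_demo are made up for this example, and a real program would normally just call fetch_add rather than write the loop by hand.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical shared counter; the name is illustrative only.
std::atomic<long> cas_counter{0};

// Increments the counter with a compare-and-swap retry loop and
// returns the value this call stored.
long increment_with_cas() {
    long expected = cas_counter.load();  // atomic read the value
    // compare_exchange_weak stores expected + 1 only if the counter still
    // holds `expected`. On failure it writes the current value back into
    // `expected`, so the next attempt recalculates the increment from it.
    while (!cas_counter.compare_exchange_weak(expected, expected + 1)) {
        // Another thread got in first (or the weak form failed spuriously).
    }
    return expected + 1;
}

// Starts `threads` workers that each increment `per_thread` times,
// waits for all of them, and returns the final counter value.
long run_cas_demo(int threads, int per_thread) {
    cas_counter.store(0);
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                increment_with_cas();
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return cas_counter.load();
}
```

The more threads compete, the more often the loop body runs, which is the "can get long" case described above.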