I've been experimenting with multithreading and parallel processing, and I needed a counter to do some basic counting and statistical analysis of the processing speed.
Here is an article that goes into the cost. The short answer is 50 ns.
The cost of a lock in a tight loop, compared to an alternative with no lock, is huge. You can afford to loop many times and still be more efficient than a lock. That is why lock-free queues are so efficient.
    using System;
    using System.Diagnostics;

    namespace LockPerformanceConsoleApplication
    {
        class Program
        {
            static void Main(string[] args)
            {
                var stopwatch = new Stopwatch();
                const int LoopCount = (int) (100 * 1e6); // 100 million iterations
                int counter = 0;

                for (int repetition = 0; repetition < 5; repetition++)
                {
                    // Uncontended lock: one thread takes and releases it on every iteration.
                    stopwatch.Restart();
                    for (int i = 0; i < LoopCount; i++)
                        lock (stopwatch)
                            counter = i;
                    stopwatch.Stop();
                    Console.WriteLine("With lock: {0}", stopwatch.ElapsedMilliseconds);

                    // The same loop without the lock, as a baseline.
                    stopwatch.Restart();
                    for (int i = 0; i < LoopCount; i++)
                        counter = i;
                    stopwatch.Stop();
                    Console.WriteLine("Without lock: {0}", stopwatch.ElapsedMilliseconds);
                }
                Console.ReadKey();
            }
        }
    }
Output (elapsed milliseconds):
With lock: 2013
Without lock: 211
With lock: 2002
Without lock: 210
With lock: 1989
Without lock: 210
With lock: 1987
Without lock: 207
With lock: 1988
Without lock: 208
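As a back-of-the-envelope reading of these numbers, the sketch below divides the extra time by the iteration count. It assumes the whole difference between the two loops is lock overhead, which ignores loop and JIT effects; the class and method names are made up for illustration. For the first repetition it comes out at roughly 18 ns per uncontended lock.

```csharp
using System;

static class LockOverhead
{
    // Per-iteration overhead in nanoseconds, given two elapsed times in milliseconds.
    public static double NanosecondsPerLock(long withLockMs, long withoutLockMs, long iterations)
    {
        double extraMs = withLockMs - withoutLockMs;
        return extraMs * 1_000_000.0 / iterations; // 1 ms = 1,000,000 ns
    }

    static void Main()
    {
        // Figures copied from the first repetition in the output above.
        double ns = NanosecondsPerLock(2013, 211, 100_000_000);
        Console.WriteLine("Approximate overhead per lock: {0:F2} ns", ns);
    }
}
```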
I would like to present a few articles of mine on general synchronization primitives. They dig into Monitor and the C# lock statement: its behavior, properties, and costs across distinct scenarios and thread counts. The focus is on CPU wastage and throughput periods, to understand how much work can be pushed through in each scenario:
https://www.codeproject.com/Articles/1236238/Unified-Concurrency-I-Introduction
https://www.codeproject.com/Articles/1237518/Unified-Concurrency-II-benchmarking-methodologies
https://www.codeproject.com/Articles/1242156/Unified-Concurrency-III-cross-benchmarking
Oh dear!
It seems that the answer flagged here as THE ANSWER is inherently incorrect! I would respectfully ask its author to read the linked article to the end.
The author of the 2003 article measured on a dual-core machine only. In the first case he measured locking with a single thread, and the result was about 50 ns per lock access.
That says nothing about a lock in a concurrent environment. Reading on, in the second half of the article the author measured locking with two and three threads, which gets closer to the concurrency levels of today's processors.
The author reports that with two threads on a dual-core machine a lock costs 120 ns, and with three threads it rises to 180 ns. So the cost clearly depends on the number of threads accessing the lock concurrently.
So it is simple: it is not 50 ns unless there is only a single thread, in which case the lock is useless.
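To see how the average cost moves with the number of threads on your own machine, something like the following could be used. It is a minimal sketch, not the benchmark from the article: the class name, the thread counts, and the iteration count are arbitrary assumptions, and the figure it reports is wall-clock time divided by the total number of acquisitions.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

static class ContendedLockBenchmark
{
    private static readonly object Gate = new object();
    private static long _counter;

    public static long Counter { get { return _counter; } }

    // Runs `threads` workers that each take the lock `iterationsPerThread` times
    // and returns the average wall-clock nanoseconds per lock acquisition.
    public static double AverageNanosecondsPerLock(int threads, int iterationsPerThread)
    {
        _counter = 0;
        var workers = new Task[threads];
        var stopwatch = Stopwatch.StartNew();
        for (int t = 0; t < threads; t++)
        {
            // LongRunning hints the scheduler to give each worker its own thread.
            workers[t] = Task.Factory.StartNew(() =>
            {
                for (int i = 0; i < iterationsPerThread; i++)
                {
                    lock (Gate)
                    {
                        _counter++;
                    }
                }
            }, TaskCreationOptions.LongRunning);
        }
        Task.WaitAll(workers);
        stopwatch.Stop();

        long totalLocks = (long)threads * iterationsPerThread;
        return stopwatch.Elapsed.TotalMilliseconds * 1_000_000.0 / totalLocks;
    }

    static void Main()
    {
        foreach (int threads in new[] { 1, 2, 3, 4 })
        {
            double ns = AverageNanosecondsPerLock(threads, 5_000_000);
            Console.WriteLine("{0} thread(s): {1:F1} ns per lock", threads, ns);
        }
    }
}
```

The absolute numbers will differ from the article's 50/120/180 ns depending on hardware and runtime; the point is only to compare the single-thread row with the others.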
Another issue to consider is that this is measured as an average time!
If the time of individual iterations were measured, some would take anywhere from 1 ms to 20 ms: most acquisitions are fast, but a few threads end up waiting for processor time and incur delays of milliseconds.
This is bad news for any application that requires high throughput and low latency.
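One way to look past the average is to time each acquisition and keep the worst case. The sketch below does that with Stopwatch.GetTimestamp; the names are hypothetical, and taking timestamps around the lock adds overhead of its own, so treat the output as indicative only.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

static class LockLatency
{
    private static readonly object Gate = new object();

    // Returns { average, worst } time in microseconds spent waiting to enter the lock.
    public static double[] Measure(int threads, int iterationsPerThread)
    {
        var worstTicks = new long[threads];
        var totalTicks = new long[threads];
        var workers = new Task[threads];
        for (int t = 0; t < threads; t++)
        {
            int index = t; // per-worker copy of the loop variable for the closure
            workers[t] = Task.Factory.StartNew(() =>
            {
                for (int i = 0; i < iterationsPerThread; i++)
                {
                    long before = Stopwatch.GetTimestamp();
                    lock (Gate)
                    {
                        long waited = Stopwatch.GetTimestamp() - before;
                        totalTicks[index] += waited;
                        if (waited > worstTicks[index]) worstTicks[index] = waited;
                    }
                }
            }, TaskCreationOptions.LongRunning);
        }
        Task.WaitAll(workers);

        long total = 0, worst = 0;
        for (int t = 0; t < threads; t++)
        {
            total += totalTicks[t];
            worst = Math.Max(worst, worstTicks[t]);
        }
        double microsecondsPerTick = 1_000_000.0 / Stopwatch.Frequency;
        double average = total * microsecondsPerTick / ((long)threads * iterationsPerThread);
        return new[] { average, worst * microsecondsPerTick };
    }

    static void Main()
    {
        double[] result = Measure(4, 1_000_000);
        Console.WriteLine("Average wait: {0:F3} us, worst wait: {1:F3} us", result[0], result[1]);
    }
}
```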
The last issue to consider is that there can be slower operations inside the lock, and very often that is the case. The longer the code inside the lock takes to execute, the higher the contention, and delays rise sky high.
Please consider that more than a decade has passed since 2003. That is several generations of processors designed specifically to run fully concurrently, and locking considerably harms their performance.
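The usual way to limit this is to keep the locked region as short as possible. As an illustration only (ResultCache and ExpensiveComputation are invented for this sketch, and it assumes the slow work does not touch shared state), compare the two Store methods:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

class ResultCache
{
    private readonly object _gate = new object();
    private readonly Dictionary<int, double> _results = new Dictionary<int, double>();

    // Hypothetical stand-in for slow work that does not touch shared state.
    private static double ExpensiveComputation(int input)
    {
        double sum = 0;
        for (int i = 1; i <= 100000; i++) sum += Math.Sqrt(input + i);
        return sum;
    }

    // Holds the lock for the whole computation: other threads queue up behind it.
    public void StoreSlow(int input)
    {
        lock (_gate)
        {
            _results[input] = ExpensiveComputation(input);
        }
    }

    // Computes outside the lock and holds it only for the dictionary write.
    public void StoreFast(int input)
    {
        double value = ExpensiveComputation(input);
        lock (_gate)
        {
            _results[input] = value;
        }
    }

    public int Count
    {
        get { lock (_gate) { return _results.Count; } }
    }

    static void Main()
    {
        var cache = new ResultCache();
        Parallel.For(0, 100, cache.StoreFast);
        Console.WriteLine("Stored {0} results", cache.Count);
    }
}
```

Both methods leave the dictionary in the same state; the difference is only how long each thread blocks the others.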
The technical answer is that this is impossible to quantify: it heavily depends on the state of the CPU memory write-back buffers and on how much of the data that the prefetcher gathered has to be discarded and re-read, both of which are very non-deterministic. I use 150 CPU cycles as a back-of-the-envelope approximation that avoids major disappointments.
The practical answer is that it is waaaay cheaper than the amount of time you'll burn on debugging your code when you think you can skip a lock.
To get a hard number you'll have to measure. Visual Studio has a slick concurrency analyzer available as an extension.
lock (Monitor.Enter/Exit) is very cheap, cheaper than alternatives like a WaitHandle or Mutex.
But what if it was (a little) slow, would you rather have a fast program with incorrect results?
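To make the "incorrect results" part concrete, here is a small sketch (the names are illustrative, not from the question) that increments a shared counter from several workers with and without a lock. The unlocked version can lose updates, so its total may come out below the expected value; how often that happens depends on the machine and the run.

```csharp
using System;
using System.Threading.Tasks;

static class CounterRace
{
    public static int CountWithoutLock(int workers, int incrementsPerWorker)
    {
        int counter = 0;
        Parallel.For(0, workers, _ =>
        {
            for (int i = 0; i < incrementsPerWorker; i++)
                counter++; // unsynchronized read-modify-write: updates can be lost
        });
        return counter;
    }

    public static int CountWithLock(int workers, int incrementsPerWorker)
    {
        int counter = 0;
        object gate = new object();
        Parallel.For(0, workers, _ =>
        {
            for (int i = 0; i < incrementsPerWorker; i++)
                lock (gate) { counter++; }
        });
        return counter;
    }

    static void Main()
    {
        Console.WriteLine("Expected:     {0}", 4 * 1000000);
        Console.WriteLine("Without lock: {0}", CountWithoutLock(4, 1000000));
        Console.WriteLine("With lock:    {0}", CountWithLock(4, 1000000));
    }
}
```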
There are a few different ways to define "cost". There is the actual overhead of obtaining and releasing the lock; as Jake writes, that's negligible unless this operation is performed millions of times.
Of more relevance is the effect this has on the flow of execution. This code can only be entered by one thread at a time. If you have 5 threads performing this operation on a regular basis, up to 4 of them can end up waiting for the lock to be released, and then waiting to be the first thread scheduled to enter that piece of code after the lock is released. So your algorithm is going to suffer significantly. How much depends on the algorithm and how often the operation is called. You can't really avoid it without introducing race conditions, but you can ameliorate it by minimizing the number of calls to the locked code.
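One common way to minimize calls to the locked code is to let each worker accumulate into its own local value and take the lock once at the end. The sketch below uses the thread-local overload of Parallel.For for that; the summing workload and the names are just an assumed example.

```csharp
using System;
using System.Threading.Tasks;

static class BatchedLocking
{
    // Sums 0..itemCount-1, taking the lock once per worker instead of once per item.
    public static long SumWithOneLockPerWorker(int itemCount)
    {
        long total = 0;
        object gate = new object();

        Parallel.For<long>(0, itemCount,
            () => 0L,                                   // each worker starts with its own subtotal
            (i, loopState, subtotal) => subtotal + i,   // per-item work, no lock needed
            subtotal =>
            {
                lock (gate) { total += subtotal; }      // merge once when the worker finishes
            });

        return total;
    }

    static void Main()
    {
        Console.WriteLine(SumWithOneLockPerWorker(1000000));
    }
}
```

The lock is still there, so the result stays correct, but it is acquired a handful of times rather than once per iteration.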