I\'ve been tasked with taking an existing single threaded monte carlo simulation and optimising it. This is a c# console app, no db access it l
List<double>
is definitely not thread-safe. See the section "thread safety" in the System.Collections.Generic.List documentation. The reason is performance: adding thread safety is not free.
Your random number implementation also isn't thread-safe; getting the same numbers multiple times is exactly what you'd expect in this case. Let's use the following simplified model of rnd.NextUniform()
to understand what is happening:
Now, if two threads execute this method in parallel, something like this may happen:
As you can see, any reasoning you can do to prove that rnd.NextUniform()
works is no longer valid because two threads are interfering with each other. Worse, bugs like this depend on timing and may appear only rarely as "glitches" under certain workloads or on certain systems. Debugging nightmare!
One possible solution is to eliminate the state sharing: give each task its own random number generator initialized with another seed (assuming that instances are not sharing state through static fields in some way).
Another (inferior) solution is to create a field holding a lock object in your MersenneTwister
class like this:
private object lockObject = new object();
Then use this lock in your MersenneTwister.NextUniform()
implementation:
public double NextUniform()
{
lock(lockObject)
{
// original code here
}
}
This will prevent two threads from executing the NextUniform() method in parallel. The problem with the list in your Parallel.For
can be addressed in a similar manner: separate the Simulate
call and the AddRange
call, and then add locking around the AddRange
call.
My recommendation: avoid sharing any mutable state (like the RNG state) between parallel tasks if at all possible. If no mutable state is shared, no threading issues occur. This also avoids locking bottlenecks: you don't want your "parallel" tasks to wait on a single random number generator that doesn't work in parallel at all. Especially if 30% of the time is spend acquiring random numbers.
Limit state sharing and locking to places where you can't avoid it, like when aggregating the results of parallel execution (as in your AddRange
calls).
Threading is going to be complicated. You will have to break your program into logical units that can each be run on their own threads, and you will have to deal with any concurrency issues that emerge.
The Parallel Extension Library should allow you to parallelize your program by changing some of your for loops to Parallel.For loops. If you want to see how this works, Anders Hejlsberg and Joe Duffy provide a good introduction in their 30 minute video here:
http://channel9.msdn.com/shows/Going+Deep/Programming-in-the-Age-of-Concurrency-Anders-Hejlsberg-and-Joe-Duffy-Concurrent-Programming-with/
Threading vs. ThreadPool
The ThreadPool, as its name implies, is a pool of threads. Using the ThreadPool to obtain your threads has some advantages. Thread pooling enables you to use threads more efficiently by providing your application with a pool of worker threads that are managed by the system.
First you need to understand why you think that using multiple threads is an optimization - when it is, in fact, not. Using multiple threads will make your workload complete faster only if you have multiple processors, and then at most as many times faster as you have CPUs available (this is called the speed-up). The work is not "optimized" in the traditional sense of the word (i.e. the amount of work isn't reduced - in fact, with multithreading, the total amount of work typically grows because of the threading overhead).
So in designing your application, you have to find pieces of work that can be done in a parallel or overlapping fashion. It may be possible to generate random numbers in parallel (by having multiple RNGs run on different CPUs), but that would also change the results, as you get different random numbers. Another option is have generation of the random numbers on one CPU, and everything else on different CPUs. This can give you a maximum speedup of 3, as the RNG will still run sequentially, and still take 30% of the load.
So if you go for this parallelization, you end up with 3 threads: thread 1 runs the RNG, thread 2 produces normal distribution, and thread 3 does the rest of the simulation.
For this architecture, a producer-consumer architecture is most appropriate. Each thread will read its input from a queue, and produce its output into another queue. Each queue should be blocking, so if the RNG thread falls behind, the normalization thread will automatically block until new random numbers are available. For efficiency, I would pass the random numbers in array of, say, 100 (or larger) across threads, to avoid synchronizations on every random number.
For this approach, you don't need any advanced threading. Just use regular thread class, no pool, no library. The only thing that you need that is (unfortunately) not in the standard library is a blocking Queue class (the Queue class in System.Collections is no good). Codeproject provides a reasonably-looking implementation of one; there are probably others.