Continuing the discussion from Understanding VS2010 C# parallel profiling results but more to the point:
I have many threads that work in parallel (using Parallel.Fo
I searched for two days trying to find an answer to the same issue you had. The answer is you need to set the garbage collection mode to Server mode. By default, garbage collection mode set to Workstation mode. Setting garbage collection to Server mode causes the managed heap to split into separately managed sections, one-per CPU. To do this, you need to add a config setting to your app.config file.
<runtime>
<gcServer enabled="true"/>
</runtime>
The speed difference on my 12-core Opteron 6172 was dramatic!
You could pre-allocate a bunch of objects, and keep them in groups intended for separate threads. However, it's likely that you won't get any better performance from this.
The garbage collector is specially designed to handle small short-lived objects efficiently. If you keep the objects in a pool, they are long-lived and will survive a garbage collection, which in turns means that they will be copied to the second generation heap. This copying will be more expensive than just allocating new objects.
The garbage collector does not allocate memory.
It sounds more like you're allocating lots of small temporary objects and a few long-lived objects, and the garbage collector is spending a lot of time garbage-collecting the temporary objects so your app doesn't have to request more memory from the OS. From .NET Framework 4 Advanced Development - Garbage Collection:
As long as address space is available in the managed heap, the runtime continues to allocate space for new objects. However, memory is not infinite. Eventually the garbage collector must perform a collection in order to free some memory.
The solution: Don't allocate lots of small temporary objects. The page on Garbage Collection and Performance might also be helpful.