I'm trying to figure out a good way to parallelize code that processes big datasets and then imports the resulting data into RavenDB.
The data pr
Your task overall sounds like a producer-consumer workflow: your batch processors are the producers, and your RavenDB data "imports" are the consumers of the producers' output.
Consider using a BlockingCollection<T> as the connection between your batch processors and your db importers. The db importers will wake up as soon as the batch processors push completed batches into the blocking collection, and will go back to sleep when they have "caught up" and emptied the collection.
The batch-processor producers can run full throttle and will always run concurrently with the db-importer tasks that are processing previously completed batches. If you are concerned that the batch processors may get too far ahead of the db importers (because the db import takes significantly longer than processing each batch), you can set an upper bound on the blocking collection so that the producers block when they try to add beyond that limit, giving the consumers a chance to catch up.
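A minimal sketch of that wiring. `ProcessBatch` and `ImportToDb` are stand-ins for your real CPU-bound processing and RavenDB session code, and the capacity of 10 is an arbitrary illustration:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class Pipeline
{
    static int imported = 0;

    static void Main()
    {
        // Bounded capacity: the producer blocks once 10 unconsumed batches
        // pile up, giving the slower db importer a chance to catch up.
        using (var batches = new BlockingCollection<int[]>(boundedCapacity: 10))
        {
            // Producer: the batch processor pushes completed batches.
            var producer = Task.Run(() =>
            {
                for (int i = 0; i < 100; i++)
                    batches.Add(ProcessBatch(i));   // blocks while the collection is full
                batches.CompleteAdding();           // tell consumers no more batches are coming
            });

            // Consumer: the db importer wakes as batches arrive and sleeps
            // (inside GetConsumingEnumerable) when it has caught up.
            var consumer = Task.Run(() =>
            {
                foreach (int[] batch in batches.GetConsumingEnumerable())
                    ImportToDb(batch);
            });

            Task.WaitAll(producer, consumer);
        }

        Console.WriteLine($"Imported {imported} batches");
    }

    // Stand-ins for real batch processing and a real RavenDB import.
    static int[] ProcessBatch(int n) => new[] { n };
    static void ImportToDb(int[] batch) => imported++;
}
```

You can run several consumer tasks against the same collection if a single importer can't keep up; `GetConsumingEnumerable` is safe for multiple concurrent consumers.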
Some of your comments are worrisome, though. There's nothing particularly wrong with spinning up a Task instance to perform the db import asynchronously with the batch processing. Task != Thread. Creating new task instances does not carry the monumental overhead of creating new threads.
Don't get hung up on trying to control threads too precisely. Even if you specify that you want exactly as many buckets as you have cores, you don't get exclusive use of those cores. Hundreds of other threads from other processes will still be scheduled in between your time slices. Specify the logical units of work using Tasks and let the TPL manage the thread pool. Save yourself the frustration of a false sense of control. ;>
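To illustrate how cheap tasks are compared to threads, a small sketch (the workload here is made up purely for demonstration):

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

class Program
{
    static void Main()
    {
        // 1,000 tasks is cheap: they are queued work items on the shared
        // thread pool, not 1,000 OS threads. The TPL decides how many
        // pool threads actually service them.
        Task<int>[] tasks = Enumerable.Range(0, 1000)
            .Select(i => Task.Run(() => i * i))
            .ToArray();

        Task.WaitAll(tasks);

        // Sum of i^2 for i = 0..999.
        Console.WriteLine($"Sum: {tasks.Sum(t => t.Result)}");
    }
}
```

Creating 1,000 threads for the same work would cost roughly a megabyte of stack each plus scheduler pressure; the pool version typically uses only a handful of threads.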
In your comments, you indicate that your tasks do not appear to be running asynchronously with respect to each other (how are you determining this?) and that memory does not appear to be released after each batch finishes. I'd suggest dropping everything else until you can figure out what is up with those two problems first. Are you forgetting to call Dispose() somewhere? Are you holding onto a reference that is keeping a whole tree of objects alive unnecessarily? Are you measuring the right thing? Are the parallel tasks being serialized by blocking database or network I/O? Until these two issues are resolved, it doesn't matter what your parallelism plan is.
I recently built something similar. I used the Queue class instead of List with Parallel.ForEach. I found that too many threads actually slowed things down; there is a sweet spot.
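A sketch of that approach, assuming a ConcurrentQueue (the thread-safe counterpart of Queue, which you need once multiple loop bodies write to it) and capping the thread count via ParallelOptions to tune for that sweet spot:

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

class Program
{
    static void Main()
    {
        var results = new ConcurrentQueue<int>();

        // Capping parallelism: for CPU-bound work, more threads than cores
        // mostly adds contention, so ProcessorCount is a reasonable start.
        var options = new ParallelOptions
        {
            MaxDegreeOfParallelism = Environment.ProcessorCount
        };

        Parallel.ForEach(Enumerable.Range(1, 100), options, item =>
        {
            results.Enqueue(item * 2);   // stand-in for real batch processing
        });

        Console.WriteLine(results.Sum()); // 2 * (1 + ... + 100)
    }
}
```

Benchmark with a few different MaxDegreeOfParallelism values against your real workload; the sweet spot depends on how CPU-bound versus I/O-bound each item is.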
You are starting a task for each batch. This means your loop completes very quickly and leaves (number of batches) tasks behind, which is not what you wanted. You wanted (number of CPUs).
Solution: Don't start a new task for each batch. The for loop is already parallel.
In response to your comment, here is an improved version:
// This part runs in parallel. Partition is assumed to be an extension
// method that splits the sequence into chunks of batchSize items.
var processedBatches = datasupplier.GetDataItems()
    .Partition(batchSize)
    .AsParallel()
    .WithDegreeOfParallelism(Environment.ProcessorCount)
    .Select(x => ProcessCpuBound(x));

// This part runs sequentially.
foreach (var batch in processedBatches) {
    PerformIOIntensiveWorkSingleThreadedly(batch);
}