I'm trying to figure out a good way to parallelize code that processes big datasets and then imports the resulting data into RavenDb.
The data pr
You are starting a new task for each batch, so your loop completes very quickly and leaves (number of batches) tasks running behind it. That is not what you wanted: you wanted at most (number of CPUs) tasks running at once.
Solution: Don't start a new task for each batch. The for loop is already parallel.
In response to your comment, here is an improved version:
//this runs in parallel
var processedBatches = datasupplier.GetDataItems()
    .Partition(batchSize)
    .AsParallel()
    .WithDegreeOfParallelism(Environment.ProcessorCount)
    .Select(x => ProcessCpuBound(x));
foreach (var batch in processedBatches) {
    PerformIOIntensiveWorkSingleThreadedly(batch); //this runs sequentially
}
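Note that Partition is not a built-in LINQ operator, so the snippet above assumes you have such an extension method. Below is a minimal sketch of one, together with a tiny end-to-end run. Everything in it is an assumption for illustration: the EnumerableExtensions class, the sample numbers, and batch.Sum() standing in for ProcessCpuBound. On .NET 6 or later, the built-in Enumerable.Chunk(batchSize) should do the same job.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EnumerableExtensions
{
    // Hypothetical stand-in for the Partition call used in the answer:
    // lazily yields consecutive batches of at most batchSize items.
    public static IEnumerable<List<T>> Partition<T>(this IEnumerable<T> source, int batchSize)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));
        return PartitionIterator(source, batchSize);
    }

    private static IEnumerable<List<T>> PartitionIterator<T>(IEnumerable<T> source, int batchSize)
    {
        var batch = new List<T>(batchSize);
        foreach (var item in source)
        {
            batch.Add(item);
            if (batch.Count == batchSize)
            {
                yield return batch;
                batch = new List<T>(batchSize); // fresh list so earlier batches are not mutated
            }
        }
        if (batch.Count > 0) yield return batch; // trailing, possibly smaller batch
    }
}

public static class Program
{
    public static void Main()
    {
        // Sample data standing in for datasupplier.GetDataItems().
        var items = Enumerable.Range(1, 10);

        // CPU-bound step runs on up to ProcessorCount worker threads.
        var processedBatches = items
            .Partition(3)
            .AsParallel()
            .AsOrdered() // optional: keep results in source order
            .WithDegreeOfParallelism(Environment.ProcessorCount)
            .Select(batch => batch.Sum()); // stand-in for ProcessCpuBound

        // The consumer loop runs on the calling thread, one result at a time.
        foreach (var result in processedBatches)
        {
            Console.WriteLine(result); // stand-in for the I/O-bound import
        }
    }
}
```

AsOrdered is only needed if the import must see batches in their original order. Dropping it lets PLINQ hand results to the loop as soon as they are ready.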