Parallelization of a CPU-bound task followed by IO-bound work

滥情空心 2021-01-20 02:20

I'm trying to figure out a good way to parallelize code that processes big datasets and then imports the resulting data into RavenDb.

The data pr

3 Answers
  •  一生所求
    2021-01-20 03:00

    You are starting a new task for each batch. This means that your loop completes very quickly and leaves (number of batches) tasks behind, which is not what you wanted. You wanted (number of CPUs).

    Solution: Don't start a new task for each batch. The for loop is already parallel.
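
    As a rough illustration of that point, here is a minimal sketch that assumes the original loop was a Parallel.ForEach (the question's code is not shown here). The class name BoundedLoop, the maxParallel parameter, and the summing work are hypothetical placeholders. The work is done directly in the loop body, and the method reports the peak number of bodies that ran at the same time:

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Threading;
    using System.Threading.Tasks;

    public static class BoundedLoop
    {
        // Processes every batch inside the parallel loop body itself and
        // returns the highest number of loop bodies observed running at once.
        public static int Run(IEnumerable<int[]> batches, int maxParallel)
        {
            int running = 0, peak = 0;
            var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };

            Parallel.ForEach(batches, options, batch =>
            {
                int now = Interlocked.Increment(ref running);

                // Record the peak concurrency with a compare-and-swap loop.
                int seen;
                while ((seen = Volatile.Read(ref peak)) < now &&
                       Interlocked.CompareExchange(ref peak, now, seen) != seen) { }

                // Placeholder CPU-bound work, done directly in the loop body.
                // No Task.Run or Task.Factory.StartNew in here.
                long sum = 0;
                foreach (var x in batch) sum += x;

                Interlocked.Decrement(ref running);
            });

            return peak;
        }
    }
    ```

    Because no extra tasks are started inside the body, the loop only returns once every batch is done, and the reported peak should never exceed maxParallel.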

    In response to your comment, here is an improved version:

    //this runs in parallel
    var processedBatches = datasupplier.GetDataItems()
        .Partition(batchSize)
        .AsParallel()
        .WithDegreeOfParallelism(Environment.ProcessorCount)
        .Select(x => ProcessCpuBound(x));
    
    foreach (var batch in processedBatches) {
        PerformIOIntensiveWorkSingleThreadedly(batch); //this runs sequentially
    }
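
    The snippet above relies on a Partition extension that is not part of standard LINQ. The following is a self-contained sketch of the same pattern under a few assumptions: the Partition implementation, the BatchPipeline class, and the sum-of-squares stand-in for ProcessCpuBound are all hypothetical, and the IO step is replaced by appending to a list. AsOrdered() is an addition that is not in the snippet above. PLINQ does not preserve source order by default, so it is only needed if the batches must be written in their original order.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;

    public static class BatchPipeline
    {
        // Hypothetical stand-in for the Partition extension:
        // splits a sequence into consecutive batches of at most batchSize items.
        public static IEnumerable<List<T>> Partition<T>(this IEnumerable<T> source, int batchSize)
        {
            var batch = new List<T>(batchSize);
            foreach (var item in source)
            {
                batch.Add(item);
                if (batch.Count == batchSize)
                {
                    yield return batch;
                    batch = new List<T>(batchSize); // fresh list, so batches are never shared
                }
            }
            if (batch.Count > 0)
                yield return batch;
        }

        // Hypothetical CPU-bound step: sum of squares of the batch.
        public static long ProcessCpuBound(List<int> batch) => batch.Sum(x => (long)x * x);

        // Runs the CPU-bound step in parallel, then consumes the results
        // one at a time on the calling thread.
        public static List<long> Run(IEnumerable<int> items, int batchSize)
        {
            var processedBatches = items
                .Partition(batchSize)
                .AsParallel()
                .AsOrdered() // assumption: results are wanted in source order
                .WithDegreeOfParallelism(Environment.ProcessorCount)
                .Select(batch => ProcessCpuBound(batch));

            var written = new List<long>();
            foreach (var result in processedBatches)
            {
                written.Add(result); // placeholder for the sequential IO step
            }
            return written;
        }
    }
    ```

    For example, BatchPipeline.Run(Enumerable.Range(1, 10), 3) partitions the input into [1,2,3], [4,5,6], [7,8,9], [10] and returns 14, 77, 194, 100. On .NET 6 or later, the built-in Enumerable.Chunk could probably take the place of the hypothetical Partition.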
    
