Parallelizing IO Bound (Network) ForEach Loop

梦想与她 提交于 2019-12-22 07:59:30

问题


I have a few different ways of upload entire directories to Amazon S3 within my application depending on what options are selected. Currently one of the options will perform an upload of multiple directories in parallel. I'm not sure if this is a good idea as in some cases it sped up the upload and other cases it slowed it down. The speed up appears to be when there are a bunch of small directories, but it slows down if there are large directories in the batch. I'm using the parallel ForEach loop seen below and utilizing the AWS API's TransferUtility.UploadDirectoryAsync() method as such:

Parallel.ForEach(dirs,myParallelOptions, 
                   async dir => { await MyUploadMethodAsync(dir) };

Where the TransferUtility.UploadDirectoryAsync() method is within MyUploadMethodAsync(). The TransferUtility's upload methods all perform parallel uploads of parts a single file (if the size is big enough to do so), so performing a parallel upload of the directory as well may be overkill. Obviously we are still limited to the amount of bandwidth available so this might be a waste and I just should just use a regular foreach loop with the UploadDirectoryAsync() method. Can anyone provide some insight on if this is bad case for parallelization?


回答1:


Did you actually test this? The way you're using it, Parallel.ForEach may return well before any of MyUploadMethodAsync is completed, because of the async lambda:

Parallel.ForEach(dirs,myParallelOptions, 
    async dir => { await MyUploadMethodAsync(dir) };

Parallel.ForEach is suited for CPU-bound tasks. For IO-bound tasks, you are probably looking for something like this:

var tasks = dirs.Select(dir => MyUploadMethodAsync(dir));
await Task.WhenAll(tasks);
// or Task.WaitAll(tasks) if you need a blocking wait


来源:https://stackoverflow.com/questions/21270523/parallelizing-io-bound-network-foreach-loop

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!