Question
I have a few different ways of uploading entire directories to Amazon S3 within my application, depending on which options are selected. Currently one of the options uploads multiple directories in parallel. I'm not sure whether this is a good idea: in some cases it sped the upload up and in other cases it slowed it down. The speed-up appears to happen when there are a bunch of small directories, but things slow down when there are large directories in the batch. I'm using the Parallel.ForEach loop seen below and calling the AWS API's TransferUtility.UploadDirectoryAsync() method, like so:

Parallel.ForEach(dirs, myParallelOptions,
    async dir => { await MyUploadMethodAsync(dir); });

The TransferUtility.UploadDirectoryAsync() call is made inside MyUploadMethodAsync(). The TransferUtility upload methods already upload the parts of a single file in parallel (if the file is big enough), so uploading the directories in parallel as well may be overkill. Obviously we are still limited by the available bandwidth, so this might be a waste and I should just use a regular foreach loop with UploadDirectoryAsync(). Can anyone provide some insight into whether this is a bad case for parallelization?
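For reference, MyUploadMethodAsync is essentially a thin wrapper around TransferUtility. A minimal sketch of what it looks like is below; the bucket name and client setup are placeholders, not my real code:

using System.Threading.Tasks;
using Amazon.S3;
using Amazon.S3.Transfer;

public class DirectoryUploader
{
    // Placeholder client; my real code configures credentials/region elsewhere.
    private readonly TransferUtility _transferUtility =
        new TransferUtility(new AmazonS3Client());

    public Task MyUploadMethodAsync(string dir)
    {
        // UploadDirectoryAsync walks the directory and uploads each file,
        // using multipart (parallel) uploads for files above the size threshold.
        return _transferUtility.UploadDirectoryAsync(new TransferUtilityUploadDirectoryRequest
        {
            BucketName = "my-bucket",   // placeholder bucket name
            Directory = dir,
            SearchOption = System.IO.SearchOption.AllDirectories
        });
    }
}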
Answer 1:
Did you actually test this? The way you're using it, Parallel.ForEach may return well before any of the MyUploadMethodAsync calls have completed, because of the async lambda:

Parallel.ForEach(dirs, myParallelOptions,
    async dir => { await MyUploadMethodAsync(dir); });
Parallel.ForEach is suited to CPU-bound work. For IO-bound work like this, you are probably looking for something like:

var tasks = dirs.Select(dir => MyUploadMethodAsync(dir));
await Task.WhenAll(tasks);
// or Task.WaitAll(tasks.ToArray()) if you need a blocking wait
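Putting that together with your directory uploads, a rough sketch might look like the following. The SemaphoreSlim cap and the maxConcurrent value are just one way to keep a few large directories from hogging the bandwidth; they are an assumption on my part, not something your code requires:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class UploadRunner
{
    // Starts every directory upload and awaits them all; concurrency is
    // capped with a semaphore so a handful of large directories don't
    // saturate the connection on their own.
    public static async Task UploadAllAsync(
        IEnumerable<string> dirs,
        Func<string, Task> myUploadMethodAsync,  // e.g. MyUploadMethodAsync from the question
        int maxConcurrent = 4)                   // illustrative limit, tune for your bandwidth
    {
        using (var throttle = new SemaphoreSlim(maxConcurrent))
        {
            var tasks = dirs.Select(async dir =>
            {
                await throttle.WaitAsync();
                try
                {
                    await myUploadMethodAsync(dir);
                }
                finally
                {
                    throttle.Release();
                }
            });

            await Task.WhenAll(tasks);
        }
    }
}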
Source: https://stackoverflow.com/questions/21270523/parallelizing-io-bound-network-foreach-loop