Question
I need to create a program that processes a huge number of images. The process has about 10 stages, which must run sequentially.
Is it better to create a pipeline where each processing stage has its own thread, with buffers in between, using the pipeline pattern described here: https://msdn.microsoft.com/en-us/library/ff963548.aspx
or to create a thread pool and assign one image to one thread by just using Parallel.ForEach?
And why?
Answer 1:
This may be new to you, but you can use the actor system Akka.NET (getakka.net), where all work is done by actors: create one actor per stage and pass each image from one actor to the next. I use this framework because it gave me a big improvement in non-blocking parallel processing. It also scales easily.
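To make the "one actor per stage" idea concrete, here is a minimal sketch assuming the Akka NuGet package. The actor classes, the actor names, and the use of a plain string as a stand-in for an image are all hypothetical; a real pipeline would pass its own image type and do actual work in each stage.

```csharp
using System;
using System.Threading.Tasks;
using Akka.Actor; // NuGet package: Akka

// One actor per stage: "process" the message, then hand it to the next actor.
// The message is a plain string standing in for an image (an assumption of this sketch).
public sealed class StageActor : ReceiveActor
{
    public StageActor(string stageName, IActorRef next)
    {
        // Forward keeps the original sender, so the last actor can reply to the caller.
        Receive<string>(image => next.Forward(image + " -> " + stageName));
    }
}

// Final actor: sends the finished item back to whoever started the job.
public sealed class DoneActor : ReceiveActor
{
    public DoneActor()
    {
        Receive<string>(image => Sender.Tell(image, Self));
    }
}

public static class ActorPipelineDemo
{
    // Builds a load -> crop -> done chain and pushes one item through it.
    public static async Task<string> RunAsync(string path)
    {
        var system = ActorSystem.Create("images");
        try
        {
            var done = system.ActorOf(Props.Create(() => new DoneActor()), "done");
            var crop = system.ActorOf(Props.Create(() => new StageActor("crop", done)), "crop");
            var load = system.ActorOf(Props.Create(() => new StageActor("load", crop)), "load");

            // Ask sends the message and waits (up to 5 seconds) for the reply.
            return await load.Ask<string>(path, TimeSpan.FromSeconds(5));
        }
        finally
        {
            await system.Terminate();
        }
    }

    public static async Task Main()
    {
        Console.WriteLine(await RunAsync("photo.jpg"));
    }
}
```

Each actor handles one message at a time, so the stages run concurrently with each other. Note that default actor mailboxes are unbounded, so this sketch does nothing to limit how many images are in memory at once.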
Answer 2:
Honestly, there is no way to tell without actually benchmarking it. However, you may be able to use both parallelism and a pipeline at the same time with TPL Dataflow.
Each stage in the pipeline would be a TransformBlock<TInput, TOutput>, and the stages that can be processed in parallel can have their MaxDegreeOfParallelism set.
Here is an example (written in the browser, so it may have errors). It processes images with a 3-stage pipeline: reading from disk, cropping the image, then writing it back to disk. The read and write stages handle only 1 image at a time, but the crop stage processes 5 images concurrently. The pipeline is also bounded: at most 100 images may be queued for cropping and 100 more for writing. When the pipeline fills up, it stops reading new images and waits until there is room, which prevents the images from using up RAM.
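    using System.IO;
    using System.Threading.Tasks;
    using System.Threading.Tasks.Dataflow; // NuGet package: System.Threading.Tasks.Dataflow

    // MyImage, GetDataAsync and SaveDataAsync are placeholders that are not shown here.
    public async Task CropImages(string directory, int x, int y)
    {
        // BoundedCapacity caps how many images a block may hold at once.
        // The load stage is bounded too, so that SendAsync below waits when the pipeline is full.
        var loadImage = new TransformBlock<string, MyImage>(LoadImageAsync,
            new ExecutionDataflowBlockOptions { BoundedCapacity = 100 });
        var cropImage = new TransformBlock<MyImage, MyImage>(image => Crop(image, x, y),
            new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 5, BoundedCapacity = 100 });
        var saveImage = new ActionBlock<MyImage>(SaveImageAsync,
            new ExecutionDataflowBlockOptions { BoundedCapacity = 100 });

        // PropagateCompletion passes completion (and faults) down the pipeline.
        var linkOptions = new DataflowLinkOptions { PropagateCompletion = true };
        loadImage.LinkTo(cropImage, linkOptions);
        cropImage.LinkTo(saveImage, linkOptions);

        foreach (var file in Directory.EnumerateFiles(directory, "*.jpg"))
        {
            // Waits asynchronously while the load stage has no free capacity.
            await loadImage.SendAsync(file);
        }

        // Signal that no more files are coming, then wait for the last image to be saved.
        loadImage.Complete();
        await saveImage.Completion;
    }

    private async Task<MyImage> LoadImageAsync(string fileName)
    {
        byte[] data = await GetDataAsync(fileName);
        return new MyImage(data, fileName);
    }

    private MyImage Crop(MyImage image, int x, int y)
    {
        image.Crop(x, y);
        return image;
    }

    private async Task SaveImageAsync(MyImage image)
    {
        // Write the result next to the original, with a "Cropped-" prefix.
        var fileName = Path.GetFileName(image.FileName);
        var directoryName = Path.GetDirectoryName(image.FileName);
        var newName = Path.Combine(directoryName, "Cropped-" + fileName);
        await SaveDataAsync(image.Bytes, newName);
    }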
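Since only a benchmark can settle the question, here is a rough, self-contained sketch for timing both options on the same input. Everything in it is an assumption made for illustration: Stage is a hypothetical CPU-bound stand-in for a real processing stage, there are only two stages, and the item count and spin length are arbitrary. Real image work mixes disk I/O and CPU, so measure with your actual stages before deciding.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow; // NuGet package: System.Threading.Tasks.Dataflow

public static class PipelineBenchmark
{
    // Hypothetical stand-in for one processing stage: burns some CPU, then returns value + 1.
    static int Stage(int value)
    {
        Thread.SpinWait(20_000);
        return value + 1;
    }

    // Option 1: one item per worker; both stages run back to back on the same thread.
    public static long RunParallelForEach(int[] items)
    {
        long sum = 0;
        Parallel.ForEach(items, item => Interlocked.Add(ref sum, Stage(Stage(item))));
        return sum;
    }

    // Option 2: a bounded two-stage dataflow pipeline with parallel stages.
    public static async Task<long> RunPipelineAsync(int[] items)
    {
        var options = new ExecutionDataflowBlockOptions
        {
            MaxDegreeOfParallelism = Environment.ProcessorCount,
            BoundedCapacity = 100
        };
        var first = new TransformBlock<int, int>(x => Stage(x), options);
        var second = new TransformBlock<int, int>(x => Stage(x), options);

        // The collector keeps the default parallelism of 1, so the sum needs no lock.
        long sum = 0;
        var collect = new ActionBlock<int>(value => { sum += value; });

        var link = new DataflowLinkOptions { PropagateCompletion = true };
        first.LinkTo(second, link);
        second.LinkTo(collect, link);

        foreach (var item in items)
        {
            await first.SendAsync(item); // waits while the first block is full
        }
        first.Complete();
        await collect.Completion;
        return sum;
    }

    public static async Task Main()
    {
        int[] items = Enumerable.Range(0, 1000).ToArray();

        var watch = Stopwatch.StartNew();
        long fromLoop = RunParallelForEach(items);
        Console.WriteLine($"Parallel.ForEach:  {watch.ElapsedMilliseconds} ms");

        watch.Restart();
        long fromPipeline = await RunPipelineAsync(items);
        Console.WriteLine($"Dataflow pipeline: {watch.ElapsedMilliseconds} ms");

        Console.WriteLine(fromLoop == fromPipeline ? "results match" : "results differ");
    }
}
```

Run it in a Release build and repeat the measurement a few times, since the first run includes JIT and thread pool warm-up.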
Source: https://stackoverflow.com/questions/35558923/heavy-processing-stage-or-loop-thread