Question
I have a question about implementing a pipeline with the TPL Dataflow library.
My application needs to process some tasks concurrently. Processing works like this: first we process an album at the global level, and then we go inside the album and process each picture individually. Say the application has a configurable number of processing slots (for the sake of example, assume slots = 2). This means the application can process either:
a) two albums at the same time
b) one album + one photo from a different album
c) two photos at the same time from the same album
d) two photos at the same time from different albums
Currently I have implemented the process like this:
var albumTransferBlock = new TransformBlock<Album, Album>(ProcessAlbum,
    new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 2 });
ActionBlock<Album> photoActionBlock = new ActionBlock<Album>(ProcessPhoto);
albumTransferBlock.LinkTo(photoActionBlock);

Album ProcessAlbum(Album a)
{
    return a;
}

void ProcessPhoto(Album album)
{
    foreach (var photo in album)
    {
        // do some processing
    }
}
The problem is that when only one album is being processed at a time, the application never uses two slots for processing its photos. The implementation meets all the requirements except c).
Can anyone help me solve this using TPL Dataflow?
Answer 1:
I think I can answer this myself. What I did:
1) Created an interface IProcessor with a method Process().
2) Wrapped AlbumProcessing and PhotoProcessing in the IProcessor interface.
3) Created one ActionBlock that takes an IProcessor as input and executes its Process method.
4) At the end of processing an album, I add the processing of all its photos to the same ActionBlock.
This fulfills my requirements 100%. Maybe someone has another solution?
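The answer gives no code, so here is a minimal sketch of how the four steps might fit together. Only IProcessor and Process() come from the answer; SlotPipeline, AlbumProcessor, PhotoProcessor, the Album/Photo shapes, and the pending-work counter used to decide when the block may complete are all assumptions made for illustration.

```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// From the answer: a single unit of work, album-level or photo-level.
public interface IProcessor { void Process(); }

// Hypothetical data shapes; the original Album and Photo types are not shown.
public sealed class Photo { public volatile bool Processed; }
public sealed class Album { public List<Photo> Photos { get; } = new List<Photo>(); }

// Hypothetical wrapper around the one shared ActionBlock (step 3).
public sealed class SlotPipeline
{
    private readonly ActionBlock<IProcessor> _block;
    private int _pending = 1; // starts at 1: a token held by the producer

    public SlotPipeline(int slots)
    {
        _block = new ActionBlock<IProcessor>(processor =>
        {
            try { processor.Process(); }
            finally { Release(); } // this work item is done
        }, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = slots });
    }

    // Counts the item as pending before it is posted, so the counter
    // cannot drop to zero while work is still being scheduled.
    public void Enqueue(IProcessor processor)
    {
        Interlocked.Increment(ref _pending);
        _block.Post(processor);
    }

    // Call once after the last album is enqueued; releases the producer token.
    public Task CompleteAsync()
    {
        Release();
        return _block.Completion;
    }

    private void Release()
    {
        if (Interlocked.Decrement(ref _pending) == 0) _block.Complete();
    }
}

public sealed class PhotoProcessor : IProcessor
{
    private readonly Photo _photo;
    public PhotoProcessor(Photo photo) { _photo = photo; }
    public void Process() { _photo.Processed = true; } // placeholder for real photo work
}

public sealed class AlbumProcessor : IProcessor
{
    private readonly Album _album;
    private readonly SlotPipeline _pipeline;

    public AlbumProcessor(Album album, SlotPipeline pipeline)
    {
        _album = album;
        _pipeline = pipeline;
    }

    public void Process()
    {
        // Album-level work would go here. Step 4: afterwards, feed the
        // album's photos back into the same block.
        foreach (var photo in _album.Photos)
            _pipeline.Enqueue(new PhotoProcessor(photo));
    }
}
```

Because albums and photos share one block with MaxDegreeOfParallelism = slots, any mix of the two can occupy the slots, including two photos from the same album. The counter is only one way to know when to call Complete(); it is needed because the block feeds itself.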
Answer 2:
You could use a TransformManyBlock for processing the albums, linked to an ActionBlock for processing the photos, so that each album is processed before its photos are processed. To impose a concurrency limit that spans more than one block, you could use either a limited-concurrency TaskScheduler or a SemaphoreSlim. The second option is more flexible, since it can throttle asynchronous operations as well. In your case all the operations are synchronous, so you are free to choose either approach. In both cases you should still set the MaxDegreeOfParallelism option of the blocks to the desired maximum concurrency limit; otherwise, if you make them unbounded, the order of processing will become too random.
Here is an example of the TaskScheduler approach. It uses the ConcurrentScheduler property of the ConcurrentExclusiveSchedulerPair class:
var options = new ExecutionDataflowBlockOptions
{
    MaxDegreeOfParallelism = 2,
    TaskScheduler = new ConcurrentExclusiveSchedulerPair(TaskScheduler.Default,
        maxConcurrencyLevel: 2).ConcurrentScheduler
};

var albumsBlock = new TransformManyBlock<Album, Photo>(album =>
{
    ProcessAlbum(album);
    return album.Photos;
}, options);

var photosBlock = new ActionBlock<Photo>(photo =>
{
    ProcessPhoto(photo);
}, options);

albumsBlock.LinkTo(photosBlock);
And here is an example of the SemaphoreSlim approach. Using the WaitAsync method instead of Wait has the advantage that waiting to acquire the semaphore happens asynchronously, so no ThreadPool threads are needlessly blocked:
var throttler = new SemaphoreSlim(2);

var albumsBlock = new TransformManyBlock<Album, Photo>(async album =>
{
    await throttler.WaitAsync();
    try
    {
        ProcessAlbum(album);
        return album.Photos;
    }
    finally { throttler.Release(); }
}, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 2 });

var photosBlock = new ActionBlock<Photo>(async photo =>
{
    await throttler.WaitAsync();
    try
    {
        ProcessPhoto(photo);
    }
    finally { throttler.Release(); }
}, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 2 });

albumsBlock.LinkTo(photosBlock);
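Neither snippet shows how the pipeline is fed or shut down, so the following is a rough, self-contained sketch of driving the SemaphoreSlim variant end to end. The ThrottledPipelineDemo class, the SimulateWork placeholder, the Album/Photo shapes, and the peak-concurrency counter are assumptions added for illustration. The PropagateCompletion link option is also an addition, so that completing the first block eventually completes the second.

```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical data shapes; the original Album and Photo types are not shown.
public sealed class Photo { }
public sealed class Album { public List<Photo> Photos { get; } = new List<Photo>(); }

public sealed class ThrottledPipelineDemo
{
    private int _running, _peak, _processed;

    public int Peak => _peak;           // highest number of operations seen running at once
    public int Processed => _processed; // number of photos processed

    // Stand-in for ProcessAlbum / ProcessPhoto; records how many calls overlap.
    private void SimulateWork()
    {
        int now = Interlocked.Increment(ref _running);
        int seen;
        while (now > (seen = Volatile.Read(ref _peak)))
            Interlocked.CompareExchange(ref _peak, now, seen);
        Thread.Sleep(20);
        Interlocked.Decrement(ref _running);
    }

    public async Task RunAsync(IEnumerable<Album> albums, int slots)
    {
        var throttler = new SemaphoreSlim(slots); // shared by both blocks
        var options = new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = slots };

        var albumsBlock = new TransformManyBlock<Album, Photo>(async album =>
        {
            await throttler.WaitAsync();
            try
            {
                SimulateWork();
                return album.Photos;
            }
            finally { throttler.Release(); }
        }, options);

        var photosBlock = new ActionBlock<Photo>(async photo =>
        {
            await throttler.WaitAsync();
            try
            {
                SimulateWork();
                Interlocked.Increment(ref _processed);
            }
            finally { throttler.Release(); }
        }, options);

        // Assumption: propagate completion so photosBlock finishes after albumsBlock.
        albumsBlock.LinkTo(photosBlock, new DataflowLinkOptions { PropagateCompletion = true });

        foreach (var album in albums) albumsBlock.Post(album);
        albumsBlock.Complete();       // no more albums will be posted
        await photosBlock.Completion; // wait for every photo to drain
    }
}
```

Since SimulateWork only runs while the semaphore is held, Peak should never exceed the slot count, even though each block on its own would allow that many operations.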
Source: https://stackoverflow.com/questions/38400875/dataflow-tpl-implementing-pipeline-with-precondition