I have a chain of TPL Dataflow blocks and would like to observe progress somewhere inside the system.
I am aware that I could just jam a TransformBlock
Try replacing:
obs.ForEachAsync(i => Debug.Print("progressBlock:" + i.ToString()));
with:
obs.Subscribe(i => Debug.Print("progressBlock:" + i.ToString()));
I'd imagine the ForEachAsync method isn't hooking in properly, or it is firing but something odd is going on with the async portion (note that the Task returned by ForEachAsync is discarded rather than awaited).
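There are two options to consider when creating an observable dataflow block. You can either:
1. emit a notification every time an item is processed, or
2. emit a notification every time a processed item is propagated from the block's output buffer to a linked block.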
Both options have pros and cons. The first option provides timely but unordered notifications. The second option provides ordered but delayed notifications, and it also has to deal with the disposability of the block-to-block link: what should happen to the observable when the link between the two blocks is manually disposed before the blocks have completed?
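The ordering trade-off can be seen with a plain TransformBlock, independently of the helper below. This is a minimal console sketch (C# top-level statements, assuming a project that references System.Threading.Tasks.Dataflow); the per-item delays are arbitrary values picked for illustration. It records the order in which items finish processing (where the first option would notify) and the order in which results leave the block (where the second option would notify).

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Arbitrary simulated processing times (ms), indexed by item.
int[] delays = { 80, 10, 60, 30, 50, 20, 70, 40 };

// Order in which items finish processing ("first option" notification point).
var completionOrder = new ConcurrentQueue<int>();
// Order in which results are propagated out of the block ("second option").
var outputOrder = new List<int>();

var block = new TransformBlock<int, int>(async i =>
{
    await Task.Delay(delays[i]);
    completionOrder.Enqueue(i);
    return i;
}, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4 });

// The consumer runs with the default MaxDegreeOfParallelism = 1,
// so adding to a plain List<int> is safe here.
var consumer = new ActionBlock<int>(i => outputOrder.Add(i));
block.LinkTo(consumer, new DataflowLinkOptions { PropagateCompletion = true });

for (int i = 0; i < delays.Length; i++) block.Post(i);
block.Complete();
await consumer.Completion;

Console.WriteLine("Processed:  " + string.Join(", ", completionOrder));
Console.WriteLine("Propagated: " + string.Join(", ", outputOrder));
```

With MaxDegreeOfParallelism = 4 the "Processed" line usually comes out shuffled, while the "Propagated" line follows the input order because EnsureOrdered defaults to true. That reordering buffer is exactly what delays the second kind of notification.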
Below is an implementation of the first option, which creates a TransformBlock together with a non-consuming IObservable of this block. There is also an implementation of an ActionBlock equivalent, based on the first implementation (although it could also be implemented independently by copying and adapting the TransformBlock implementation, since there is not much code).
public static TransformBlock<TInput, TOutput>
    CreateObservableTransformBlock<TInput, TOutput>(
    Func<TInput, Task<TOutput>> transform,
    out IObservable<(TInput Input, TOutput Output,
        int StartedIndex, int CompletedIndex)> observable,
    ExecutionDataflowBlockOptions dataflowBlockOptions = null)
{
    if (transform == null) throw new ArgumentNullException(nameof(transform));
    dataflowBlockOptions = dataflowBlockOptions ?? new ExecutionDataflowBlockOptions();

    var semaphore = new SemaphoreSlim(1);
    int startedIndexSeed = 0;
    int completedIndexSeed = 0;

    var notificationsBlock = new BufferBlock<(TInput, TOutput, int, int)>(
        new DataflowBlockOptions() { BoundedCapacity = 100 });

    var transformBlock = new TransformBlock<TInput, TOutput>(async item =>
    {
        var startedIndex = Interlocked.Increment(ref startedIndexSeed);
        var result = await transform(item).ConfigureAwait(false);
        await semaphore.WaitAsync().ConfigureAwait(false);
        try
        {
            // Send the notifications in synchronized fashion
            var completedIndex = Interlocked.Increment(ref completedIndexSeed);
            await notificationsBlock.SendAsync(
                (item, result, startedIndex, completedIndex)).ConfigureAwait(false);
        }
        finally
        {
            semaphore.Release();
        }
        return result;
    }, dataflowBlockOptions);

    _ = transformBlock.Completion.ContinueWith(t =>
    {
        if (t.IsFaulted) ((IDataflowBlock)notificationsBlock).Fault(t.Exception);
        else notificationsBlock.Complete();
    }, TaskScheduler.Default);

    observable = notificationsBlock.AsObservable();

    // A dummy subscription to prevent buffering in case of no external subscription.
    observable.Subscribe(
        DataflowBlock.NullTarget<(TInput, TOutput, int, int)>().AsObserver());

    return transformBlock;
}
// Overload with synchronous lambda
public static TransformBlock<TInput, TOutput>
    CreateObservableTransformBlock<TInput, TOutput>(
    Func<TInput, TOutput> transform,
    out IObservable<(TInput Input, TOutput Output,
        int StartedIndex, int CompletedIndex)> observable,
    ExecutionDataflowBlockOptions dataflowBlockOptions = null)
{
    return CreateObservableTransformBlock(item => Task.FromResult(transform(item)),
        out observable, dataflowBlockOptions);
}
// ActionBlock equivalent (requires the System.Reactive package)
public static ITargetBlock<TInput>
    CreateObservableActionBlock<TInput>(
    Func<TInput, Task> action,
    out IObservable<(TInput Input, int StartedIndex, int CompletedIndex)> observable,
    ExecutionDataflowBlockOptions dataflowBlockOptions = null)
{
    if (action == null) throw new ArgumentNullException(nameof(action));
    var block = CreateObservableTransformBlock<TInput, object>(
        async item => { await action(item).ConfigureAwait(false); return null; },
        out var sourceObservable, dataflowBlockOptions);
    block.LinkTo(DataflowBlock.NullTarget<object>());
    observable = sourceObservable
        .Select(entry => (entry.Input, entry.StartedIndex, entry.CompletedIndex));
    return block;
}
// ActionBlock equivalent with synchronous lambda
public static ITargetBlock<TInput>
    CreateObservableActionBlock<TInput>(
    Action<TInput> action,
    out IObservable<(TInput Input, int StartedIndex, int CompletedIndex)> observable,
    ExecutionDataflowBlockOptions dataflowBlockOptions = null)
{
    return CreateObservableActionBlock(
        item => { action(item); return Task.CompletedTask; },
        out observable, dataflowBlockOptions);
}
Usage example in Windows Forms:
private async void Button1_Click(object sender, EventArgs e)
{
    var block = CreateObservableTransformBlock((int i) => i + 20,
        out var observable,
        new ExecutionDataflowBlockOptions() { BoundedCapacity = 1 });

    var vals = Enumerable.Range(1, 20).ToList();
    TextBox1.Clear();
    ProgressBar1.Value = 0;

    observable.ObserveOn(SynchronizationContext.Current).Subscribe(onNext: x =>
    {
        TextBox1.AppendText($"Value {x.Input} transformed to {x.Output}\r\n");
        ProgressBar1.Value = (x.CompletedIndex * 100) / vals.Count;
    }, onError: ex =>
    {
        TextBox1.AppendText($"An exception occurred: {ex.Message}\r\n");
    },
    onCompleted: () =>
    {
        TextBox1.AppendText("The job completed successfully\r\n");
    });

    block.LinkTo(DataflowBlock.NullTarget<int>());
    foreach (var i in vals) await block.SendAsync(i);
    block.Complete();
}
In the above example, the type of the observable variable is:
IObservable<(int Input, int Output, int StartedIndex, int CompletedIndex)>
Both indices (StartedIndex and CompletedIndex) are 1-based.
By specifying the BoundedCapacity for a block inside the chain, you are creating a situation where some of your messages are rejected by target blocks: the buffer of the ActionBlock is full, so the message is rejected.
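A minimal sketch of that rejection, using a stand-in ActionBlock rather than the asker's pipeline (which is not shown here). The action waits on a TaskCompletionSource "gate" (an illustrative device, not part of the original code), so the first item keeps occupying the single slot allowed by BoundedCapacity = 1 while it is being processed. Assumes a project that references System.Threading.Tasks.Dataflow.

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// The action waits on this gate, so item 1 stays "in progress"
// and keeps occupying the only slot allowed by BoundedCapacity = 1.
var gate = new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);
var processed = 0;

var action = new ActionBlock<int>(async i =>
{
    await gate.Task;
    processed++; // safe: the block runs with MaxDegreeOfParallelism = 1
}, new ExecutionDataflowBlockOptions { BoundedCapacity = 1 });

bool first = action.Post(1);  // accepted: the slot is free
bool second = action.Post(2); // declined: the slot is still taken by item 1
Console.WriteLine($"Post(1) = {first}, Post(2) = {second}");

gate.SetResult(true); // let item 1 finish
action.Complete();
await action.Completion;
Console.WriteLine($"Items processed: {processed}");
```

Items that are currently being processed count against BoundedCapacity, which is why the second Post fails even though nothing is waiting in the input queue. Item 2 is simply dropped unless the caller checks the return value.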
By creating the observable from your buffer block, you introduce a race condition: there are two consumers competing for the same messages. Blocks in TPL Dataflow propagate each message to the first linked consumer that accepts it, which leaves your application in a nondeterministic state.
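The following sketch reproduces the two-consumer situation with two ActionBlock instances standing in for the asker's action block and observable (the targets, delay, and message count are illustrative). Target A is bounded and slow, so the BufferBlock regularly finds it full and hands the message to target B instead. Assumes a project that references System.Threading.Tasks.Dataflow.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// One source linked to two targets: each message goes to exactly one of them.
var source = new BufferBlock<int>();
int countA = 0, countB = 0;

// Target A is bounded and slow, so it postpones most offers; when it does,
// the source offers the same message to the next link (target B).
var targetA = new ActionBlock<int>(async _ =>
{
    await Task.Delay(10);
    Interlocked.Increment(ref countA);
}, new ExecutionDataflowBlockOptions { BoundedCapacity = 1 });
var targetB = new ActionBlock<int>(_ => { Interlocked.Increment(ref countB); });

var linkOptions = new DataflowLinkOptions { PropagateCompletion = true };
source.LinkTo(targetA, linkOptions);
source.LinkTo(targetB, linkOptions);

const int total = 20;
for (int i = 0; i < total; i++) source.Post(i);
source.Complete();
await Task.WhenAll(targetA.Completion, targetB.Completion);

Console.WriteLine($"A received {countA}, B received {countB}, total {countA + countB}");
```

A and B split the messages between them and none is delivered twice, which is why one consumer seems to "lose" the messages that the other one took.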
Now, back to your problem. You can introduce a BroadcastBlock, as it provides a copy of the data to all consumers, not only to the first one. In that case, however, you have to remove the buffer size limitation: a broadcast block is like a TV channel; you cannot watch a previous show, only the current one.
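With a BroadcastBlock in front, both consumers see every value. A minimal sketch, assuming unbounded targets as suggested above and a project that references System.Threading.Tasks.Dataflow; the identity cloning function i => i is enough for a value type such as int.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// BroadcastBlock offers a copy of each message to every linked target.
// The identity cloning function is enough for a value type such as int.
var broadcast = new BroadcastBlock<int>(i => i);

// Both targets are unbounded, so neither postpones an offer, and each runs
// with the default MaxDegreeOfParallelism = 1 (safe for a plain List<int>).
var receivedA = new List<int>();
var receivedB = new List<int>();
var targetA = new ActionBlock<int>(i => receivedA.Add(i));
var targetB = new ActionBlock<int>(i => receivedB.Add(i));

var linkOptions = new DataflowLinkOptions { PropagateCompletion = true };
broadcast.LinkTo(targetA, linkOptions);
broadcast.LinkTo(targetB, linkOptions);

for (int i = 1; i <= 5; i++) broadcast.Post(i);
broadcast.Complete();
await Task.WhenAll(targetA.Completion, targetB.Completion);

Console.WriteLine("A: " + string.Join(", ", receivedA));
Console.WriteLine("B: " + string.Join(", ", receivedB));
```

If either target were bounded, a message it postponed could be overwritten by the next one before it is consumed, which is the "current show only" behavior described above.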
Side notes: you do not check the return value of the Post method, so you may want to use await SendAsync instead. Also, for a better throttling effect, set the BoundedCapacity on the starting block, not on the middle one.
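A sketch of both side notes together: the bounded capacity sits on the first block of a hypothetical two-block chain, and the producer uses await SendAsync, which waits asynchronously for free space instead of failing immediately as Post does. The simulated delay and the capacity of 2 are arbitrary; assumes a project that references System.Threading.Tasks.Dataflow.

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Throttle at the entry point: only the first block is bounded.
var head = new TransformBlock<int, int>(async i =>
{
    await Task.Delay(5); // simulated work
    return i + 20;
}, new ExecutionDataflowBlockOptions { BoundedCapacity = 2 });

var sum = 0;
// Unbounded tail; the default MaxDegreeOfParallelism = 1 keeps "sum" safe.
var tail = new ActionBlock<int>(i => { sum += i; });
head.LinkTo(tail, new DataflowLinkOptions { PropagateCompletion = true });

int rejected = 0;
for (int i = 1; i <= 10; i++)
{
    // SendAsync waits (asynchronously) until the head block has room,
    // instead of returning false immediately like Post does.
    bool accepted = await head.SendAsync(i);
    if (!accepted) rejected++;
}
head.Complete();
await tail.Completion;

Console.WriteLine($"Sum of outputs: {sum}, rejected sends: {rejected}");
```

SendAsync only completes with false when the target will never accept the message (for example, after it has been completed), so counting rejections is a cheap sanity check on the producer side.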
The issue with your code is that you're wiring up two consumers of block1. Dataflow then simply gives each value to whichever consumer gets there first.
So you need to broadcast the values from block1 into two other blocks, so that each of them can consume the values independently.
As a side note, don't use .Publish().RefCount(), as it doesn't do what you think. It effectively makes a run-once observable that, during that single run, allows multiple observers to connect and see the same values. It has nothing to do with the source of the data or with how the Dataflow blocks interact.
Try this code:
// Set up mesh
var block1 = new TransformBlock<int, int>(i => i + 20);
var block_broadcast = new BroadcastBlock<int>(i => i, new DataflowBlockOptions());
var block_buffer = new BufferBlock<int>();
var block2 = new ActionBlock<int>(i => Debug.Print("block2:" + i.ToString()));
var obs = block_buffer.AsObservable();
var l1 = block1.LinkTo(block_broadcast);
var l2 = block_broadcast.LinkTo(block2);
var l3 = block_broadcast.LinkTo(block_buffer);

// Progress
obs.Subscribe(i => Debug.Print("progress:" + i.ToString()));

// Start
var vals = Enumerable.Range(1, 5);
foreach (var v in vals)
{
    block1.Post(v);
}
block1.Complete();
That gives me:
block2:21
block2:22
block2:23
block2:24
block2:25
progress:21
progress:22
progress:23
progress:24
progress:25
Which is what I think you wanted.
Now, just as a further aside, using Rx for this might be a better option all around. It's much more powerful and declarative than any TPL or Dataflow option.
Your code boils down to this:
Observable
    .Range(1, 5)
    .Select(i => i + 20)
    .Do(i => Debug.Print("progress:" + i.ToString()))
    .Subscribe(i => Debug.Print("block2:" + i.ToString()));
That gives you pretty much the same result.