TPL Dataflow: Why does EnsureOrdered = false destroy parallelism for this TransformManyBlock?


Question


I'm working on a TPL Dataflow pipeline and noticed some strange behaviour related to ordering/parallelism in TransformManyBlocks (might apply to other blocks as well).

Here is my code to reproduce (.NET 4.7.2, TPL Dataflow 4.9.0):

using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks.Dataflow;

class Program
{
    static void Main(string[] args)
    {
        var sourceBlock = new TransformManyBlock<int, Tuple<int, int>>(i => Source(i),
            new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4, EnsureOrdered = false });

        var targetBlock = new ActionBlock<Tuple<int, int>>(tpl =>
        {
            Console.WriteLine($"Received ({tpl.Item1}, {tpl.Item2})");
        },
        new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4, EnsureOrdered = true });

        sourceBlock.LinkTo(targetBlock, new DataflowLinkOptions { PropagateCompletion = true });

        for (int i = 0; i < 10; i++)
        {
            sourceBlock.Post(i);
        }

        sourceBlock.Complete();
        targetBlock.Completion.Wait();
        Console.WriteLine("Finished");
        Console.Read();
    }

    static IEnumerable<Tuple<int, int>> Source(int i)
    {
        var rand = new Random(543543254);
        for (int j = 0; j < i; j++)
        {
            Thread.Sleep(rand.Next(100, 1500));
            Console.WriteLine($"Returning ({i}, {j})");
            yield return Tuple.Create(i, j);
        }
    }
}

My desired behaviour is the following:

  • The source block should return tuples in parallel, the only requirement is that they should be ordered by the secondary property j.
  • The target block should process messages in the order received.

From what I understand, the secondary ordering condition is already satisfied by the nature of yield return, so EnsureOrdered can be set to false. If it is set to true, the source block withholds messages for an unacceptable amount of time, since it withholds the outputs for a given input until all yield returns of the earlier inputs have completed (in the real app many GB of data are processed, which means we want to propagate data through the pipeline as quickly as possible so we can release RAM). This is a sample output when EnsureOrdered on the source block is set to true:

Returning (1, 0)
Returning (2, 0)
Returning (4, 0)
Returning (3, 0)
Returning (2, 1)
Returning (4, 1)
Returning (3, 1)
Received (1, 0)
Received (2, 0)
Received (2, 1)
Returning (4, 2)
Returning (3, 2)
Received (3, 0)
Received (3, 1)
Received (3, 2)
Returning (5, 0)
Returning (6, 0)

We can see that the source block works in parallel, but waits to propagate messages until all the messages for the next i in line have been generated (as expected).

However when EnsureOrdered for the source block is false (as in the code sample), I get the following output:

Returning (2, 0)
Received (2, 0)
Returning (2, 1)
Received (2, 1)
Returning (4, 0)
Received (4, 0)
Returning (4, 1)
Received (4, 1)
Returning (4, 2)
Received (4, 2)
Returning (4, 3)
Received (4, 3)
Returning (1, 0)
Received (1, 0)
Returning (3, 0)
Received (3, 0)
Returning (3, 1)
Received (3, 1)
Returning (3, 2)
Received (3, 2)

The source block now propagates messages as soon as they are available; however, parallelism seems to be lost, since it works on only one i at a time.

Why is this? How can I force it to process in parallel?


Answer 1:


A fix for this is in progress here: https://github.com/dotnet/corefx/pull/31059

Thank you for your report!
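
Until that fix ships, one possible interim restructuring (a sketch that is not part of the original answer; the Expand helper and block names are illustrative, and it assumes the slow per-item work for a given i can be split out by its index j) is to keep the TransformManyBlock as a cheap fan-out and move the expensive work into a separate parallel TransformBlock. The per-item block then provides the parallelism, each result is released as soon as the items posted ahead of it complete, and EnsureOrdered = true on every block preserves the ordering by j within each i:

using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks.Dataflow;

class Workaround
{
    static void Main(string[] args)
    {
        // Cheap fan-out: expand each i into its (i, j) index pairs, posted in j order.
        var expandBlock = new TransformManyBlock<int, Tuple<int, int>>(i => Expand(i));

        // The expensive per-item work runs here in parallel. EnsureOrdered = true keeps
        // the output in posted order, so items for a given i still come out ordered by j,
        // and each result is released as soon as the items posted ahead of it are done.
        var workBlock = new TransformBlock<Tuple<int, int>, Tuple<int, int>>(tpl =>
        {
            Thread.Sleep(new Random(tpl.Item1 * 31 + tpl.Item2).Next(100, 1500)); // simulated work
            Console.WriteLine($"Returning ({tpl.Item1}, {tpl.Item2})");
            return tpl;
        },
        new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4, EnsureOrdered = true });

        var targetBlock = new ActionBlock<Tuple<int, int>>(tpl =>
            Console.WriteLine($"Received ({tpl.Item1}, {tpl.Item2})"));

        var linkOptions = new DataflowLinkOptions { PropagateCompletion = true };
        expandBlock.LinkTo(workBlock, linkOptions);
        workBlock.LinkTo(targetBlock, linkOptions);

        for (int i = 0; i < 10; i++)
        {
            expandBlock.Post(i);
        }

        expandBlock.Complete();
        targetBlock.Completion.Wait();
        Console.WriteLine("Finished");
    }

    static IEnumerable<Tuple<int, int>> Expand(int i)
    {
        for (int j = 0; j < i; j++)
        {
            yield return Tuple.Create(i, j);
        }
    }
}

The trade-off is a bounded per-item reordering delay in workBlock instead of a per-enumerable one, and it requires that the index pairs be cheap to produce up front.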



Source: https://stackoverflow.com/questions/51276432/tpl-dataflow-why-does-ensureordered-false-destroy-parallelism-for-this-transf
