Data Propagation in TPL Dataflow Pipeline with BatchBlock.TriggerBatch()

一个人的身影 2021-01-24 19:32

In my producer-consumer scenario, I have multiple consumers, and each consumer sends an action to external hardware, which may take some time. My pipeline looks somewhat …

2 answers
  • 2021-01-24 19:56

    I have found that using TriggerBatch in this way is unreliable:

        _groupReadTags.Post(10);
        _groupReadTags.Post(20);
        _groupReadTags.TriggerBatch();
    

    Apparently, TriggerBatch is intended to be called from inside the block, not from outside it like this. I have seen this cause odd timing issues, such as items from the next batch being included in the current batch, even though TriggerBatch was called first.

    Please see my answer to the question "BatchBlock produces batch with elements sent after TriggerBatch()" for an alternative that uses DataflowBlock.Encapsulate.
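    The linked answer is not reproduced here. As a rough sketch of the same general idea, assuming the only goal is that a trigger respects the order of previously sent items, the hypothetical CreateOrderedBatchBlock helper below routes both data items and trigger commands through a single ActionBlock and exposes the result with DataflowBlock.Encapsulate. It does not wrap the built-in BatchBlock, and the (Item, IsTrigger) message shape is an assumption made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical helper (not the linked answer's code): each input message is either
// a data item (IsTrigger == false) or a trigger command (IsTrigger == true).
// One ActionBlock processes both kinds in arrival order, so a trigger can only
// flush items that were sent before it.
IPropagatorBlock<(T Item, bool IsTrigger), T[]> CreateOrderedBatchBlock<T>(int batchSize)
{
    var pending = new List<T>(batchSize);
    var output = new BufferBlock<T[]>();

    var input = new ActionBlock<(T Item, bool IsTrigger)>(message =>
    {
        if (!message.IsTrigger) pending.Add(message.Item);
        // Emit on a full batch, or on a trigger if anything is pending.
        if (pending.Count > 0 && (message.IsTrigger || pending.Count == batchSize))
        {
            output.Post(pending.ToArray());
            pending.Clear();
        }
    });

    // When the input completes, flush the leftovers and propagate completion or fault.
    input.Completion.ContinueWith(t =>
    {
        if (pending.Count > 0) output.Post(pending.ToArray());
        if (t.IsFaulted) ((IDataflowBlock)output).Fault(t.Exception.InnerException);
        else output.Complete();
    }, TaskScheduler.Default);

    return DataflowBlock.Encapsulate(input, output);
}

var block = CreateOrderedBatchBlock<int>(batchSize: 5);
block.Post((10, false));
block.Post((20, false));
block.Post((0, true));   // trigger: flushes [10, 20]; the item value is ignored
block.Post((30, false)); // sent after the trigger, so it belongs to the next batch
block.Complete();        // flushes the leftover [30]

var batches = new List<int[]>();
while (await block.OutputAvailableAsync())
    batches.Add(block.Receive());

foreach (var batch in batches)
    Console.WriteLine(string.Join(", ", batch));
```

    Because the trigger travels through the same queue as the items, there is no window in which a later item can slip into the flushed batch; the trade-off is that callers post tuples instead of plain items.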

  • 2021-01-24 19:59

    Here is an alternative BatchBlock implementation with some extra features. It includes a TriggerBatch method with this signature:

    public int TriggerBatch(int nextMinBatchSizeIfEmpty);
    

    Invoking this method either triggers a batch immediately, if the input queue is not empty, or otherwise sets a temporary minimum batch size that affects only the next batch. You could invoke it with a small value for nextMinBatchSizeIfEmpty so that, if a batch cannot be produced right now, the next batch is emitted sooner than the BatchSize configured in the block's constructor would allow.
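    To make the interplay between BatchSize and nextMinBatchSizeIfEmpty concrete, here is a single-threaded model of the rule, assuming BatchSize = 5. It copies the thresholds used by the BatchBlockEx class shown below but leaves out the lock, the dataflow blocks, and the output-capacity check, so it is an illustration rather than a substitute for the class.

```csharp
using System;
using System.Collections.Generic;

// Single-threaded model (no locks, no dataflow blocks) of BatchBlockEx's
// batching rule, assuming BatchSize = 5.
const int BatchSize = 5;
var queue = new Queue<int>();
var emitted = new List<int[]>();
int nextMinBatchSize = int.MaxValue; // int.MaxValue means "no temporary minimum"

// Mirrors the ActionBlock body: enqueue, then emit when either threshold is met.
void Add(int item)
{
    queue.Enqueue(item);
    if (queue.Count == BatchSize || queue.Count >= nextMinBatchSize)
    {
        emitted.Add(queue.ToArray());
        queue.Clear();
        nextMinBatchSize = int.MaxValue; // the temporary minimum is used only once
    }
}

// Mirrors TriggerBatch(nextMinBatchSizeIfEmpty), without the output-capacity check.
int TriggerBatch(int nextMinBatchSizeIfEmpty)
{
    if (queue.Count > 0)
    {
        int[] batch = queue.ToArray();
        queue.Clear();
        emitted.Add(batch);
        nextMinBatchSize = int.MaxValue;
        return batch.Length;
    }
    nextMinBatchSize = nextMinBatchSizeIfEmpty; // affects only the next batch
    return 0;
}

Add(1); Add(2);
int r1 = TriggerBatch(1); // queue not empty: emits [1, 2] and returns 2
int r2 = TriggerBatch(1); // queue empty: returns 0, next batch needs only 1 item
Add(3);                   // emitted immediately as [3]
Add(4); Add(5);           // normal rule again: these wait for 5 items or a trigger
Console.WriteLine($"{r1} {r2} {emitted.Count} {queue.Count}"); // 2 0 2 2
```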

    This method returns the size of the batch produced. It returns 0 if the input queue is empty, the output queue is full, or the block has completed (and, for the overload that takes a condition, when the condition rejects the batch).
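    The "output queue is full" and "block has completed" cases come from the batch being handed to the output with Post, which never waits (presumably the reason it is used inside the lock, where awaiting SendAsync is not possible). A small sketch of that Post behavior on a bounded BufferBlock, with a capacity of 1 chosen arbitrarily:

```csharp
using System;
using System.Threading.Tasks.Dataflow;

// A bounded output buffer, like the one BatchBlockEx creates when
// BoundedCapacity is set (capacity 1 is an arbitrary choice for the demo).
var output = new BufferBlock<int[]>(new DataflowBlockOptions { BoundedCapacity = 1 });

bool first = output.Post(new[] { 1, 2 }); // true: there is room
bool second = output.Post(new[] { 3 });   // false: full, and Post never waits for room
output.Receive();                         // make room again
output.Complete();
bool third = output.Post(new[] { 4 });    // false: a completed block declines everything

Console.WriteLine($"{first} {second} {third}"); // True False False
```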

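    using System;
    using System.Collections.Generic;
    using System.Threading.Tasks;
    using System.Threading.Tasks.Dataflow;

    public class BatchBlockEx<T> : ITargetBlock<T>, ISourceBlock<T[]>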
    {
        private readonly ITargetBlock<T> _input;
        private readonly IPropagatorBlock<T[], T[]> _output;
        private readonly Queue<T> _queue;
        private readonly object _locker = new object();
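        // Temporary minimum set by TriggerBatch when the queue is empty;
        // Int32.MaxValue means "no temporary minimum".
        private int _nextMinBatchSize = Int32.MaxValue;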
    
        public Task Completion { get; }
        public int InputCount { get { lock (_locker) return _queue.Count; } }
        public int OutputCount => ((BufferBlock<T[]>)_output).Count;
        public int BatchSize { get; }
    
        public BatchBlockEx(int batchSize, DataflowBlockOptions dataflowBlockOptions = null)
        {
            if (batchSize < 1) throw new ArgumentOutOfRangeException(nameof(batchSize));
            dataflowBlockOptions = dataflowBlockOptions ?? new DataflowBlockOptions();
            if (dataflowBlockOptions.BoundedCapacity != DataflowBlockOptions.Unbounded &&
                dataflowBlockOptions.BoundedCapacity < batchSize)
                throw new ArgumentOutOfRangeException(nameof(batchSize),
                "Number must be no greater than the value specified in BoundedCapacity.");
    
            this.BatchSize = batchSize;
    
            _output = new BufferBlock<T[]>(dataflowBlockOptions);
    
            _queue = new Queue<T>(batchSize);
    
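            // Each incoming item is appended to the queue; a batch is emitted when the
            // queue reaches BatchSize or the one-off minimum set by TriggerBatch.
            _input = new ActionBlock<T>(async item =>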
            {
                T[] batch = null;
                lock (_locker)
                {
                    _queue.Enqueue(item);
                    if (_queue.Count == batchSize || _queue.Count >= _nextMinBatchSize)
                    {
                        batch = _queue.ToArray(); _queue.Clear();
                        _nextMinBatchSize = Int32.MaxValue;
                    }
                }
                if (batch != null) await _output.SendAsync(batch).ConfigureAwait(false);
    
            }, new ExecutionDataflowBlockOptions()
            {
                BoundedCapacity = 1,
                CancellationToken = dataflowBlockOptions.CancellationToken
            });
    
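            // After the input completes, flush any leftover items as a final partial
            // batch, then complete (or fault) the output.
            var inputContinuation = _input.Completion.ContinueWith(async t =>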
            {
                try
                {
                    T[] batch = null;
                    lock (_locker)
                    {
                        if (_queue.Count > 0)
                        {
                            batch = _queue.ToArray(); _queue.Clear();
                        }
                    }
                    if (batch != null) await _output.SendAsync(batch).ConfigureAwait(false);
                }
                finally
                {
                    if (t.IsFaulted)
                    {
                        _output.Fault(t.Exception.InnerException);
                    }
                    else
                    {
                        _output.Complete();
                    }
                }
            }, TaskScheduler.Default).Unwrap();
    
            this.Completion = Task.WhenAll(inputContinuation, _output.Completion);
        }
    
        public void Complete() => _input.Complete();
        void IDataflowBlock.Fault(Exception ex) => _input.Fault(ex);
    
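        // If the queue is not empty, tries to emit it as a batch (subject to the optional
        // condition and to the output accepting it via Post); if the queue is empty, sets a
        // one-off minimum size for the next batch. Returns the emitted batch size, or 0.
        public int TriggerBatch(Func<T[], bool> condition, int nextMinBatchSizeIfEmpty)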
        {
            if (nextMinBatchSizeIfEmpty < 1)
                throw new ArgumentOutOfRangeException(nameof(nextMinBatchSizeIfEmpty));
            int count = 0;
            lock (_locker)
            {
                if (_queue.Count > 0)
                {
                    T[] batch = _queue.ToArray();
                    if (condition == null || condition(batch))
                    {
                        bool accepted = _output.Post(batch);
                        if (accepted) { _queue.Clear(); count = batch.Length; }
                    }
                    _nextMinBatchSize = Int32.MaxValue;
                }
                else
                {
                    _nextMinBatchSize = nextMinBatchSizeIfEmpty;
                }
            }
            return count;
        }
    
        public int TriggerBatch(Func<T[], bool> condition)
            => TriggerBatch(condition, Int32.MaxValue);
    
        public int TriggerBatch(int nextMinBatchSizeIfEmpty)
            => TriggerBatch(null, nextMinBatchSizeIfEmpty);
    
        public int TriggerBatch() => TriggerBatch(null, Int32.MaxValue);
    
        DataflowMessageStatus ITargetBlock<T>.OfferMessage(
            DataflowMessageHeader messageHeader, T messageValue,
            ISourceBlock<T> source, bool consumeToAccept)
        {
            return _input.OfferMessage(messageHeader, messageValue, source,
                consumeToAccept);
        }
    
        T[] ISourceBlock<T[]>.ConsumeMessage(DataflowMessageHeader messageHeader,
            ITargetBlock<T[]> target, out bool messageConsumed)
        {
            return _output.ConsumeMessage(messageHeader, target, out messageConsumed);
        }
    
        bool ISourceBlock<T[]>.ReserveMessage(DataflowMessageHeader messageHeader,
            ITargetBlock<T[]> target)
        {
            return _output.ReserveMessage(messageHeader, target);
        }
    
        void ISourceBlock<T[]>.ReleaseReservation(DataflowMessageHeader messageHeader,
            ITargetBlock<T[]> target)
        {
            _output.ReleaseReservation(messageHeader, target);
        }
    
        IDisposable ISourceBlock<T[]>.LinkTo(ITargetBlock<T[]> target,
            DataflowLinkOptions linkOptions)
        {
            return _output.LinkTo(target, linkOptions);
        }
    }
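    A note on the completion continuation in the constructor: because the lambda passed to ContinueWith is async, ContinueWith returns a Task<Task>, and the outer task completes as soon as the lambda reaches its first pending await. The .Unwrap() call is what makes inputContinuation (and therefore Completion) wait for the final SendAsync. The sketch below demonstrates the difference, using a TaskCompletionSource as a stand-in for the awaited send:

```csharp
using System;
using System.Threading.Tasks;

// Stands in for the awaited _output.SendAsync(batch); released manually below.
var gate = new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);
bool flushed = false;

// ContinueWith with an async lambda produces a Task<Task>.
Task<Task> nested = Task.CompletedTask.ContinueWith(async _ =>
{
    await gate.Task;
    flushed = true;
}, TaskScheduler.Default);

await nested; // completes as soon as the lambda reaches its first pending await
bool outerDoneEarly = nested.IsCompleted && !flushed; // the "flush" has not run yet

Task unwrapped = nested.Unwrap(); // completes only when the async lambda finishes
gate.SetResult(true);
await unwrapped;
Console.WriteLine($"{outerDoneEarly} {flushed}"); // True True
```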
    

    Another overload of the TriggerBatch method lets you examine the batch that could currently be produced and decide whether it should be triggered:

    public int TriggerBatch(Func<T[], bool> condition);
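    Here is a single-threaded model of how such a condition is applied, leaving out the lock and the dataflow plumbing. The atLeastTwo policy is a hypothetical example; any predicate over the snapshot would be applied the same way. The real method also resets the temporary minimum batch size whenever the queue is non-empty, which this model omits.

```csharp
using System;
using System.Collections.Generic;

// Single-threaded model of TriggerBatch(condition): the queued items are
// snapshotted, shown to the predicate, and flushed only if it returns true.
var queue = new Queue<string>();
var emitted = new List<string[]>();

int TriggerBatch(Func<string[], bool> condition)
{
    if (queue.Count == 0) return 0;
    string[] batch = queue.ToArray();
    if (condition != null && !condition(batch)) return 0; // items stay queued
    emitted.Add(batch);
    queue.Clear();
    return batch.Length;
}

// Hypothetical policy: only flush early if at least two items are waiting.
Func<string[], bool> atLeastTwo = snapshot => snapshot.Length >= 2;

queue.Enqueue("a");
int r1 = TriggerBatch(atLeastTwo); // 0: one item is not enough, "a" stays queued
queue.Enqueue("b");
int r2 = TriggerBatch(atLeastTwo); // 2: emits ["a", "b"]
Console.WriteLine($"{r1} {r2} {emitted.Count} {queue.Count}"); // 0 2 1 0
```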
    

    The BatchBlockEx class does not support the Greedy and MaxNumberOfGroups options of the built-in BatchBlock.
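    For comparison, this is what MaxNumberOfGroups does on the built-in BatchBlock, as I understand it: once the block has produced that many batches, it declines further items and completes. The sketch assumes a batch size of 2 and a limit of 2 groups; the Greedy option is not illustrated.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks.Dataflow;

// Built-in BatchBlock limited to 2 batches of 2 items each.
var batchBlock = new BatchBlock<int>(2,
    new GroupingDataflowBlockOptions { MaxNumberOfGroups = 2 });

var accepted = new List<bool>();
for (int i = 1; i <= 5; i++)
    accepted.Add(batchBlock.Post(i)); // the 5th item would start a third group

batchBlock.Complete(); // harmless if the block has already completed itself

var batches = new List<int[]>();
while (await batchBlock.OutputAvailableAsync())
    batches.Add(batchBlock.Receive());

Console.WriteLine(string.Join(" ", accepted));
Console.WriteLine(batches.Count);
```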
