Throttling asynchronous tasks

心在旅途 2020-11-22 13:13

I would like to run a bunch of async tasks, with a limit on how many tasks may be pending completion at any given time.

Say you have 1000 URLs, and you only want to

3 Answers
  •  感情败类
    2020-11-22 13:28

    As requested, here's the code I ended up going with.

    The work is set up in a master-detail configuration, and each master is processed as a batch. Each unit of work is queued up in this fashion:

    var success = true;
    
    // Start processing all the master records.
    Master master;
    while (null != (master = await StoredProcedures.ClaimRecordsAsync(...)))
    {
        await masterBuffer.SendAsync(master);
    }
    
    // Finished sending master records
    masterBuffer.Complete();
    
    // Now, wait for all the batches to complete.
    await batchAction.Completion;
    
    return success;
    

    Masters are buffered one at a time to save work for other outside processes. The details for each master are dispatched for work via the masterTransform TransformManyBlock. A BatchedJoinBlock is also created to collect the details in one batch.
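
    For readers unfamiliar with bounded blocks, here is a minimal sketch of the buffering and fan-out described above, not the production code. The Master record, the string "details" and the sink block are hypothetical stand-ins, and it assumes the System.Threading.Tasks.Dataflow package is referenced.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class MasterFanOutSketch
{
    // Hypothetical master record: an id plus the number of details it owns.
    public sealed record Master(int Id, int DetailCount);

    public static async Task<List<string>> RunAsync(IEnumerable<Master> masters)
    {
        var results = new List<string>();

        // Capacity 1: SendAsync does not complete until the previous master has moved on.
        var masterBuffer = new BufferBlock<Master>(
            new DataflowBlockOptions { BoundedCapacity = 1 });

        // Expand each master into its detail items, one master at a time.
        var masterTransform = new TransformManyBlock<Master, string>(
            master => Enumerable.Range(1, master.DetailCount)
                                .Select(i => $"{master.Id}-{i}"),
            new ExecutionDataflowBlockOptions { BoundedCapacity = 1 });

        // Default MaxDegreeOfParallelism is 1, so a plain List<T> is safe here.
        var sink = new ActionBlock<string>(detail => results.Add(detail));

        var propagate = new DataflowLinkOptions { PropagateCompletion = true };
        masterBuffer.LinkTo(masterTransform, propagate);
        masterTransform.LinkTo(sink, propagate);

        foreach (var master in masters)
        {
            // Waits while the buffer is full, which throttles the producer.
            await masterBuffer.SendAsync(master);
        }

        masterBuffer.Complete();
        await sink.Completion;
        return results;
    }
}
```

    The point of the sketch is that the producer loop never runs ahead of the pipeline: with BoundedCapacity = 1 it only claims the next master once the previous one has been handed off.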

    The actual work is done in the detailTransform TransformBlock, asynchronously, 150 at a time. BoundedCapacity is set to 300 so that too many Masters don't get buffered at the beginning of the chain, while still leaving room for enough detail records to be queued to keep 150 records in flight at once. The block outputs an object to its targets, because its output is filtered across the links depending on whether it is a Detail or an Exception.
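
    To make the throttling and the link filtering concrete, here is a small sketch under simplified assumptions. The int items, the Task.Delay placeholder, the failure rule (every fifth item throws) and the two counting sinks are all made up for illustration. It mirrors the 150/300 ratio with a maxParallel parameter and records the peak number of items in flight.

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class ThrottledTransformSketch
{
    // Returns the success count, the failure count and the peak concurrency observed.
    public static async Task<(int Ok, int Failed, int Peak)> RunAsync(
        int itemCount, int maxParallel)
    {
        var gate = new object();
        int inFlight = 0, peak = 0, ok = 0, failed = 0;

        var transform = new TransformBlock<int, object>(async n =>
        {
            lock (gate) { inFlight++; peak = Math.Max(peak, inFlight); }
            try
            {
                await Task.Delay(10); // placeholder for the real async work
                if (n % 5 == 0) throw new InvalidOperationException($"item {n}");
                return n; // boxed to object
            }
            catch (Exception e)
            {
                return e; // errors travel down the same pipeline as results
            }
            finally
            {
                lock (gate) { inFlight--; }
            }
        }, new ExecutionDataflowBlockOptions
        {
            MaxDegreeOfParallelism = maxParallel,
            BoundedCapacity = 2 * maxParallel
        });

        // Each sink runs sequentially, so plain increments are safe.
        var okSink = new ActionBlock<object>(_ => { ok++; });
        var errorSink = new ActionBlock<object>(_ => { failed++; });

        // The predicates route each output object by its runtime type.
        var propagate = new DataflowLinkOptions { PropagateCompletion = true };
        transform.LinkTo(okSink, propagate, item => item is int);
        transform.LinkTo(errorSink, propagate, item => item is Exception);

        for (int i = 1; i <= itemCount; i++)
        {
            await transform.SendAsync(i); // waits when BoundedCapacity is reached
        }

        transform.Complete();
        await Task.WhenAll(okSink.Completion, errorSink.Completion);
        return (ok, failed, peak);
    }
}
```

    Calling RunAsync(20, 4) should never report a peak above 4, since MaxDegreeOfParallelism also caps outstanding async delegates. Note that every output must match one of the predicates; an unmatched item would stay in the block's output queue and stall the pipeline.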

    The batchAction ActionBlock collects the output from all the batches, and performs bulk database updates, error logging, etc. for each batch.

    There will be several BatchedJoinBlocks, one for each master. Since each ISourceBlock emits its output sequentially and each batch only accepts the number of detail records associated with one master, the batches will be processed in order. Each block outputs only one group and is unlinked on completion. Only the last batch block propagates its completion to the final ActionBlock.
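
    As a rough illustration of what a single BatchedJoinBlock does in this design, the sketch below feeds a mixed sequence of results into one block and receives its only group. The CollectAsync helper and its inputs are hypothetical; in the real network the items arrive through the filtered links instead of being sent by hand.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class BatchedJoinSketch
{
    public static async Task<Tuple<IList<object>, IList<object>>> CollectAsync(
        IEnumerable<object> results, int batchSize)
    {
        // Emits one tuple once batchSize items in total have arrived on
        // Target1 and Target2, then declines further input because
        // MaxNumberOfGroups is 1.
        var join = new BatchedJoinBlock<object, object>(
            batchSize, new GroupingDataflowBlockOptions { MaxNumberOfGroups = 1 });

        foreach (var result in results)
        {
            if (result is Exception)
                await join.Target2.SendAsync(result); // failures
            else
                await join.Target1.SendAsync(result); // successes
        }

        // Flush a partial batch if fewer than batchSize items were sent,
        // so that ReceiveAsync cannot wait forever in this sketch.
        join.Complete();
        return await join.ReceiveAsync();
    }
}
```

    Item1 of the returned tuple holds the successes and Item2 the exceptions, which is the shape the batchAction block consumes.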

    The dataflow network:

    // The dataflow network
    BufferBlock<Master> masterBuffer = null;
    TransformManyBlock<Master, Detail> masterTransform = null;
    TransformBlock<Detail, object> detailTransform = null;
    ActionBlock<Tuple<IList<object>, IList<object>>> batchAction = null;
    
    // Buffer master records to enable efficient throttling.
    masterBuffer = new BufferBlock<Master>(new DataflowBlockOptions { BoundedCapacity = 1 });
    
    // Sequentially transform master records into a stream of detail records.
    masterTransform = new TransformManyBlock<Master, Detail>(async masterRecord =>
    {
        var records = await StoredProcedures.GetObjectsAsync(masterRecord);
    
        // Filter the master records based on some criteria here
        var filteredRecords = records;
    
        // Only propagate completion to the last batch
        var propagateCompletion = masterBuffer.Completion.IsCompleted && masterTransform.InputCount == 0;
    
        // Create a batch join block to encapsulate the results of the master record.
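        // Size the batch by the records that are actually returned below; otherwise the batch would never fill once filtering drops records.
        var batchjoinblock = new BatchedJoinBlock<object, object>(filteredRecords.Count(), new GroupingDataflowBlockOptions { MaxNumberOfGroups = 1 });
    
        // Add the batch block to the detail transform pipeline's link queue, and link the batch block to the batch action block.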
        var detailLink1 = detailTransform.LinkTo(batchjoinblock.Target1, detailResult => detailResult is Detail);
        var detailLink2 = detailTransform.LinkTo(batchjoinblock.Target2, detailResult => detailResult is Exception);
        var batchLink = batchjoinblock.LinkTo(batchAction, new DataflowLinkOptions { PropagateCompletion = propagateCompletion });
    
        // Unlink batchjoinblock upon completion.
        // (the returned task does not need to be awaited, despite the warning.)
        batchjoinblock.Completion.ContinueWith(task =>
        {
            detailLink1.Dispose();
            detailLink2.Dispose();
            batchLink.Dispose();
        });
    
        return filteredRecords;
    }, new ExecutionDataflowBlockOptions { BoundedCapacity = 1 });
    
    // Process each detail record asynchronously, 150 at a time.
    detailTransform = new TransformBlock<Detail, object>(async detail => {
        try
        {
            // Perform the action for each detail here asynchronously
            await DoSomethingAsync();
    
            return detail;
        }
        catch (Exception e)
        {
            success = false;
            return e;
        }
    
    }, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 150, BoundedCapacity = 300 });
    
    // Perform the proper action for each batch
    batchAction = new ActionBlock<Tuple<IList<object>, IList<object>>>(async batch =>
    {
        var details = batch.Item1.Cast<Detail>();
        var errors = batch.Item2.Cast<Exception>();
    
        // Do something with the batch here
    }, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4 });
    
    masterBuffer.LinkTo(masterTransform, new DataflowLinkOptions { PropagateCompletion = true });
    masterTransform.LinkTo(detailTransform, new DataflowLinkOptions { PropagateCompletion = true });
    
        
