TPL architectural question

甜味超标 2021-02-06 09:48

I'm currently working on a project where we have the challenge of processing items in parallel. So far not a big deal ;) Now to the problem. We have a list of IDs, where we perio

2 Answers
  • 2021-02-06 10:27

    This is pretty similar to the approach you said you already had in your question, but it does so with TPL tasks. A task just adds itself back to the list of things to schedule when it's done.

    Locking on a plain list is fairly ugly in this example; you would probably want a better collection to hold the list of things to schedule.

    // Fill the idsToSchedule
    for (int id = 0; id < 5; id++)
    {
        idsToSchedule.Add(Tuple.Create(DateTime.MinValue, id));
    }
    
    // LongRunning will tell TPL to create a new thread to run this on
    Task.Factory.StartNew(SchedulingLoop, TaskCreationOptions.LongRunning);
    

    That starts up the SchedulingLoop, which actually checks whether it's been two seconds since something ran:

    // Tuple of the last time an id was processed and the id of the thing to schedule
    static List<Tuple<DateTime, int>> idsToSchedule = new List<Tuple<DateTime, int>>();
    static int currentlyProcessing = 0;
    const int ProcessingLimit = 3;
    
    // An event loop that performs the scheduling
    public static void SchedulingLoop()
    {
        while (true)
        {
            lock (idsToSchedule)
            {
                DateTime currentTime = DateTime.Now;
                for (int index = idsToSchedule.Count - 1; index >= 0; index--)
                {
                    var scheduleItem = idsToSchedule[index];
                    var timeSincePreviousRun = (currentTime - scheduleItem.Item1).TotalSeconds;
    
                    // start it executing in a background task
                    if (timeSincePreviousRun > 2 && currentlyProcessing < ProcessingLimit)
                    {
                        Interlocked.Increment(ref currentlyProcessing);
    
                        Console.WriteLine("Scheduling {0} after {1} seconds", scheduleItem.Item2, timeSincePreviousRun);
    
                        // Schedule this task to be processed
                        Task.Factory.StartNew(() =>
                            {
                                Console.WriteLine("Executing {0}", scheduleItem.Item2);
    
                                // simulate the time taken to call this procedure
                                Thread.Sleep(new Random((int)DateTime.Now.Ticks).Next(0, 5000) + 500);
    
                                lock (idsToSchedule)
                                {
                                    idsToSchedule.Add(Tuple.Create(DateTime.Now, scheduleItem.Item2));
                                }
    
                                Console.WriteLine("Done Executing {0}", scheduleItem.Item2);
                                Interlocked.Decrement(ref currentlyProcessing);
                            });
    
                        // remove this from the list of things to schedule
                        idsToSchedule.RemoveAt(index);
                    }
                }
            }
    
            Thread.Sleep(100);
        }
    }
    
  • 2021-02-06 10:45

    I don't think you actually need to get down and dirty with direct TPL Tasks for this. For starters I would set up a BlockingCollection around a ConcurrentQueue (the default) with no BoundedCapacity set on the BlockingCollection to store the IDs that need to be processed.

    // Setup the blocking collection somewhere when your process starts up (OnStart for a Windows service)
    BlockingCollection<string> idsToProcess = new BlockingCollection<string>();
    

    From there I would just use Parallel::ForEach on the enumeration returned from BlockingCollection::GetConsumingEnumerable. In the ForEach call you will set up your ParallelOptions::MaxDegreeOfParallelism. Inside the body of the ForEach you will execute your stored procedure.

    Now, once the stored procedure execution completes, you're saying you don't want to re-schedule the execution for at least two seconds. No problem, schedule a System.Threading.Timer whose callback simply adds the ID back to the BlockingCollection.

    Parallel.ForEach(
        idsToProcess.GetConsumingEnumerable(),
        new ParallelOptions 
        { 
            MaxDegreeOfParallelism = 4 // read this from config
        },
        (id) =>
        {
           // ... execute sproc ...
    
           // Need to declare/assign this before the delegate so that we can dispose of it inside 
           Timer timer = null;
    
           timer = new Timer(
               _ =>
               {
                   // Add the id back to the collection so it will be processed again
                   idsToProcess.Add(id);
    
                   // Cleanup the timer
                   timer.Dispose();
               },
               null, // no state, the id we need is "captured" in the anonymous delegate
               2000, // probably should read this from config
               Timeout.Infinite);
        });
    

    Finally, when the process is shutting down you would call BlockingCollection::CompleteAdding so that the enumerable being processed will stop blocking and complete, and the Parallel::ForEach will exit. If this were a Windows service, for example, you would do this in OnStop.

    // When ready to shutdown you just signal you're done adding
    idsToProcess.CompleteAdding();
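
    One thing to watch at shutdown: as far as I know, BlockingCollection::Add throws an InvalidOperationException once CompleteAdding has been called, so a timer callback that fires late could fault. A minimal sketch of a guard follows. The Requeue class and TryRequeue method are hypothetical names made up for illustration, not part of the original answer.

```csharp
using System;
using System.Collections.Concurrent;

public static class Requeue
{
    // Hypothetical helper: returns false instead of throwing when the
    // collection has already been marked complete for adding.
    public static bool TryRequeue(BlockingCollection<string> idsToProcess, string id)
    {
        // Cheap early exit for the common shutdown case.
        if (idsToProcess.IsAddingCompleted)
        {
            return false;
        }

        try
        {
            idsToProcess.Add(id);
            return true;
        }
        catch (InvalidOperationException)
        {
            // CompleteAdding was called between the check and the Add.
            return false;
        }
    }
}
```

    The timer callback would then call Requeue.TryRequeue(idsToProcess, id) instead of idsToProcess.Add(id) and simply drop the ID when it returns false.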
    

    Update

    You raised a valid concern in your comment that you might be processing a large number of IDs at any given point and fear that there would be too much overhead in a timer per ID. I would absolutely agree with that. So in the case that you are dealing with a large list of IDs concurrently, I would change from using a timer-per-ID to using another queue to hold the "sleeping" IDs, which is monitored by a single short-interval timer instead. First you'll need a ConcurrentQueue onto which to place the IDs that are asleep:

    ConcurrentQueue<Tuple<string, DateTime>> sleepingIds = new ConcurrentQueue<Tuple<string, DateTime>>();
    

    Now, I'm using a two-part Tuple here for illustration purposes, but you may want to create a more strongly typed struct for it (or at least alias it with a using statement) for better readability. The tuple has the id and a DateTime which represents when it was put on the queue.
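
    As a rough sketch of what such a strongly typed struct could look like: the SleepingId name, its members and the IsDue helper are all assumptions for illustration only.

```csharp
using System;

// Hypothetical replacement for Tuple<string, DateTime>; all names are illustrative.
public readonly struct SleepingId
{
    public SleepingId(string id, DateTime enqueuedUtc)
    {
        Id = id;
        EnqueuedUtc = enqueuedUtc;
    }

    // The ID that is waiting to be processed again.
    public string Id { get; }

    // The UTC time at which the ID was put on the sleeping queue.
    public DateTime EnqueuedUtc { get; }

    // True once the entry has been asleep for at least the given delay.
    public bool IsDue(DateTime utcNow, TimeSpan delay)
    {
        return utcNow - EnqueuedUtc >= delay;
    }
}
```

    The queue would then be declared as ConcurrentQueue<SleepingId>, and the two-second check becomes entry.IsDue(utcNow, TimeSpan.FromSeconds(2)).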

    Now you'll also want to set up the timer that will monitor this queue:

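    Timer wakeSleepingIdsTimer = new Timer(
       _ =>
       {
           DateTime utcNow = DateTime.UtcNow;
           Tuple<string, DateTime> entry;
    
           // Remove items from the front of the sleeping queue that have been there for at least 2 seconds.
           // This assumes callbacks do not overlap; if they do, the dequeued entry may not be the peeked one.
           while (sleepingIds.TryPeek(out entry) && (utcNow - entry.Item2).TotalSeconds >= 2)
           {
               if (sleepingIds.TryDequeue(out entry))
               {
                   // Add this id back to the processing collection
                   idsToProcess.Add(entry.Item1);
               }
           }
       },
       null, // no state
       100, // first run after 100ms (Timeout.Infinite here would keep the timer from ever firing)
       100 // wake up every 100ms, probably should read this from config
     );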
    

    Then you would simply change the Parallel::ForEach to do the following instead of setting up a timer for each one:

    (id) =>
    {
           // ... execute sproc ...
    
           sleepingIds.Enqueue(Tuple.Create(id, DateTime.UtcNow)); 
    }
    