TPL architectural question


I'm currently working on a project where we have the challenge of processing items in parallel. So far, not a big deal ;) Now to the problem: we have a list of IDs, where we perio

2 Answers

醉话见心

2021-02-06 10:27

This is pretty similar to the approach you said you already had in your question, but it uses TPL tasks. A task just adds itself back to a list of things to schedule when it's done.

Locking on a plain list is fairly ugly in this example; you would probably want a better collection to hold the list of things to schedule.

```
// Fill the idsToSchedule
for (int id = 0; id < 5; id++)
{
    idsToSchedule.Add(Tuple.Create(DateTime.MinValue, id));
}

// LongRunning will tell TPL to create a new thread to run this on
Task.Factory.StartNew(SchedulingLoop, TaskCreationOptions.LongRunning);
```

That starts up the SchedulingLoop, which checks whether at least two seconds have passed since each id last ran and schedules it again if so:

```
// Tuple of the last time an id was processed and the id of the thing to schedule
static List<Tuple<DateTime, int>> idsToSchedule = new List<Tuple<DateTime, int>>();
static int currentlyProcessing = 0;
const int ProcessingLimit = 3;

// An event loop that performs the scheduling
public static void SchedulingLoop()
{
    while (true)
    {
        lock (idsToSchedule)
        {
            DateTime currentTime = DateTime.Now;
            for (int index = idsToSchedule.Count - 1; index >= 0; index--)
            {
                var scheduleItem = idsToSchedule[index];
                var timeSincePreviousRun = (currentTime - scheduleItem.Item1).TotalSeconds;

                // start it executing in a background task
                if (timeSincePreviousRun > 2 && currentlyProcessing < ProcessingLimit)
                {
                    Interlocked.Increment(ref currentlyProcessing);

                    Console.WriteLine("Scheduling {0} after {1} seconds", scheduleItem.Item2, timeSincePreviousRun);

                    // Schedule this task to be processed
                    Task.Factory.StartNew(() =>
                        {
                            Console.WriteLine("Executing {0}", scheduleItem.Item2);

                            // simulate the time taken to call this procedure
                            Thread.Sleep(new Random((int)DateTime.Now.Ticks).Next(0, 5000) + 500);

                            lock (idsToSchedule)
                            {
                                idsToSchedule.Add(Tuple.Create(DateTime.Now, scheduleItem.Item2));
                            }

                            Console.WriteLine("Done Executing {0}", scheduleItem.Item2);
                            Interlocked.Decrement(ref currentlyProcessing);
                        });

                    // remove this from the list of things to schedule
                    idsToSchedule.RemoveAt(index);
                }
            }
        }

        Thread.Sleep(100);
    }
}
```
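
As for the "better collection" mentioned above, one possible direction is to swap the locked List for a ConcurrentQueue. The following is only a rough sketch under assumptions, not part of the original answer: the names QueueScheduler, Enqueue, SchedulePass and the work delegate are hypothetical, and it assumes a single scheduling thread calls SchedulePass (for example from the same kind of loop with a short sleep between passes).

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical lock-free variant of the scheduling list (illustration only)
static class QueueScheduler
{
    // (time the id last finished, id); replaces the locked List<Tuple<DateTime, int>>
    static readonly ConcurrentQueue<Tuple<DateTime, int>> idsToSchedule =
        new ConcurrentQueue<Tuple<DateTime, int>>();
    static int currentlyProcessing = 0;
    const int ProcessingLimit = 3;

    // DateTime.MinValue marks an id as "never run", so it is due immediately
    public static void Enqueue(int id)
    {
        idsToSchedule.Enqueue(Tuple.Create(DateTime.MinValue, id));
    }

    // One pass over the queue: start whatever is due, put everything else back.
    // Returns the number of ids started in this pass.
    public static int SchedulePass(Action<int> work)
    {
        int started = 0;
        int count = idsToSchedule.Count; // snapshot, so re-enqueued items are not revisited

        for (int i = 0; i < count; i++)
        {
            Tuple<DateTime, int> item;
            if (!idsToSchedule.TryDequeue(out item))
            {
                break;
            }

            bool due = (DateTime.UtcNow - item.Item1).TotalSeconds > 2;
            if (due && Volatile.Read(ref currentlyProcessing) < ProcessingLimit)
            {
                Interlocked.Increment(ref currentlyProcessing);
                started++;

                int id = item.Item2;
                Task.Run(() =>
                {
                    try
                    {
                        work(id);
                    }
                    finally
                    {
                        // the task puts itself back, stamped with its completion time
                        idsToSchedule.Enqueue(Tuple.Create(DateTime.UtcNow, id));
                        Interlocked.Decrement(ref currentlyProcessing);
                    }
                });
            }
            else
            {
                // not due yet, or the limit is reached: back to the end of the queue
                idsToSchedule.Enqueue(item);
            }
        }

        return started;
    }
}
```

Because only the scheduling thread increments currentlyProcessing, the check against ProcessingLimit cannot overshoot; worker tasks only ever decrement it.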

梦谈多话

2021-02-06 10:45
I don't think you actually need to get down and dirty with direct TPL Tasks for this. For starters I would set up a BlockingCollection around a ConcurrentQueue (the default) with no BoundedCapacity set on the BlockingCollection to store the IDs that need to be processed.
```
// Setup the blocking collection somewhere when your process starts up (OnStart for a Windows service)
BlockingCollection<string> idsToProcess = new BlockingCollection<string>();
```
From there I would just use Parallel::ForEach on the enumeration returned from BlockingCollection::GetConsumingEnumerable. In the ForEach call you set ParallelOptions::MaxDegreeOfParallelism, and inside the body of the ForEach you execute your stored procedure.

Now, once the stored procedure execution completes, you're saying you don't want to re-schedule the execution for at least two seconds. No problem: schedule a System.Threading.Timer whose callback simply adds the ID back to the BlockingCollection.
```
Parallel.ForEach(
    idsToProcess.GetConsumingEnumerable(),
    new ParallelOptions 
    { 
        MaxDegreeOfParallelism = 4 // read this from config
    },
    (id) =>
    {
       // ... execute sproc ...

       // Need to declare/assign this before the delegate so that we can dispose of it inside 
       Timer timer = null;

       timer = new Timer(
           _ =>
           {
               // Add the id back to the collection so it will be processed again
               idsToProcess.Add(id);

               // Cleanup the timer
               timer.Dispose();
           },
           null, // no state, the id we need is "captured" in the anonymous delegate
           2000, // probably should read this from config
           Timeout.Infinite); // fire once, no periodic signaling
    });
```
Finally, when the process is shutting down you would call BlockingCollection::CompleteAdding so that the enumerable being processed will stop blocking and complete, and the Parallel::ForEach will exit. If this were a Windows service, for example, you would do this in OnStop.
```
// When ready to shutdown you just signal you're done adding
idsToProcess.CompleteAdding();
```
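To see the whole lifecycle in one place, here is a minimal, self-contained sketch of the pieces above wired together. It is an illustration under assumptions rather than the exact service code: the ConsumerLifecycle class and its Run method are hypothetical, the stored procedure call is replaced by a simple counter, and the two-second re-scheduling is left out so that the run terminates.
```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical wrapper, for illustration only
static class ConsumerLifecycle
{
    // Processes every id once and returns how many ids were handled before shutdown
    public static int Run(string[] ids, int maxParallelism)
    {
        var idsToProcess = new BlockingCollection<string>();
        int processed = 0;

        // The consumer blocks while the collection is empty and only
        // returns once CompleteAdding has been called and the queue is drained
        Task consumer = Task.Factory.StartNew(() =>
        {
            Parallel.ForEach(
                idsToProcess.GetConsumingEnumerable(),
                new ParallelOptions { MaxDegreeOfParallelism = maxParallelism },
                id =>
                {
                    // stand-in for "execute sproc"
                    Interlocked.Increment(ref processed);
                });
        }, TaskCreationOptions.LongRunning);

        foreach (string id in ids)
        {
            idsToProcess.Add(id);
        }

        // Shutdown: signal that no more ids will arrive, then wait for the loop to exit
        idsToProcess.CompleteAdding();
        consumer.Wait();

        return processed;
    }
}
```
One thing to keep in mind with this shape: once CompleteAdding has been called, any later Add (for example from a timer callback that fires during shutdown) throws an InvalidOperationException, so pending timers would need to be stopped or the exception handled.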
Update

You raised a valid concern in your comment that you might be processing a large number of IDs at any given point and fear that there would be too much overhead in a timer per ID. I would absolutely agree with that. So in the case that you are dealing with a large list of IDs concurrently, I would change from using a timer-per-ID to using another queue to hold the "sleeping" IDs, which is monitored by a single short-interval timer instead. First you'll need a ConcurrentQueue onto which to place the IDs that are asleep:
```
ConcurrentQueue<Tuple<string, DateTime>> sleepingIds = new ConcurrentQueue<Tuple<string, DateTime>>();
```
Now, I'm using a two-part Tuple here for illustration purposes, but you may want to create a more strongly typed struct for it (or at least alias it with a using statement) for better readability. The tuple has the id and a DateTime which represents when it was put on the queue.
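As a rough idea of what such a strongly typed replacement could look like, here is a small sketch. The type name SleepingId, its members and the IsAwake helper are all hypothetical and only meant to show the shape; the queue would then be declared as a ConcurrentQueue<SleepingId> instead.
```csharp
using System;

// Hypothetical replacement for Tuple<string, DateTime> (illustration only).
// The lighter alternative is an alias at the top of the file:
//   using SleepingEntry = System.Tuple<string, System.DateTime>;
public readonly struct SleepingId
{
    public SleepingId(string id, DateTime sleepingSinceUtc)
    {
        Id = id;
        SleepingSinceUtc = sleepingSinceUtc;
    }

    // The id that is waiting to be processed again
    public string Id { get; }

    // When the id was put on the sleeping queue (UTC)
    public DateTime SleepingSinceUtc { get; }

    // True once the id has been asleep for at least minimumSleep
    public bool IsAwake(DateTime utcNow, TimeSpan minimumSleep)
    {
        return utcNow - SleepingSinceUtc >= minimumSleep;
    }
}
```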

Now you'll also want to set up the timer that will monitor this queue:
```
Timer wakeSleepingIdsTimer = new Timer(
   _ =>
   {
       DateTime utcNow = DateTime.UtcNow;

       // Move every id that has been asleep for at least 2 seconds back to the processing queue.
       // Entries are enqueued in time order, so stop at the first one that is still too young.
       Tuple<string, DateTime> entry;
       while (sleepingIds.TryPeek(out entry) && (utcNow - entry.Item2).TotalSeconds >= 2)
       {
           if (sleepingIds.TryDequeue(out entry))
           {
               // Add this id back to the processing queue
               idsToProcess.Add(entry.Item1);
           }
       }
   },
   null, // no state
   100,  // first tick after 100ms (Timeout.Infinite here would keep the timer from ever starting)
   100   // then wake up every 100ms, probably should read this from config
 );
```
Then you would simply change the Parallel::ForEach to do the following instead of setting up a timer for each one:
```
(id) =>
{
       // ... execute sproc ...

       sleepingIds.Enqueue(Tuple.Create(id, DateTime.UtcNow)); 
}
```