I'm currently working on a project where we have the challenge to process items in parallel. So far not a big deal ;) Now to the problem: we have a list of IDs, where we periodically (every two seconds) call a stored procedure for each ID. An ID must not be processed again until at least two seconds after its previous run finished, and only a limited number of calls should run in parallel.
This is pretty similar to the approach you said you already had in your question, but it does so with TPL tasks. A task just adds itself back to the list of things to schedule when it's done.
The use of locking on a plain list is fairly ugly in this example; you would probably want a better collection to hold the list of things to schedule (a rough sketch using ConcurrentQueue follows the code below).
// Fill the idsToSchedule
for (int id = 0; id < 5; id++)
{
    idsToSchedule.Add(Tuple.Create(DateTime.MinValue, id));
}

// LongRunning will tell TPL to create a new thread to run this on
Task.Factory.StartNew(SchedulingLoop, TaskCreationOptions.LongRunning);
That starts up the SchedulingLoop, which actually performs the check of whether it's been two seconds since something last ran:
// Tuple of the last time an id was processed and the id of the thing to schedule
static List<Tuple<DateTime, int>> idsToSchedule = new List<Tuple<DateTime, int>>();
static int currentlyProcessing = 0;
const int ProcessingLimit = 3;

// An event loop that performs the scheduling
public static void SchedulingLoop()
{
    while (true)
    {
        lock (idsToSchedule)
        {
            DateTime currentTime = DateTime.Now;
            for (int index = idsToSchedule.Count - 1; index >= 0; index--)
            {
                var scheduleItem = idsToSchedule[index];
                var timeSincePreviousRun = (currentTime - scheduleItem.Item1).TotalSeconds;

                // start it executing in a background task
                if (timeSincePreviousRun > 2 && currentlyProcessing < ProcessingLimit)
                {
                    Interlocked.Increment(ref currentlyProcessing);
                    Console.WriteLine("Scheduling {0} after {1} seconds", scheduleItem.Item2, timeSincePreviousRun);

                    // Schedule this task to be processed
                    Task.Factory.StartNew(() =>
                    {
                        Console.WriteLine("Executing {0}", scheduleItem.Item2);

                        // simulate the time taken to call this procedure
                        Thread.Sleep(new Random((int)DateTime.Now.Ticks).Next(0, 5000) + 500);

                        lock (idsToSchedule)
                        {
                            idsToSchedule.Add(Tuple.Create(DateTime.Now, scheduleItem.Item2));
                        }

                        Console.WriteLine("Done Executing {0}", scheduleItem.Item2);
                        Interlocked.Decrement(ref currentlyProcessing);
                    });

                    // remove this from the list of things to schedule
                    idsToSchedule.RemoveAt(index);
                }
            }
        }

        Thread.Sleep(100);
    }
}
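As for the "better collection" mentioned above, here is a rough sketch of the same loop holding its work items in a ConcurrentQueue (from System.Collections.Concurrent) instead of a locked List. This is purely illustrative and not part of the original answer; it reuses the currentlyProcessing counter and ProcessingLimit constant from above, and items that are not yet due are simply re-enqueued:

// Sketch only: same scheduling loop, but with a ConcurrentQueue instead of lock + List
static ConcurrentQueue<Tuple<DateTime, int>> queuedIds = new ConcurrentQueue<Tuple<DateTime, int>>();

public static void SchedulingLoopWithQueue()
{
    while (true)
    {
        int itemsToCheck = queuedIds.Count;
        Tuple<DateTime, int> scheduleItem;

        // Look at each item currently in the queue exactly once per pass
        for (int i = 0; i < itemsToCheck && queuedIds.TryDequeue(out scheduleItem); i++)
        {
            var timeSincePreviousRun = (DateTime.Now - scheduleItem.Item1).TotalSeconds;

            if (timeSincePreviousRun > 2 && currentlyProcessing < ProcessingLimit)
            {
                Interlocked.Increment(ref currentlyProcessing);
                var item = scheduleItem; // copy for the closure

                Task.Factory.StartNew(() =>
                {
                    // ... call the stored procedure for item.Item2 here ...

                    // When done, put the id back on the queue stamped with the completion time
                    queuedIds.Enqueue(Tuple.Create(DateTime.Now, item.Item2));
                    Interlocked.Decrement(ref currentlyProcessing);
                });
            }
            else
            {
                // Not due yet (or at the processing limit); put it back for a later pass
                queuedIds.Enqueue(scheduleItem);
            }
        }

        Thread.Sleep(100);
    }
}

Because the queue handles its own synchronization, no explicit locking is needed around the collection itself.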
I don't think you actually need to get down and dirty with direct TPL Tasks for this. For starters I would set up a BlockingCollection around a ConcurrentQueue (the default) with no BoundedCapacity set on the BlockingCollection
to store the IDs that need to be processed.
// Setup the blocking collection somewhere when your process starts up (OnStart for a Windows service)
// (BlockingCollection<T> and ConcurrentQueue<T> live in System.Collections.Concurrent)
BlockingCollection<string> idsToProcess = new BlockingCollection<string>();
From there I would just use Parallel::ForEach on the enumeration returned from BlockingCollection::GetConsumingEnumerable. In the ForEach call you will set up your ParallelOptions::MaxDegreeOfParallelism, and inside the body of the ForEach you will execute your stored procedure.
Now, once the stored procedure execution completes, you're saying you don't want to re-schedule the execution for at least two seconds. No problem: schedule a System.Threading.Timer whose callback simply adds the ID back to the BlockingCollection.
Parallel.ForEach(
    idsToProcess.GetConsumingEnumerable(),
    new ParallelOptions
    {
        MaxDegreeOfParallelism = 4 // read this from config
    },
    (id) =>
    {
        // ... execute sproc ...

        // Need to declare/assign this before the delegate so that we can dispose of it inside
        Timer timer = null;

        timer = new Timer(
            _ =>
            {
                // Add the id back to the collection so it will be processed again
                idsToProcess.Add(id);

                // Cleanup the timer
                timer.Dispose();
            },
            null,              // no state, the id we need is "captured" in the anonymous delegate
            2000,              // probably should read this from config
            Timeout.Infinite); // fire once
    });
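To make the flow concrete, here is a minimal sketch of how this might be wired up at startup. OnStart, LoadInitialIds and ProcessIds are placeholder names of my own, not from the original answer; the key point is that GetConsumingEnumerable blocks until CompleteAdding is called, so the Parallel::ForEach loop should run on its own long-running task:

Task processingTask;

public void OnStart()
{
    // Seed the collection with the initial set of IDs (LoadInitialIds is hypothetical)
    foreach (string id in LoadInitialIds())
    {
        idsToProcess.Add(id);
    }

    // ProcessIds is assumed to wrap the Parallel.ForEach shown above; it blocks until
    // CompleteAdding is called, so run it on a dedicated long-running task
    processingTask = Task.Factory.StartNew(ProcessIds, TaskCreationOptions.LongRunning);
}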
Finally, when the process is shutting down you would call BlockingCollection::CompleteAdding so that the enumerable being processed will stop blocking and complete, and the Parallel::ForEach will exit. If this were a Windows service, for example, you would do this in OnStop.
// When ready to shutdown you just signal you're done adding
idsToProcess.CompleteAdding();
Update
You raised a valid concern in your comment that you might be processing a large number of IDs at any given point and fear that there would be too much overhead in a timer per ID. I would absolutely agree with that. So in the case that you are dealing with a large list of IDs concurrently, I would change from using a timer per ID to using another queue to hold the "sleeping" IDs, monitored by a single short-interval timer instead. First you'll need a ConcurrentQueue
onto which to place the IDs that are asleep:
ConcurrentQueue<Tuple<string, DateTime>> sleepingIds = new ConcurrentQueue<Tuple<string, DateTime>>();
Now, I'm using a two-part Tuple here for illustration purposes, but you may want to create a more strongly typed struct for it (or at least alias it with a using
statement) for better readability. The tuple has the id and a DateTime which represents when it was put on the queue.
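For illustration only (none of this is in the original answer), that could be a using alias at the top of the file, or a small struct with named members:

// Option 1: alias the closed tuple type (goes with the other using directives)
using SleepingId = System.Tuple<string, System.DateTime>;

// Option 2: a small struct so the fields have names instead of Item1/Item2
public struct SleepingEntry
{
    public readonly string Id;
    public readonly DateTime EnqueuedUtc;

    public SleepingEntry(string id, DateTime enqueuedUtc)
    {
        Id = id;
        EnqueuedUtc = enqueuedUtc;
    }
}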
Now you'll also want to set up the timer that will monitor this queue:
Timer wakeSleepingIdsTimer = new Timer(
    _ =>
    {
        DateTime utcNow = DateTime.UtcNow;

        // Pull every entry off the sleeping queue that has been there for at least 2 seconds.
        // Entries were enqueued in time order, so we can stop at the first one that isn't due yet.
        Tuple<string, DateTime> sleepingId;

        while (sleepingIds.TryPeek(out sleepingId)
            && (utcNow - sleepingId.Item2).TotalSeconds >= 2
            && sleepingIds.TryDequeue(out sleepingId))
        {
            // Add this id back to the processing queue
            idsToProcess.Add(sleepingId.Item1);
        }
    },
    null, // no state
    0,    // start immediately
    100   // wake up every 100ms, probably should read this from config
);
Then you would simply change the Parallel::ForEach
to do the following instead of setting up a timer for each one:
(id) =>
{
    // ... execute sproc ...

    // Instead of scheduling a timer per id, just record when this id went to sleep
    sleepingIds.Enqueue(Tuple.Create(id, DateTime.UtcNow));
}
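The shutdown story from before doesn't change, except that with this variant you would presumably also want to stop the polling timer. A minimal sketch, again using the Windows-service OnStop example:

public void OnStop()
{
    // Stop waking sleeping ids, then signal that no more ids will be added
    wakeSleepingIdsTimer.Dispose();
    idsToProcess.CompleteAdding();
}

Any IDs still sitting in sleepingIds at that point are simply dropped, which is presumably acceptable at shutdown.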