In my application I execute anywhere from a couple of dozen to a couple of hundred actions in parallel (the actions have no return value).
Which approach would be the most optimal: Parallel.Invoke or Task.Factory.StartNew?
I used the tests from StriplingWarrior to find out where the difference comes from. I did this because, when I looked at the code with Reflector, the Parallel class seemed to do nothing more than create a bunch of tasks and let them run.
In theory, both approaches should be equivalent in terms of run time. But the (not very realistic) tests with an empty action showed that the Parallel class is much faster.
The task version spends nearly all its time creating new tasks, which leads to many garbage collections. The speed difference you see is due purely to the fact that you create many tasks that quickly become garbage.
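A rough way to check the garbage-collection explanation yourself is to count generation-0 collections with GC.CollectionCount(0) around each approach. This is a minimal console sketch (C# 9+ top-level statements); the helper CountGen0, the 10,000 empty actions and the 20 rounds are arbitrary choices for illustration, and the counts will depend on the runtime, GC mode and machine.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

// 10,000 empty actions, as in the benchmark (arbitrary size).
Action[] actions = Enumerable.Range(0, 10_000).Select(_ => (Action)(() => { })).ToArray();
const int Rounds = 20; // arbitrary number of repetitions

// Hypothetical helper: runs 'body' and returns how many gen-0 collections happened meanwhile.
static int CountGen0(Action body)
{
    int before = GC.CollectionCount(0);
    body();
    return GC.CollectionCount(0) - before;
}

int taskGcs = CountGen0(() =>
{
    for (int round = 0; round < Rounds; round++)
    {
        // One new Task object per action; all of them are garbage once WaitAll returns.
        var tasks = new Task[actions.Length];
        for (int i = 0; i < actions.Length; i++)
            tasks[i] = Task.Factory.StartNew(actions[i]);
        Task.WaitAll(tasks);
    }
});

int parallelGcs = CountGen0(() =>
{
    for (int round = 0; round < Rounds; round++)
        Parallel.Invoke(actions);
});

Console.WriteLine($"Gen-0 collections: StartNew = {taskGcs}, Parallel.Invoke = {parallelGcs}");
```

If the explanation holds, the StartNew count should come out noticeably higher, but treat a single run as indicative only.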
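The Parallel class, by contrast, creates its own Task-derived class that runs concurrently on all CPUs. Instead of one task per action, there is a single self-replicating task whose replicas run on all cores, each claiming the next action from a shared index. The synchronization now happens inside the task delegate, which explains why the Parallel class is so much faster. The relevant part of Parallel.Invoke, as decompiled with Reflector, looks like this: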
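// Internal TPL code (decompiled); ParallelForReplicatingTask and InternalTaskOptions are not public types.
ParallelForReplicatingTask task2 = new ParallelForReplicatingTask(parallelOptions, delegate {
    // Every replica of this task atomically claims the next action index until all actions have run.
    for (int k = Interlocked.Increment(ref actionIndex); k <= actionsCopy.Length; k = Interlocked.Increment(ref actionIndex))
    {
        actionsCopy[k - 1]();
    }
}, TaskCreationOptions.None, InternalTaskOptions.SelfReplicating);
// Runs the task on the calling thread (RunSynchronously), then waits for it to complete.
task2.RunSynchronously(parallelOptions.EffectiveTaskScheduler);
task2.Wait();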
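ParallelForReplicatingTask and InternalTaskOptions.SelfReplicating are internal, so they cannot be used directly. Below is a minimal sketch of the same idea with public APIs, assuming one worker per Environment.ProcessorCount is a reasonable stand-in for the replicas: a handful of long-lived workers pull actions from a shared index with Interlocked.Increment, so the number of Task objects no longer grows with the number of actions. RunAllWithWorkers is a hypothetical name, and unlike the real implementation the calling thread only waits instead of taking part.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper approximating the replicating pattern with public APIs:
// a fixed number of workers (not one task per action) share a single index.
static void RunAllWithWorkers(Action[] actions)
{
    int actionIndex = 0; // shared cursor, advanced atomically by every worker
    int workerCount = Math.Min(Environment.ProcessorCount, actions.Length);
    var workers = new Task[workerCount];

    for (int w = 0; w < workerCount; w++)
    {
        workers[w] = Task.Factory.StartNew(() =>
        {
            // Same loop shape as the decompiled code: claim index k, run action k - 1.
            for (int k = Interlocked.Increment(ref actionIndex); k <= actions.Length; k = Interlocked.Increment(ref actionIndex))
            {
                actions[k - 1]();
            }
        });
    }

    Task.WaitAll(workers);
}

int counter = 0;
Action[] work = Enumerable.Range(0, 1000)
    .Select(_ => (Action)(() => Interlocked.Increment(ref counter)))
    .ToArray();

RunAllWithWorkers(work);
Console.WriteLine(counter); // 1000: every action ran exactly once
```

An exception in an action stops only the worker that hit it; the other workers keep draining the index, and Task.WaitAll then rethrows the failure inside an AggregateException.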
So what is better, then? The best task is the one that is never run. If you need to create so many tasks that they become a burden to the garbage collector, stay away from the task APIs and stick to the Parallel class, which gives you direct parallel execution on all cores without creating a new task per action.
If you need to be even faster, creating threads by hand and using hand-optimized data structures tailored to your access pattern might be the most performant solution. But you are unlikely to succeed, because the TPL and Parallel APIs are already heavily tuned. Usually you can use one of the many overloads to configure your tasks or the Parallel class and achieve the same result with much less code.
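As an example of configuring the Parallel class through an overload, Parallel.Invoke(ParallelOptions, params Action[]) accepts a ParallelOptions whose MaxDegreeOfParallelism caps how many actions run at once. In this sketch the cap of 2, the 50 actions, the 10 ms sleep and the MakeAction helper are all hypothetical; each action records the peak concurrency it observes.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

object gate = new object();
int running = 0, maxObserved = 0;

// Hypothetical action factory: each action records the peak number of actions running at once.
Action MakeAction() => () =>
{
    lock (gate) { running++; maxObserved = Math.Max(maxObserved, running); }
    Thread.Sleep(10); // stand-in for real work
    lock (gate) { running--; }
};

Action[] actions = Enumerable.Range(0, 50).Select(_ => MakeAction()).ToArray();

// Overload Parallel.Invoke(ParallelOptions, params Action[]): cap concurrency at 2 (arbitrary value).
var options = new ParallelOptions { MaxDegreeOfParallelism = 2 };
Parallel.Invoke(options, actions);

Console.WriteLine($"Peak concurrency: {maxObserved}");
```

The same ParallelOptions type also carries a CancellationToken and a TaskScheduler, so one options object can tune several Parallel calls.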
But if you have a non-standard threading pattern, you might be better off without the TPL to get the most out of your cores. Even Stephen Toub has mentioned that the TPL APIs were not designed for ultra-fast performance; the main goal was to make threading easier for the "average" programmer. To beat the TPL in specific cases you need to be well above average and know a lot about CPU cache lines, thread scheduling, memory models, JIT code generation, and so on, to come up with something better for your specific scenario.
The most important difference between these two is that Parallel.Invoke will wait for all the actions to complete before continuing with the code, whereas StartNew will move on to the next line of code, allowing the tasks to complete in their own good time.
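A small sketch of that semantic difference (C# 9+ top-level statements; the 200 ms sleep is an arbitrary stand-in for real work): Parallel.Invoke returns only after its action has finished, while Task.Factory.StartNew returns a Task immediately and you have to wait on it yourself if the following code needs the work to be done.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

bool invokeDone = false, startNewDone = false;

// Parallel.Invoke blocks until every action passed to it has completed.
Parallel.Invoke(() => { Thread.Sleep(200); invokeDone = true; });
Console.WriteLine($"After Parallel.Invoke: done = {invokeDone}");

// StartNew only schedules the action and returns a Task straight away.
Task task = Task.Factory.StartNew(() => { Thread.Sleep(200); startNewDone = true; });
Console.WriteLine($"Right after StartNew: done = {startNewDone} (most likely still false)");

// Wait explicitly when the following code depends on the work being finished.
task.Wait();
Console.WriteLine($"After task.Wait(): done = {startNewDone}");
```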
This semantic difference should be your first (and probably only) consideration. But for informational purposes, here's a benchmark:
/* This is a benchmarking template I use in LINQPad when I want to do a
* quick performance test. Just give it a couple of actions to test and
* it will give you a pretty good idea of how long they take compared
* to one another. It's not perfect: You can expect a 3% error margin
* under ideal circumstances. But if you're not going to improve
* performance by more than 3%, you probably don't care anyway.*/
void Main()
{
    // Enter setup code here
    var actions2 =
        (from i in Enumerable.Range(1, 10000)
         select (Action)(() => {})).ToArray();
    var awaitList = new Task[actions2.Length];
    var actions = new[]
    {
        new TimedAction("Task.Factory.StartNew", () =>
        {
            // Enter code to test here
            int j = 0;
            foreach(var action in actions2)
            {
                awaitList[j++] = Task.Factory.StartNew(action);
            }
            Task.WaitAll(awaitList);
        }),
        new TimedAction("Parallel.Invoke", () =>
        {
            // Enter code to test here
            Parallel.Invoke(actions2);
        }),
    };
    const int TimesToRun = 100; // Tweak this as necessary
    TimeActions(TimesToRun, actions);
}
#region timer helper methods
// Define other methods and classes here
public void TimeActions(int iterations, params TimedAction[] actions)
{
    Stopwatch s = new Stopwatch();
    int length = actions.Length;
    var results = new ActionResult[actions.Length];
    // Perform the actions in their initial order.
    for(int i = 0; i < length; i++)
    {
        var action = actions[i];
        var result = results[i] = new ActionResult{Message = action.Message};
        // Do a dry run to get things ramped up/cached
        result.DryRun1 = s.Time(action.Action, 10);
        result.FullRun1 = s.Time(action.Action, iterations);
    }
    // Perform the actions in reverse order.
    for(int i = length - 1; i >= 0; i--)
    {
        var action = actions[i];
        var result = results[i];
        // Do a dry run to get things ramped up/cached
        result.DryRun2 = s.Time(action.Action, 10);
        result.FullRun2 = s.Time(action.Action, iterations);
    }
    results.Dump();
}
public class ActionResult
{
    public string Message {get;set;}
    public double DryRun1 {get;set;}
    public double DryRun2 {get;set;}
    public double FullRun1 {get;set;}
    public double FullRun2 {get;set;}
}
public class TimedAction
{
    public TimedAction(string message, Action action)
    {
        Message = message;
        Action = action;
    }
    public string Message {get;private set;}
    public Action Action {get;private set;}
}
public static class StopwatchExtensions
{
    public static double Time(this Stopwatch sw, Action action, int iterations)
    {
        sw.Restart();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds;
    }
}
#endregion
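Results (total milliseconds; DryRun = 10 iterations, FullRun = 100 iterations; 1 = first pass in the original order, 2 = second pass in reverse order):
Message               | DryRun1 | DryRun2 | FullRun1 | FullRun2
----------------------|---------|---------|----------|---------
Task.Factory.StartNew | 43.0592 | 50.847  | 452.2637 | 463.2310
Parallel.Invoke       | 10.5717 | 9.948   | 102.7767 | 101.1158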
As you can see, using Parallel.Invoke can be roughly 4.5x faster than waiting for a bunch of newed-up tasks to complete. Of course, that's when your actions do absolutely nothing. The more each action does, the less of a difference you'll notice.
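To see how the gap shrinks as the actions do real work, here is a rough sketch that times both approaches with increasing amounts of CPU work per action, using Thread.SpinWait as a stand-in for that work. The spin counts, the 1,000 actions and the TimeMs helper are assumptions for illustration; each measurement is a single run after one warm-up pass, so it is much noisier than the LINQPad template above, and no particular numbers are claimed here.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: wall-clock time of a single run, in milliseconds.
static double TimeMs(Action body)
{
    var sw = Stopwatch.StartNew();
    body();
    sw.Stop();
    return sw.Elapsed.TotalMilliseconds;
}

var results = new List<(int Spin, double StartNewMs, double InvokeMs)>();

// 0 = the empty action from the benchmark; larger values add CPU work per action (arbitrary sizes).
foreach (int spin in new[] { 0, 1_000, 10_000 })
{
    Action[] actions = Enumerable.Range(0, 1_000)
        .Select(_ => (Action)(() => Thread.SpinWait(spin)))
        .ToArray();

    // One warm-up pass for both approaches before measuring.
    Task.WaitAll(actions.Select(a => Task.Factory.StartNew(a)).ToArray());
    Parallel.Invoke(actions);

    double startNew = TimeMs(() => Task.WaitAll(actions.Select(a => Task.Factory.StartNew(a)).ToArray()));
    double invoke = TimeMs(() => Parallel.Invoke(actions));
    results.Add((spin, startNew, invoke));

    Console.WriteLine($"spin {spin,6}: StartNew {startNew,9:F2} ms | Parallel.Invoke {invoke,9:F2} ms");
}
```

Comparing the two columns as the spin count grows gives a feel for when the per-task overhead stops mattering on your machine.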
In the grand scheme of things, the performance difference between the two methods is negligible when you consider the overhead of dealing with lots of tasks in any case.
Parallel.Invoke basically performs the Task.Factory.StartNew() for you, so I'd say readability is more important here.
Also, as StriplingWarrior mentions, Parallel.Invoke performs a WaitAll (blocking the code until all the tasks are completed) for you, so you don't have to do that either. If you want the tasks to run in the background without caring when they complete, then you want Task.Factory.StartNew().
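A sketch of the background case (C# 9+ top-level statements; the three 100 ms actions are placeholders): the caller starts the actions with Task.Factory.StartNew and carries on immediately. Because nobody waits on fire-and-forget tasks, their exceptions would otherwise go unnoticed, so this sketch attaches a TaskContinuationOptions.OnlyOnFaulted continuation to log failures; that logging is an illustrative addition rather than part of the advice above. Keeping the Task objects also lets you block later with Task.WaitAll if you change your mind.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

int completed = 0;
var background = new Task[3];

for (int i = 0; i < background.Length; i++)
{
    // Start each action in the background; StartNew returns immediately.
    background[i] = Task.Factory.StartNew(() =>
    {
        Thread.Sleep(100); // placeholder for real work
        Interlocked.Increment(ref completed);
    });

    // Log failures, since nobody else observes exceptions from fire-and-forget tasks.
    background[i].ContinueWith(
        t => Console.WriteLine($"Background action failed: {t.Exception?.GetBaseException().Message}"),
        TaskContinuationOptions.OnlyOnFaulted);
}

Console.WriteLine("The caller keeps going while the actions run...");

// Optional: block later; this is the wait Parallel.Invoke would have done for you up front.
Task.WaitAll(background);
Console.WriteLine($"Completed actions: {completed}");
```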