Sorry, it's a long one, but I'm just explaining my train of thought as I analyze this. Questions at the end.
I have an understanding of what goes into measuring runni
My first thought is that a loop as simple as
    for (int i = 0; i < x; i++)
    {
        timer.Start();
        test();
        timer.Stop();
    }
is kinda silly compared to:
    timer.Start();
    for (int i = 0; i < x; i++)
        test();
    timer.Stop();
The reason is that (1) this kind of "for" loop has very little overhead, so little that it's almost not worth worrying about even if test() only takes a microsecond, and (2) timer.Start() and timer.Stop() have their own overhead, which is likely to affect the results more than the for loop does. That said, I took a peek at Stopwatch in Reflector and noticed that Start() and Stop() are fairly cheap (calling the Elapsed* properties is likely more expensive, considering the math involved).
Make sure the static Stopwatch.IsHighResolution field is true. If it's false, Stopwatch falls back to DateTime.UtcNow, which I believe is only updated every 15-16 ms.
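As a quick sanity check before trusting any numbers, something like the following sketch reports which timer Stopwatch is using and how long one tick lasts. The class name TimerInfo and its methods are my own illustrative names; only Stopwatch.IsHighResolution and Stopwatch.Frequency come from the framework.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper (not part of any library) for inspecting the Stopwatch timer.
static class TimerInfo
{
    // Length of one Stopwatch tick in nanoseconds.
    // Stopwatch.Frequency is the number of ticks per second.
    public static double NanosecondsPerTick()
    {
        return 1e9 / Stopwatch.Frequency;
    }

    // One-line description of the timer Stopwatch is using.
    public static string Describe()
    {
        string kind = Stopwatch.IsHighResolution
            ? "high-resolution performance counter"
            : "system clock fallback (coarse)";
        return string.Format("{0}, {1:F1} ns per tick", kind, NanosecondsPerTick());
    }
}
```

Printing TimerInfo.Describe() once at the start of a benchmark run is enough; if it reports the fallback, per-iteration timings of short operations are unlikely to be meaningful.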
1. Is getting the running time of each individual iteration generally a good thing to have?
It is not usually necessary to measure the runtime of each individual iteration, but it is useful to find out how much the performance varies between different iterations. To this end, you can compute the min/max (or k outliers) and standard deviation. Only the "median" statistic requires you to record every iteration.
If you find that the standard deviation is large, you might then have reason to record every iteration, in order to explore why the time keeps changing.
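If you do decide to record every iteration, a minimal sketch might look like this. It stores one timing per run and computes the median from the stored values, which is the one statistic mentioned above that cannot be accumulated on the fly. PerIterationTiming, TimeEach and Median are hypothetical names chosen for illustration.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper: records one timing per iteration so the median can be computed.
static class PerIterationTiming
{
    // Runs 'test' the given number of times and returns each run's duration in milliseconds.
    public static double[] TimeEach(Action test, int iterations)
    {
        var times = new double[iterations];
        var timer = new Stopwatch();
        for (int i = 0; i < iterations; i++)
        {
            timer.Restart();              // reset to zero and start timing
            test();
            timer.Stop();
            times[i] = timer.Elapsed.TotalMilliseconds;
        }
        return times;
    }

    // Median of the samples; sorts a copy so the caller's array keeps its run order.
    public static double Median(double[] samples)
    {
        if (samples.Length == 0)
            throw new ArgumentException("need at least one sample");
        var sorted = (double[])samples.Clone();
        Array.Sort(sorted);
        int mid = sorted.Length / 2;
        return sorted.Length % 2 == 1
            ? sorted[mid]
            : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }
}
```

Keeping the array in run order also lets you see whether the slow iterations cluster at the start or are scattered throughout the run.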
Some people have written small frameworks to help you do performance benchmarks. For example, CodeTimers. If you are testing something that is so tiny and simple that the overhead of the benchmark library matters, consider running the operation in a for-loop inside the lambda that the benchmark library calls. If the operation is so tiny that the overhead of a for-loop matters (e.g. measuring the speed of multiplication), then use manual loop unrolling. But if you use loop unrolling, remember that most real-world apps don't use manual loop unrolling, so your benchmark results may overstate the real-world performance.
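To illustrate the "for-loop inside the lambda" idea, here is a rough sketch. Measure is a hypothetical stand-in for whatever entry point your benchmark library exposes; it is not the actual CodeTimers API, and the other names are mine as well.

```csharp
using System;
using System.Diagnostics;

static class InnerLoopBenchmark
{
    // Hypothetical stand-in for a benchmark library: times a single call of 'body'.
    public static TimeSpan Measure(Action body)
    {
        var timer = Stopwatch.StartNew();
        body();
        timer.Stop();
        return timer.Elapsed;
    }

    // Repeats 'operation' inside the measured lambda so that the cost of
    // Measure itself is paid once rather than once per operation.
    // Returns the average time per call in nanoseconds.
    public static double NanosecondsPerCall(Action operation, int repetitions)
    {
        TimeSpan total = Measure(() =>
        {
            for (int i = 0; i < repetitions; i++)
                operation();
        });
        return total.TotalMilliseconds * 1e6 / repetitions;
    }
}
```

Note that invoking the operation through an Action delegate has a cost of its own, so for something as small as a multiplication you would write the operation directly in the loop body instead of passing it in.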
For myself I wrote a little class for gathering min, max, mean, and standard deviation, which could be used for benchmarks or other statistics:
using System;
using System.Diagnostics;

// A lightweight class to help you compute the minimum, maximum, average
// and standard deviation of a set of values. Call Clear(), then Add(each
// value); you can compute the average and standard deviation at any time by
// calling Avg() and StdDeviation().
class Statistic
{
    public double Min;
    public double Max;
    public double Count;
    public double SumTotal;
    public double SumOfSquares;

    public void Clear()
    {
        SumOfSquares = Min = Max = Count = SumTotal = 0;
    }

    public void Add(double nextValue)
    {
        Debug.Assert(!double.IsNaN(nextValue));
        if (Count > 0)
        {
            if (Min > nextValue)
                Min = nextValue;
            if (Max < nextValue)
                Max = nextValue;
            SumTotal += nextValue;
            SumOfSquares += nextValue * nextValue;
            Count++;
        }
        else
        {
            Min = Max = SumTotal = nextValue;
            SumOfSquares = nextValue * nextValue;
            Count = 1;
        }
    }

    public double Avg()
    {
        return SumTotal / Count;
    }

    // Sample variance. Returns NaN when fewer than two values have been added.
    public double Variance()
    {
        double v = (SumOfSquares * Count - SumTotal * SumTotal) / (Count * (Count - 1));
        // Rounding error can push the result slightly below zero when all
        // values are nearly equal; clamp so StdDeviation() doesn't return NaN.
        return Math.Max(0, v);
    }

    public double StdDeviation()
    {
        return Math.Sqrt(Variance());
    }

    public Statistic Clone()
    {
        return (Statistic)MemberwiseClone();
    }
}
2. Is having a small loop of runs before the actual timing starts good too?
Which iterations you measure depends on whether you care most about startup time, steady-state time or total runtime. In general, it may be useful to record one or more runs separately as "startup" runs. You can expect the first iteration (and sometimes more than one) to run more slowly. As an extreme example, my GoInterfaces library consistently takes about 140 milliseconds to produce its first output, then it does 9 more in about 15 ms.
Depending on what the benchmark measures, you may find that if you run the benchmark right after rebooting, the first iteration (or first few iterations) will run very slowly. Then, if you run the benchmark a second time, the first iteration will be faster.
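One way to keep startup runs apart from steady-state runs is sketched below. The class and parameter names are hypothetical, and how many warm-up runs you need is something to find out by looking at your own timings.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper that records "startup" and "steady-state" timings separately.
static class WarmupBenchmark
{
    // Times 'test' once and returns the elapsed milliseconds.
    static double TimeOnceMs(Action test)
    {
        var timer = Stopwatch.StartNew();
        test();
        timer.Stop();
        return timer.Elapsed.TotalMilliseconds;
    }

    // Runs warmupRuns + measuredRuns iterations in total. The first warmupRuns
    // timings go into startupMs, the remaining ones into steadyMs.
    public static void Run(Action test, int warmupRuns, int measuredRuns,
                           out double[] startupMs, out double[] steadyMs)
    {
        startupMs = new double[warmupRuns];
        steadyMs = new double[measuredRuns];
        for (int i = 0; i < warmupRuns; i++)
            startupMs[i] = TimeOnceMs(test);
        for (int i = 0; i < measuredRuns; i++)
            steadyMs[i] = TimeOnceMs(test);
    }
}
```

Recording the warm-up timings instead of discarding them lets you report startup cost and steady-state cost side by side, depending on which one you care about.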
3. Would a forced Thread.Yield() within the loop help or hurt the timings of CPU bound test cases?
I'm not sure. Yielding may let another thread run on the same core and evict your data from the processor caches (L1, L2, TLB), which would slow down your benchmark and lower the measured speeds. Your results would then be more "artificial", not reflecting as well what you would get in the real world. Perhaps a better approach is to avoid running other tasks at the same time as your benchmark.