I humbly submit my own micro-benchmarking mini-library (on GitHub). It's super simple: the only advantage it has over rolling your own is that it already implements high-performance timer code for Windows and Linux, and it abstracts away the annoying boilerplate.
Just pass in a function (or lambda), the number of times it should be called per test run (default: 1), and the number of test runs (default: 100). It returns the time taken by the fastest test run, in fractional milliseconds:
// Example that times the compare-and-swap atomic operation from C++11
// Sample GCC command (libraries go after the sources so the linker resolves them):
// g++ -std=c++11 -DNDEBUG -O3 main.cpp microbench/systemtime.cpp -o bench -lrt
#include "microbench/microbench.h"
#include <atomic>
#include <cstdio>

int main()
{
    std::atomic<int> x(0);
    int y = 0;    // expected value for the CAS; always matches x, so every CAS succeeds

    std::printf("CAS takes %.4fms to execute 100000 iterations\n",
        moodycamel::microbench(
            [&]() { x.compare_exchange_strong(y, 0); },  /* function to benchmark */
            100000,                                      /* iterations per test run */
            100                                          /* test runs */
        )
    );
    // Result: clocks in at 1.2ms (12ns per CAS operation) in my environment

    return 0;
}