I\'m currently developing a very fast algorithm, with one part of it being an extremely fast scanner and statistics function. In this quest, i\'m after any performance benefit.
It's hard to beat static allocation for speed, and while the 10% is a pretty small difference, it could be due to address calculation.
But if you're looking for speed,
your example in a comment while(p<end)stats[*p++]++;
is an obvious candidate for unrolling, such as:
static int stats[M];
static int index_array[N];
int *p = index_array, *pend = p+N;
// ... initialize the arrays ...
while (p < pend-8){
stats[p[0]]++;
stats[p[1]]++;
stats[p[2]]++;
stats[p[3]]++;
stats[p[4]]++;
stats[p[5]]++;
stats[p[6]]++;
stats[p[7]]++;
p += 8;
}
while(p<pend) stats[*p++]++;
Don't count on the compiler to do it for you. It might or might not be able to figure it out.
Other possible optimizations come to mind, but they depend on what you're actually trying to do.
If you have something like
int stats[256]; while (p<end) stats[*p++]++;
static int stats[256]; while (p<end) stats[*p++]++;
you are not really comparing the same thing because for the first instance you are not doing an initialization of your array. Written explicitly the second line is equivalent to
static int stats[256] = { 0 }; while (p<end) stats[*p++]++;
So to be a fair comparison you should have the first read
int stats[256] = { 0 }; while (p<end) stats[*p++]++;
Your compiler might deduce much more things if he has the variables in a known state.
Now then, there could be runtime advantage of the static
case, since the initialization is done at compile time (or program startup).
To test if this makes up for your difference you should run the same function with the static declaration and the loop several times, to see if the difference vanishes if your number of invocations grows.
But as other said already, best is to inspect the assembler that your compiler produces to see what effective difference there are in the code that is produced.
Creating local variables can be literally free if they are POD types. You likely are overflowing a cache line with too many stack variables or other similar alignment-based causes which are very specific to your piece of code. I usually find that non-local variables significantly decrease performance.
It really depends on your compiler, platform, and other details. However, I can describe one scenario where global variables are faster.
In many cases, a global variable is at a fixed offset. This allows the generated instructions to simply use that address directly. (Something along the lines of MOV AX,[MyVar]
.)
However, if you have a variable that's relative to the current stack pointer or a member of a class or array, some math is required to take the address of the array and determine the address of the actual variable.
Obviously, if you need to place some sort of mutex on your global variable in order to keep it thread-safe, then you'll almost certainly more than lose any performance gain.