When performance is essential to an application, should consideration be given whether to declare an array on the stack vs the heap? Allow me to outline why this question has co
These will apply to "plain" C (not C++).
First let's clear some terminology
"static" is a keyword in C which will drastically change the way your variable is allocated / accessed if it is applied on variables declared within functions.
There are 3 places (regarding C) where a variable (including arrays) may sit:
static
.static
or not, there the keyword relates to visibility), and any function local variables declared static
.malloc()
& free()
) referred by a pointer. You access this data only through pointers.Now let's see how one dimensional arrays are accessed
If you access an array with a constant index (may be #define
d, but not const
in plain C), this index can be calculated by the compiler. If you have a true array in the Data section, it will be accessed without any indirection. If you have a pointer (Heap) or an array on the Stack, an indirection is always necessary. So arrays in the Data section with this type of access may be a very little bit faster. But this is not a very useful thing which would turn the world.
If you access an array with an index variable, it essentially always decays to a pointer since the index may change (for example increment in a for loop). The generated code will likely be very similar or even identical for all types here.
Bring in more dimensions
If you declare a two or more dimensional array, and access it partially or fully by constants, an intelligent compiler may well optimize these constants out as above.
If you access by indices, note that the memory is linear. If the later dimensions of a true array are not a multiple of 2, the compiler will need to generate multiplications. For example in the array int arr[4][12];
the second dimension is 12. If you now access it as arr[i][j]
where i
and j
are index variables, the linear memory has to be indexed as 12 * i + j
. So the compiler has to generate code to multiply with a constant here. The complexity depends on how "far" the constant is from a power of 2. Here the resulting code will likely look somewhat like calculating (i<<3) + (i<<2) + j
to access the element in the array.
If you build up the two dimensional "array" from pointers, the size of the dimensions do not matter since there are reference pointers in your structure. Here if you can write arr[i][j]
, that implies you declared it as for example int* arr[4]
, and then malloc()
ed four chunks of memory of 12 int
s each into it. Note that your four pointers (which the compiler now can use as base) also consume memory which wasn't taken if it was a true array. Also note that here the generated code will contain a double indirection: First the code loads a pointer by i
from arr
, then it will load an int
from that pointer by j
.
If the lengths are "far" from powers of 2 (so complex "multiply with constant" codes would have to be generated to access the elements) then using pointers may generate faster access codes.
As James Kanze mentioned in his answer, in some circumstances the compiler may be able to optimize access for true multi-dimensional arrays. This kind of optimization is impossible for arrays composed from pointers as the "array" is actually not a linear chunk of memory that case.
Locality matters
If you are developing for usual desktop / mobile architectures (Intel / ARM 32 / 64 bit processors) locality also matters. That is what is likely sitting in the cache. If your variables are already in the cache for some reason, they will be accessed faster.
In the term of locality Stack is always the winner since the Stack is so frequently used that it is very likely to always sit in the cache. So small arrays are best put in there.
Using true multi-dimensional arrays instead of composing one from pointers may also help on this ground since a true array is always a linear chunk of memory, so it usually might need fewer blocks of cache to load in. A scattered pointer composition (that is if using separately malloc()
ed chunks) to the contrary might need more cache blocks, and may rise cache line conflicts depending on how the chunks physically ended up on the heap.