When performance is essential to an application, should consideration be given to whether to declare an array on the stack vs. the heap? Allow me to outline why this question has come up.
Often there is a trade-off between memory consumption and speed. Empirically, I have observed that creating an array on the stack is faster than allocating one on the heap. This becomes more apparent as the array size increases.
You can always decrease memory consumption; for example, you can use short or char instead of int, etc.
As the array size increases, especially with the use of realloc, there may be a lot more page replacement (up and down) to maintain the contiguous location of the items.
You should also consider that the stack imposes a much smaller limit on the size of the things you can store; for the heap this limit is far higher, but, as noted, at a cost in performance.
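As a minimal sketch of the two kinds of declaration being compared here (the size 1024 is an arbitrary choice for illustration):
#include <stdlib.h>

void use_stack(void)
{
    int a[1024];                        /* stack: reserved by moving the stack pointer, gone on return */
    a[0] = 1;
}

void use_heap(void)
{
    int *a = malloc(1024 * sizeof *a);  /* heap: a call into the allocator, must be freed by hand */
    if (a == NULL)
        return;
    a[0] = 1;
    free(a);
}

int main(void)
{
    use_stack();
    use_heap();
    return 0;
}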
Stack memory allocation offers quicker access to data than the heap. The CPU looks for an address in the cache first; if it does not find the address in the cache, it looks it up in main memory. The stack is a preferred location after the cache.
These will apply to "plain" C (not C++).
First, let's clear up some terminology.
"static" is a keyword in C which will drastically change the way your variable is allocated / accessed if it is applied to variables declared within functions.
There are 3 places (regarding C) where a variable (including arrays) may sit:
Stack: function local variables declared without static.
Data: variables declared at the top level of the compilation unit (whether static or not; there the keyword relates to visibility), and any function local variables declared static.
Heap: dynamically allocated memory (malloc() & free()), referred to by a pointer. You access this data only through pointers.
Now let's see how one dimensional arrays are accessed
If you access an array with a constant index (it may be #defined, but not const in plain C), this index can be calculated by the compiler. If you have a true array in the Data section, it will be accessed without any indirection. If you have a pointer (Heap) or an array on the Stack, an indirection is always necessary. So arrays in the Data section with this type of access may be a tiny bit faster. But this is not the kind of gain which would turn the world.
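A small sketch of the difference (SLOT, table and the read_* functions are made-up names for illustration):
#define SLOT 3                   /* constant index, #defined as described above */

static int table[8];             /* true array in the Data section */

int read_data(void)
{
    return table[SLOT];          /* the full address can be resolved at compile/link time */
}

int read_heap(const int *heap_table)
{
    return heap_table[SLOT];     /* the base pointer must be loaded first: one indirection */
}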
If you access an array with an index variable, it essentially always decays to a pointer, since the index may change (for example, being incremented in a for loop). The generated code will likely be very similar, or even identical, for all types here.
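For example, a compiler will typically emit essentially the same indexed load for both of these (the names are made up for illustration):
static int g[16];                  /* Data section array */

int get_data(int i)
{
    return g[i];                   /* base is a known address, index is variable */
}

int get_other(const int *a, int i) /* Stack or Heap memory seen through a pointer */
{
    return a[i];                   /* base is loaded, then the same indexed load */
}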
Bring in more dimensions
If you declare a two or more dimensional array, and access it partially or fully by constants, an intelligent compiler may well optimize these constants out as above.
If you access by indices, note that the memory is linear. If the later dimensions of a true array are not powers of 2, the compiler will need to generate multiplications. For example, in the array int arr[4][12]; the second dimension is 12. If you now access it as arr[i][j], where i and j are index variables, the linear memory has to be indexed as 12 * i + j. So the compiler has to generate code to multiply by a constant here. The complexity depends on how "far" the constant is from a power of 2. Here the resulting code will likely look somewhat like calculating (i << 3) + (i << 2) + j to access the element in the array.
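Spelled out in source form (writing the shifts by hand is purely illustrative; the compiler generates this kind of sequence itself):
#include <stdio.h>

int main(void)
{
    int arr[4][12];
    int *base = &arr[0][0];        /* the array is one linear block of 4 * 12 ints */
    int i = 2, j = 5;

    arr[i][j] = 42;

    /* arr[i][j] is linear element 12 * i + j; 12 * i can be computed as
       (i << 3) + (i << 2), i.e. 8*i + 4*i, the shift-and-add sequence a
       compiler may emit instead of a multiply */
    printf("%d\n", *(base + (i << 3) + (i << 2) + j));    /* prints 42 */
    return 0;
}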
If you build up the two dimensional "array" from pointers, the sizes of the dimensions do not matter, since there are reference pointers in your structure. Here, if you can write arr[i][j], that implies you declared it as, for example, int* arr[4], and then malloc()ed four chunks of memory of 12 ints each into it. Note that your four pointers (which the compiler can now use as bases) also consume memory which wouldn't be taken by a true array. Also note that the generated code here will contain a double indirection: first the code loads a pointer by i from arr, then it loads an int from that pointer by j.
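A minimal sketch of this pointer-composed variant (error-path cleanup omitted for brevity):
#include <stdlib.h>

int main(void)
{
    int *arr[4];                            /* four row pointers: memory a true array wouldn't need */

    for (int i = 0; i < 4; i++) {
        arr[i] = malloc(12 * sizeof(int));  /* each row is a separate heap chunk */
        if (arr[i] == NULL)
            return 1;
    }

    /* double indirection: load arr[2] first, then index that row by 5 */
    arr[2][5] = 42;

    for (int i = 0; i < 4; i++)
        free(arr[i]);
    return 0;
}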
If the lengths are "far" from powers of 2 (so complex "multiply with constant" codes would have to be generated to access the elements) then using pointers may generate faster access codes.
As James Kanze mentioned in his answer, in some circumstances the compiler may be able to optimize access for true multi-dimensional arrays. This kind of optimization is impossible for arrays composed from pointers, as the "array" is not actually a linear chunk of memory in that case.
Locality matters
If you are developing for the usual desktop / mobile architectures (Intel / ARM 32 / 64 bit processors), locality also matters; that is, what is likely to be sitting in the cache. If your variables are already in the cache for some reason, they will be accessed faster.
In terms of locality, the Stack is always the winner, since the Stack is used so frequently that it is very likely to be sitting in the cache. So small arrays are best put there.
Using true multi-dimensional arrays instead of composing one from pointers may also help on this ground, since a true array is always a linear chunk of memory, so it usually needs fewer cache lines to load. A scattered pointer composition (that is, using separately malloc()ed chunks), on the contrary, may need more cache lines, and may cause cache line conflicts depending on how the chunks physically ended up on the heap.
The usual way of implementing a 2 dimensional array in C++ would be to wrap it in a class, using std::vector<int>, and have class accessors which calculate the index. However:
Any questions concerning optimization can only be answered by measuring, and even then, they are only valid for the compiler you are using, on the machine on which you do the measurements.
If you write:
int array[2][3] = { ... };
and then something like:
for ( int i = 0; i != 2; ++ i ) {
    for ( int j = 0; j != 3; ++ j ) {
        // do something with array[i][j]...
    }
}
It's hard to imagine a compiler which doesn't actually generate something along the lines of:
for ( int* p = &array[0][0]; p != &array[0][0] + 2 * 3; ++ p ) {
    // do something with *p
}
This is one of the most fundamental optimizations around, and has been for at least 30 years.
If you dynamically allocate as you propose, the compiler will not be able to apply this optimization. And even for a single access: the matrix has poorer locality, and requires more memory accesses, so would likely be less performant.
If you're in C++, you would normally write a Matrix class, using std::vector<int> for the memory, and calculating the indexes explicitly using multiplication. (The improved locality will probably result in better performance, despite the multiplication.) This could make it more difficult for the compiler to do the above optimization, but if this turns out to be an issue, you can always provide specialized iterators for handling this one particular case. You end up with more readable and more flexible code (e.g. the dimensions don't have to be constant), at little or no loss of performance.
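Stripped of the C++ wrapper, the index arithmetic such a class would encapsulate is the same row-major calculation discussed earlier; a rough plain-C sketch of the idea (mat_at is a made-up helper name):
#include <stdlib.h>

/* hypothetical accessor: row-major index into one contiguous buffer */
static int *mat_at(int *data, int cols, int i, int j)
{
    return &data[i * cols + j];   /* explicit multiplication, but contiguous memory */
}

int main(void)
{
    int rows = 2, cols = 3;       /* dimensions need not be compile-time constants */
    int *data = malloc(rows * cols * sizeof *data);
    if (data == NULL)
        return 1;

    *mat_at(data, cols, 1, 2) = 42;

    free(data);
    return 0;
}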
As to which choice provides better performance, the answer will largely depend on your specific circumstances. The only way to know whether one way is better, or whether they are roughly equivalent, is to measure the performance of your application.
Some things that would be a factor are: how often you do it, the actual size of the arrays/data, how much memory your system has, and how well your system manages memory.
If you have the luxury of being able to choose between the two, it must mean the sizes are already nailed down. In that case, you do not need the multiple allocation scheme that you illustrated; you can perform a single dynamic allocation of your 2D array. In C:
int (*array)[COLUMNS];
array = malloc(ROWS * sizeof(*array));
In C++:
std::vector<std::array<int, COLUMNS>> array(ROWS);
As long as COLUMNS is nailed down, you can perform a single allocation to obtain your 2D array. If neither is nailed down, then you don't really have the choice of using a static array anyway.
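A complete sketch of the C version (ROWS and COLUMNS are arbitrary example values):
#include <stdlib.h>

#define ROWS    4
#define COLUMNS 12

int main(void)
{
    int (*array)[COLUMNS];                  /* pointer to rows of COLUMNS ints */
    array = malloc(ROWS * sizeof(*array));  /* one contiguous allocation for the whole 2D array */
    if (array == NULL)
        return 1;

    array[2][5] = 42;                       /* ordinary 2D indexing works */

    free(array);                            /* and a single free releases everything */
    return 0;
}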