I would really appreciate if somebody could tell me why compilation of this program:
double data[123456789];
int main() {}
takes 10 times
Dynamic Allocation
Your second program allocates memory at run-time; from the compiler's perspective, there is no real difference between compiling any of the following:
double *data = new double[123456789];
double *data = malloc(123456789);
double data = sqrt(123456789);
They all do different things, but all the compiler needs to do is generate a call to an external function with a fixed argument. You can see this if you use g++ -S
to generate assembly:
.text
main:
subq $8, %rsp /* Allocate stack space. */
movl $987654312, %edi /* Load value "123456789 * 8" as argument. */
call _Znam /* Call the allocation function. */
xorl %eax, %eax /* Return 0. */
addq $8, %rsp /* Deallocate stack space. */
ret
This is simple for any compiler to generate, and any linker to link.
Static Allocation
Your first program is a little tricker, though, as you have noticed. If we have a look at the assembly for it, we see something different going on:
.text
main:
xorl %eax, %eax /* Return 0. */
ret
.bss
data:
.zero 987654312 /* Reserve "123456789 * 8" bytes of space. */
The generated assembly asks for 123456789 * sizeof(double)
bytes of space to be reserved when the program is first started. When this is assembled and later linked (which happens behind the scenes is you just run g++ foo.c
), the linker ld
will actually allocate all of this reserved space in memory. This is where the time is going. If you run top
while g++
is running, you will see ld
suck up a large amount of your system's memory.
Reduced Executable Sizes
A reasonable question might be "Why doesn't my executable end up really large, if the memory is being reserved at link-time?". The answer is hidden in the .bss
marker in the assembly. This tells the linker that the data defined below it should not be stored in the final executable, but instead allocated to zero at run-time.
This leaves us with the follow series of steps:
The assembler tells the linker that it needs to create a section of memory that is 1GB long.
The linker goes ahead and allocates this memory, in preparation for placing it in the final executable.
The linker realizes that this memory is in the .bss
section and is marked NOBITS
, meaning that the data is just 0, and doesn't need to be physically placed into the final executable. It avoids writing out the 1GB of data, instead just throwing the allocated memory away.
The linker writes out to the final ELF file just the compiled code, producing a small executable.
A smarter linker might be able to avoid steps 2 and 3 above, making your compile time much faster. In reality, scenarios such as yours don't come up often enough in practice to make such an optimization worthwhile.
Dynamic versus Static Allocation
If you are trying to work out which of the above (dynamic versus static allocation) to actually use in your program, here are a few thoughts:
The linker will need to use as much memory as your final program (plus a bit). If you want to statically allocate 4GB of RAM, you will need 4GB of RAM for your linker. This isn't implicit in how linkers work, but rather just seems to be just the way they are implemented.
When allocating large amounts of memory, doing it dynamically allows you to do better error handling (displaying a user-friendly message on screen explaining you don't have enough memory, instead of just failing to load the executable with a message from the OS going to the user);
Dynamically allocating memory allows you to chose how much memory to allocate based on the your actual needs. (If data
is a cache, the user can select the cache size, if it is storing intermediate results, you can size it based on the problem, etc.)
Dynamically allocating memory allows you to free it later on, if your program needs to go on and do more work after it is finished with the memory.
In the end, if the above points don't matter and you can deal with longer compile times, it probably doesn't matter too much. Statically allocating memory can be much simpler, and is often the right approach for smaller amounts of memory or throw-away applications.