Question
How does the linker find the main function in an x86-64 ELF-format executable?
Answer 1:
As a very generic overview: the linker assigns an address to the block of code identified by the symbol main, just as it does for every other symbol in your object files.
Actually, it doesn't assign a real address but assigns an address relative to some base which will get translated to a real address by the loader when the program is executed.
The actual entry point is most likely not main but some symbol in the C runtime startup code (crt) that calls main. By default, ld looks for the symbol _start on ELF targets (shown as start in the simplified layout below) unless you specify something different.
The linked code ends up in the .text section of the executable and could look something like this (very simplified):
Address | Code
1000 someFunction
...
2000 start
2001 call 3000
...
3000 main
...
When the linker writes the ELF header it would specify the entry point as address 2000.
You can get the relative address of main by dumping the contents of the executable with something like objdump. To get the actual address at runtime you can just read the symbol: funcptr ptr = main; where funcptr is defined as a pointer to a function with the signature of main.
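#include <stdio.h>

/* Pointer to a function with the same signature as main. */
typedef int (*funcptr)(int argc, char* argv[]);

int main(int argc, char* argv[])
{
    funcptr ptr = main;
    /* %p expects a void*; the cast works on common platforms. */
    printf("%p\n", (void*)ptr);
    return 0;
}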
The address of main will be correctly resolved regardless of whether symbols have been stripped, since the linker resolves the symbol main to its relative address at link time, before any stripping happens.
Use objdump like this:
$ objdump -f funcptr.exe
funcptr.exe: file format pei-i386
architecture: i386, flags 0x0000013a:
EXEC_P, HAS_DEBUG, HAS_SYMS, HAS_LOCALS, D_PAGED
start address 0x00401000
Looking for main specifically, on my machine I get this:
$ objdump -D funcptr.exe | grep main
40102c: e8 af 01 00 00 call 4011e0 <_cygwin_premain0>
401048: e8 a3 01 00 00 call 4011f0 <_cygwin_premain1>
401064: e8 97 01 00 00 call 401200 <_cygwin_premain2>
401080: e8 8b 01 00 00 call 401210 <_cygwin_premain3>
00401170 <_main>:
401179: e8 a2 00 00 00 call 401220 <___main>
004011e0 <_cygwin_premain0>:
004011f0 <_cygwin_premain1>:
00401200 <_cygwin_premain2>:
00401210 <_cygwin_premain3>:
00401220 <___main>:
Note that I am on Windows using Cygwin, so your results will differ slightly. It looks like main lives at 00401170 for me.
Answer 2:
With Binutils, the entry point is determined by either:
- the -e CLI option
- the linker script
You can view your linker script with:
ld --verbose
Mine contains:
ENTRY(_start)
Then at link time, glibc-provided object files like crt1.o that contain the _start symbol are passed to the linker together with your main.o.
Those object files do some setup for you, such as preparing argv, and then call your main function.
You can see those extra object files being sneaked in with gcc -v.
This is documented at: https://sourceware.org/binutils/docs/ld/Entry-Point.html#Entry-Point
The first instruction to execute in a program is called the entry point. You can use the ENTRY linker script command to set the entry point. The argument is a symbol name:
ENTRY(symbol)
There are several ways to set the entry point. The linker will set the entry point by trying each of the following methods in order, and stopping when one of them succeeds:
- the `-e' entry command-line option;
- the ENTRY(symbol) command in a linker script;
- the value of a target specific symbol, if it is defined; For many targets this is start, but PE and BeOS based systems for example check a list of possible entry symbols, matching the first one found.
- the address of the first byte of the `.text' section, if present;
- The address 0.
See also: is there a GCC compiler/linker option to change the name of main?
Source: https://stackoverflow.com/questions/17708649/how-does-the-linker-find-the-main-function