Here is a test reproducing the problem:
$ echo "void whatever() {}" > prog.c
$ gcc prog.c
This produces the following error on GCC 4.8.4:
main is not the start of the program, as you may have originally thought. The C library ships a startup object (crt1.o) that defines a _start routine; _start invokes our main and does the cleanup work after main returns.
An ELF file has two header tables, the program header table and the section header table.
Here we focus only on the section header structure:
mapping<var_name, offset, size...>
// and special cases
mapping<external_var_name, offset, size...>
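The mapping above can be pictured as a small lookup table. The following is only a toy sketch: the struct `toy_symbol`, its fields, and `toy_lookup` are hypothetical names made up for illustration, not the real ELF symbol table layout.

```c
#include <stddef.h>
#include <string.h>

/* Toy symbol record: name -> (offset, size). Not the real ELF layout. */
struct toy_symbol {
    const char *name;
    unsigned offset;  /* position inside its section */
    unsigned size;    /* bytes occupied */
    int defined;      /* 0 for external symbols that another file must supply */
};

/* Return the entry named `name`, or NULL when the table has no such entry. */
const struct toy_symbol *toy_lookup(const struct toy_symbol *table,
                                    size_t count, const char *name) {
    for (size_t i = 0; i < count; i++) {
        if (strcmp(table[i].name, name) == 0)
            return &table[i];
    }
    return NULL;
}
```

An external entry such as main would appear in crt1.o's table with `defined` set to 0, which is the special case the second mapping line refers to.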
Every program is compiled separately, which means the address allocation in each one is similar. (In early versions of Linux, every compiled program started at the same virtual address, 0x08000000, and many attacks made use of this, so a random delta is now added to the address to alleviate the problem.) As a result, there may be overlapping areas. This is why address relocation is needed.
The relocation info (offset, value, etc.) is stored in the .rel.* sections:
Relocation section '.rel.text' at offset 0x7a4 contains 2 entries:
Offset Info Type Sym.Value Sym. Name
0000000d 00000e02 R_386_PC32 00000000 main
00000015 00000f02 R_386_PC32 00000000 exit
Relocation section '.rel.debug_info' at offset 0x7b4 contains 43 entries:
Offset Info Type Sym.Value Sym. Name
00000006 00000601 R_386_32 00000000 .debug_abbrev
0000000c 00000901 R_386_32 00000000 .debug_str
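The Info column in this output packs two fields together. As a small sketch, the helpers below repeat the arithmetic of the ELF32_R_SYM and ELF32_R_TYPE macros from <elf.h>; the function names themselves are made up for this example.

```c
#include <stdint.h>

/* Upper 24 bits of a 32-bit r_info: index into the symbol table. */
uint32_t rel32_sym_index(uint32_t info) {
    return info >> 8;
}

/* Lower 8 bits of a 32-bit r_info: relocation type
 * (on i386, 1 is R_386_32 and 2 is R_386_PC32). */
uint32_t rel32_type(uint32_t info) {
    return info & 0xffu;
}
```

For the main entry, 0x00000e02 splits into symbol index 14 and type 2 (R_386_PC32); for .debug_abbrev, 0x00000601 splits into symbol index 6 and type 1 (R_386_32).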
When the linker wants to fill in the address of main during relocation, it can't find that symbol in your compiled ELF file, so it reports the problem and stops the linking process.
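To make that failure concrete, here is a rough sketch of what applying one i386 relocation involves. It is an illustration only, not how ld is implemented: `toy_relocate` and the `TOY_` constants are hypothetical names, and only the two relocation types shown in the readelf output are handled. The formulas (S + A and S + A - P) are the ones the i386 ABI gives for R_386_32 and R_386_PC32.

```c
#include <stdint.h>

enum { TOY_R_386_32 = 1, TOY_R_386_PC32 = 2 };

/* S = address of the symbol, A = addend stored at the patch site,
 * P = address of the patch site itself.
 * Returns 0 on success, -1 when the symbol is undefined (the linker would
 * print "undefined reference"), -2 for a type this sketch does not handle. */
int toy_relocate(int type, int sym_defined,
                 uint32_t S, uint32_t A, uint32_t P, uint32_t *out) {
    if (!sym_defined)
        return -1;
    switch (type) {
    case TOY_R_386_32:    /* absolute address */
        *out = S + A;
        return 0;
    case TOY_R_386_PC32:  /* address relative to the patch site */
        *out = S + A - P;
        return 0;
    default:
        return -2;
    }
}
```

With no definition of main anywhere in the link, the first branch is the one that fires for crt1.o's reference to it.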
Here is a simplified version of what the OS implementations do; start.c corresponds to the source code of crt1.o:
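int entry(char *);   // corresponds to main
void exit(int);      // provided by the C library

void _start(char *args) {
    int status = entry(args);  // run the user's code
    exit(status);              // clean up and terminate; never returns
}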
We can break this down into two parts:
/usr/lib/gcc/x86_64-linux-gnu/4.8/../../../x86_64-linux-gnu/crt1.o: In function `_start':
(.text+0x20): undefined reference to `main'
collect2: error: ld returned 1 exit status
This is simply because the C runtime startup object (crt1.o) is trying to call your (missing) main() function. There are good overviews of the various C runtime files available online.
Note: This has been a bit of a mission (learning opportunity) for me. I've taken a few days to research and understand it in the limited free time that I have. It's something that I've wondered about for a long time but never looked into... Hopefully my understanding is correct (though clearly not complete; I'll update it if I can).
This is a little more complex, and hidden away.
Just by reading the message, we can glean that it's related to debug information (the .debug_info and .debug_line sections of crt1.o's debug file are mentioned). Note the /usr/lib/debug/ path, which contains only debug information; the other crt1.o is a "stripped" file...
The format string is found in the binutils project, specifically in bfd/elfcode.h. BFD stands for Binary File Descriptor, the GNU way of handling object files across a number of system architectures.
BFD provides an intermediate representation for binary files: the GNU linker, which GCC invokes, works through BFD before it finally writes an a.out, ELF, or other binary.
Looking into the manual, we can find some interesting snippets of knowledge:
[...] with each entry in the hash table the a.out linker keeps the index the symbol has in the final output file (this index number is used so that when doing a relocateable link the symbol index used in the output file can be quickly filled in when copying over a reloc). [source]
The standard records contain only an address, a symbol index, and a type field. [source]
This means that these errors are issued due to relocations that are related to specific (missing?) 'symbols'. A 'symbol' in this context is any named 'thing' - e.g: functions and variables.
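One plausible reading, and it is only a guess, is that the message is a bounds check: a relocation names a symbol index that the file's own symbol table does not contain, which could happen in a file that holds only debug information. The sketch below illustrates that kind of check. It is not the binutils code; the function names and the exact comparison are assumptions. The shift matches the ELF64_R_SYM macro from <elf.h>.

```c
#include <stddef.h>
#include <stdint.h>

/* Upper 32 bits of a 64-bit r_info: index into the symbol table. */
uint32_t rela64_sym_index(uint64_t info) {
    return (uint32_t)(info >> 32);
}

/* Count relocations whose symbol index falls outside a table of
 * `symcount` entries (valid indexes are assumed to be 0 .. symcount-1). */
size_t count_invalid_symbol_indexes(const uint64_t *infos, size_t n,
                                    uint32_t symcount) {
    size_t bad = 0;
    for (size_t i = 0; i < n; i++) {
        if (rela64_sym_index(infos[i]) >= symcount)
            bad++;
    }
    return bad;
}
```

Under this reading, the indexes themselves are fine in the full crt1.o, and they only look invalid when checked against a file whose symbol table is shorter.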
As these 'invalid symbols' appear to be resolved by simply declaring main(), I would guess that some (all?) of these symbol indexes are derived from main(), its debug information, and/or relations.
I can't tell you what is supposed to be at the symbol indexes mentioned (2, 11, 12, 13, 21), but it is interesting that my tests yielded the same list of symbol indexes.
Running ld with crt1.o alone gives us similar output:
$ ld /usr/lib/x86_64-linux-gnu/crt1.o
ld: /usr/lib/debug/usr/lib/x86_64-linux-gnu/crt1.o(.debug_info): relocation 0 has invalid symbol index 11
ld: /usr/lib/debug/usr/lib/x86_64-linux-gnu/crt1.o(.debug_info): relocation 1 has invalid symbol index 12
ld: /usr/lib/debug/usr/lib/x86_64-linux-gnu/crt1.o(.debug_info): relocation 2 has invalid symbol index 2
ld: /usr/lib/debug/usr/lib/x86_64-linux-gnu/crt1.o(.debug_info): relocation 3 has invalid symbol index 2
ld: /usr/lib/debug/usr/lib/x86_64-linux-gnu/crt1.o(.debug_info): relocation 4 has invalid symbol index 11
ld: /usr/lib/debug/usr/lib/x86_64-linux-gnu/crt1.o(.debug_info): relocation 5 has invalid symbol index 13
ld: /usr/lib/debug/usr/lib/x86_64-linux-gnu/crt1.o(.debug_info): relocation 6 has invalid symbol index 13
ld: /usr/lib/debug/usr/lib/x86_64-linux-gnu/crt1.o(.debug_info): relocation 7 has invalid symbol index 13
ld: /usr/lib/debug/usr/lib/x86_64-linux-gnu/crt1.o(.debug_info): relocation 8 has invalid symbol index 12
ld: /usr/lib/debug/usr/lib/x86_64-linux-gnu/crt1.o(.debug_info): relocation 9 has invalid symbol index 13
ld: /usr/lib/debug/usr/lib/x86_64-linux-gnu/crt1.o(.debug_info): relocation 10 has invalid symbol index 13
ld: /usr/lib/debug/usr/lib/x86_64-linux-gnu/crt1.o(.debug_info): relocation 11 has invalid symbol index 13
ld: /usr/lib/debug/usr/lib/x86_64-linux-gnu/crt1.o(.debug_info): relocation 12 has invalid symbol index 13
ld: /usr/lib/debug/usr/lib/x86_64-linux-gnu/crt1.o(.debug_info): relocation 13 has invalid symbol index 13
ld: /usr/lib/debug/usr/lib/x86_64-linux-gnu/crt1.o(.debug_info): relocation 14 has invalid symbol index 13
ld: /usr/lib/debug/usr/lib/x86_64-linux-gnu/crt1.o(.debug_info): relocation 15 has invalid symbol index 13
ld: /usr/lib/debug/usr/lib/x86_64-linux-gnu/crt1.o(.debug_info): relocation 16 has invalid symbol index 13
ld: /usr/lib/debug/usr/lib/x86_64-linux-gnu/crt1.o(.debug_info): relocation 17 has invalid symbol index 13
ld: /usr/lib/debug/usr/lib/x86_64-linux-gnu/crt1.o(.debug_info): relocation 18 has invalid symbol index 13
ld: /usr/lib/debug/usr/lib/x86_64-linux-gnu/crt1.o(.debug_info): relocation 19 has invalid symbol index 21
ld: /usr/lib/debug/usr/lib/x86_64-linux-gnu/crt1.o(.debug_line): relocation 0 has invalid symbol index 2
/usr/lib/gcc/x86_64-linux-gnu/4.8/../../../x86_64-linux-gnu/crt1.o: In function `_start':
(.text+0x12): undefined reference to `__libc_csu_fini'
/usr/lib/gcc/x86_64-linux-gnu/4.8/../../../x86_64-linux-gnu/crt1.o: In function `_start':
(.text+0x19): undefined reference to `__libc_csu_init'
/usr/lib/gcc/x86_64-linux-gnu/4.8/../../../x86_64-linux-gnu/crt1.o: In function `_start':
(.text+0x20): undefined reference to `main'
/usr/lib/gcc/x86_64-linux-gnu/4.8/../../../x86_64-linux-gnu/crt1.o: In function `_start':
(.text+0x25): undefined reference to `__libc_start_main'
Instead of compiling the code in one go, walk through each stage of compilation to figure out where the error arises (as far as I know, such errors occur during linking). The following gcc arguments will be helpful:
-E    Preprocess only; do not compile, assemble or link
-S    Compile only; do not assemble or link
-c    Compile and assemble, but do not link
Now:
gcc -E prog.c
gcc -S prog.c
gcc -c prog.c
With the code you have mentioned, all of these steps work fine with gcc 4.8.4. The failure appears only at link time: when you build with gcc prog.c, the default startup file crt1.o is linked in and refers to main, but prog.c defines no main function. So we need to pass the -nostartfiles switch. Hence, you can compile prog.c as:
gcc prog.c -lc -nostartfiles
This produces the warning:
/usr/bin/ld: warning: cannot find entry symbol _start; defaulting to 00000000004002a3
This comes from the startup sequence: normally the loader jumps to _start, and _start calls main. With -nostartfiles, crt1.o is left out, so nothing defines _start and the linker falls back to a default entry address. Please note that this is just a warning. To avoid it, name the entry point explicitly:
gcc prog.c -lc --entry whatever -nostartfiles
With this command, we are instructing gcc to compile prog.c, link it against the libc.so library, and use the function whatever as the starting point, since this code contains no main function.
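One caveat is that an entry point has no caller to return to, so it should end the process itself. Below is a minimal sketch of a prog.c that does so, assuming a dynamically linked glibc on Linux. The helper `do_work` is a hypothetical name, and real startup code also takes care of details such as stack alignment, which this sketch ignores.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: the real work, returning an exit status. */
int do_work(void) {
    puts("hello from whatever");
    return 0;
}

/* Entry point selected with --entry whatever. Nothing called this
 * function, so it must terminate the process instead of returning. */
void whatever(void) {
    exit(do_work());
}
```

It is meant to be built with the same command as above: gcc prog.c -lc --entry whatever -nostartfiles.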
This is the context with gcc 4.8.4, which I've compiled on.
Coming to gcc 6.2.0, I think all of this linking work is taken care of by the compiler itself. Hence, you can simply use the command shown below (note that -c stops after producing the object file, so no link step is run).
gcc -c prog.c -nostartfiles
If it produces any other errors or warnings, you can use the switches mentioned above.
Also, note that the crt files (crt0 through crtN, where N depends on the ELF file) supply the code that runs before main is reached; they set up memory and other machine-dependent parameters. The linking errors shown as
/usr/bin/ld: /usr/lib/debug/usr/lib/x86_64-linux-gnu/crt1.o(.debug_info): relocation 0 has invalid symbol index 11
do not provide complete information for rectifying the issue, as machines are not as smart as human beings in identifying the point of error.
This produces a complete executable file. Please note that such code (without a main function) is typical when we are working on libraries or modules within a project.
All the data provided is done with step-by-step analysis. You can recreate all the steps mentioned. Hope this cleared your doubt. Good day!