Question
From my textbook:
To produce the binary version of each instruction in the assembly language program, the assembler must determine the addresses corresponding to all labels. Assemblers keep track of labels used in branches and data transfer instructions in a symbol table. As you might expect, the table contains pairs of symbols and addresses.
Why does it need a symbol table? If we have a symbol table with a label name and an address, what is the use of the address? What is at the address... just the name of the label? Or is it the instructions of the label?
Say we have an instruction like this in MIPS assembly:
add_numbers:
addi $s0, $t0, 2
Why wouldn't the symbol table just store
add_numbers | <the_binary_representation_of_the_instruction>
instead of
add_numbers | <address_location_of_label>?
Answer 1:
A label IS an address. It is a way for programmers to give the assembler an address without having to know the physical address. Let the toolchain do that work for you.
I don't remember my MIPS offhand, so here is some pseudocode.
loop_top:
nop
nop
sub r0,1
cmp r0,0
bne loop_top
It depends on the instruction set, but in general a conditional branch is PC-relative. The tables used during assembly, with one or more passes over the code, let the assembler resolve the distance between the branch and its destination so that the branch can be fully encoded. For most instruction sets the example above can be resolved in one pass, because loop_top is defined before the branch that uses it. loop_top is a label that will have an address, but since this branch is PC-relative you don't need to know that physical address.
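A minimal Python sketch of that one-pass case, using the pseudocode above. It assumes MIPS-like conventions that the pseudocode does not state: every instruction is 4 bytes, and a branch stores (target - (branch address + 4)) / 4 in a 16-bit two's-complement field. The function name `branch_offsets` is hypothetical. Because loop_top is already in the symbol table when the branch is reached, its offset can be computed on the spot; a forward reference would need a second pass or a fixup list.

```python
# One-pass resolution of a backward, PC-relative branch (illustrative sketch).
# Assumed MIPS-like rules: 4-byte instructions; the branch field holds
# (target - (branch_pc + 4)) / 4 as a 16-bit two's-complement value.

def branch_offsets(lines, base=0):
    """Return (offsets, symbols): encoded branch fields by address, and label addresses."""
    symbols = {}   # symbol table: label -> address
    offsets = {}   # branch address -> encoded 16-bit offset field
    pc = base
    for line in lines:
        if line.endswith(":"):            # label definition: remember where it is
            symbols[line[:-1]] = pc
            continue
        op, *args = line.replace(",", " ").split()
        if op == "bne":
            # Backward reference: the label is already in the table.
            # A forward reference would raise KeyError here and need a second pass.
            target = symbols[args[-1]]
            words = (target - (pc + 4)) // 4
            offsets[pc] = words & 0xFFFF  # keep the low 16 bits (two's complement)
        pc += 4                           # every instruction occupies 4 bytes
    return offsets, symbols

program = ["loop_top:", "nop", "nop", "sub r0,1", "cmp r0,0", "bne loop_top"]
offsets, symbols = branch_offsets(program)
print(symbols)                                             # {'loop_top': 0}
print({hex(pc): hex(off) for pc, off in offsets.items()})  # {'0x10': '0xfffb'}
```

Here the branch at 0x10 jumps back five instructions, so the field is -5 (0xfffb); the absolute address of loop_top never appears in the encoding.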
But consider this:
call my_fun
After making a pass over the code, the assembler finds that my_fun is not defined in this file, and/or the assembly language has some syntax to mark it as external before it is used. Either way it is external and cannot be resolved at the time this file is assembled. So tables are required that record the label name and where in this object the instruction using it lives. Depending on the assembler, it may fill in the offset or full address as zero for now, or encode the instruction as an infinite loop. The linker later determines the actual addresses of things in the processor's memory space. While linking, it ultimately has a table of all the labels (those relevant at this phase of the toolchain) and their addresses; it then goes back into the code and repairs/creates the machine code for this call instruction now that it knows the actual address for that label.
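A minimal Python sketch of the assembler side of this. The object format (a list of 32-bit words, a label table, and (offset, name) relocation entries) and the handling of `call` are hypothetical; apart from `call`, only nop is accepted, and it is emitted as 0, which matches how MIPS encodes nop. The point is that the assembler can record where my_fun is needed, but not what its address is.

```python
# Assembler side of an external reference (illustrative sketch).
# Hypothetical object format: 32-bit words, a label table, and relocation
# entries (byte offset in this object, symbol name) for the linker.

def assemble(lines):
    """Return (words, symbols, relocations, undefined) for a tiny source file."""
    words, symbols, relocations = [], {}, []
    for line in lines:
        if line.endswith(":"):                 # label: offset within this object
            symbols[line[:-1]] = 4 * len(words)
        elif line.startswith("call "):
            # The final address is not known while assembling this file,
            # so emit a placeholder and remember where it must be patched.
            relocations.append((4 * len(words), line.split()[1]))
            words.append(0)
        elif line == "nop":
            words.append(0)                    # MIPS encodes nop as 0x00000000
        else:
            raise ValueError(f"not handled in this sketch: {line!r}")
    # Names used but never defined here are external: only the linker can resolve them.
    undefined = sorted({name for _, name in relocations if name not in symbols})
    return words, symbols, relocations, undefined

words, symbols, relocations, undefined = assemble(["start:", "nop", "call my_fun", "nop"])
print(relocations)  # [(4, 'my_fun')]
print(undefined)    # ['my_fun']
```

The linker, which sees every object, later looks up my_fun in its own table and writes the real address into the word at offset 4.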
For example, take a source file containing only:
j hello
The resulting object, disassembled:
Disassembly of section .text:
00000000 <.text>:
0: 08000000 j 0x0
4: 00000000 nop
And another object, which defines hello:
.globl hello
hello:
j hello
.word hello
Link them together:
Disassembly of section .text:
00001000 <_ftext>:
1000: 08000402 j 1008 <hello>
1004: 00000000 nop
00001008 <hello>:
1008: 08000402 j 1008 <hello>
100c: 00000000 nop
1010: 00001008 0x1008
As separate objects, all the toolchain has to go on is that the label hello is used as an address to be resolved later. In this case, at link time the linker works through the objects, counting bytes and building a table of labels and their addresses. During the first pass or a later one, it modifies the instructions or data as needed to resolve these labels.
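The same two passes, sketched in Python with the numbers from the example above. Assumptions: .text starts at 0x1000 as in the disassembly, the objects are placed in the order given, the first object is 8 bytes (j plus its delay-slot nop) and the second is 12, and the relocation list format is hypothetical. The `j` encoding is the standard MIPS J-type layout: opcode 2 in the top 6 bits and the target address shifted right by 2 in the low 26 bits.

```python
# Linker sketch for the two-object example (illustrative, not a real linker).
# Assumptions: .text at 0x1000, objects placed in list order, hypothetical
# relocation entries of the form (offset in object, kind, symbol name).

J_OPCODE = 0x02  # MIPS 'j' opcode (bits 31..26)

def encode_j(target):
    # J-type: opcode in the top 6 bits, target >> 2 in the low 26 bits.
    # The upper 4 address bits really come from the PC; ignored here since
    # everything sits far below 0x10000000.
    return (J_OPCODE << 26) | ((target >> 2) & 0x03FFFFFF)

objects = [
    {"size": 8,  "defines": {},           "relocs": [(0, "j", "hello")]},
    {"size": 12, "defines": {"hello": 0}, "relocs": [(0, "j", "hello"), (8, "word", "hello")]},
]

def link(objects, base=0x1000):
    # Pass 1: count bytes to place each object and build the global symbol table
    symtab, placed, addr = {}, [], base
    for obj in objects:
        placed.append(addr)
        for name, off in obj["defines"].items():
            symtab[name] = addr + off
        addr += obj["size"]
    # Pass 2: patch every recorded use now that all addresses are known
    patched = {}
    for obj, start in zip(objects, placed):
        for off, kind, name in obj["relocs"]:
            target = symtab[name]
            patched[start + off] = encode_j(target) if kind == "j" else target
    return symtab, patched

symtab, patched = link(objects)
print({k: hex(v) for k, v in symtab.items()})            # {'hello': '0x1008'}
print({hex(a): f"{w:08x}" for a, w in patched.items()})
# {'0x1000': '08000402', '0x1008': '08000402', '0x1010': '00001008'}
```

The patched words are the same 08000402, 08000402 and 00001008 that appear in the linked disassembly.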
For old-school assemblers that did both assembling and linking from a single source file, the statement "assembler must determine the addresses corresponding to all labels" holds. With commonly used toolchains, though, it is generally not the assembler that does the linker's work, so that quoted statement could use some improvement. But hopefully this demonstrates that labels are addresses: they represent a yet-to-be-determined address, which makes the code far easier to write than something like this:
nop
nop
j pc-2
and then, if you add another instruction:
nop
add r0,r1
nop
j pc-3
or:
j 0x1008
and then you have to spend a significant amount of time rewriting the program to get each and every hardcoded address right. Add or remove a single line and a lot of other code has to change. Labels that represent addresses make all of this significantly easier: the toolchain determines the addresses, then basically goes back and replaces the labels with them.
After adding one nop before hello and relinking:
Disassembly of section .text:
00001000 <_ftext>:
1000: 08000403 j 100c <hello>
1004: 00000000 nop
1008: 00000000 nop
0000100c <hello>:
100c: 08000403 j 100c <hello>
1010: 00000000 nop
1014: 0000100c
If we didn't have labels and had to hardcode the addresses instead, then adding that one nop would force you to change those three places. That is for one line; add dozens of lines and it could be hundreds of places. How would you keep track of it all? By putting labels in comments? You would assemble, disassemble, and patch up the source over and over again until it looked somewhat right, and hope there were no bugs.
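To see how much bookkeeping the labels save, here is a Python sketch that assigns addresses and encodes the example as one flat listing, once as originally written and once with the extra nop. Assumptions: the combined code starts at 0x1000, each item is 4 bytes, j uses the MIPS J-type encoding, and .word stores the label's address. `build` is a hypothetical helper, and it treats the listing as a single source file rather than two linked objects.

```python
# Re-resolving a label after inserting one nop (illustrative sketch).
# Assumptions: flat listing at 0x1000, 4-byte items, MIPS J-type 'j',
# '.word label' stores the address, and nop encodes as 0.

def encode_j(target):
    return (0x02 << 26) | ((target >> 2) & 0x03FFFFFF)

def build(lines, base=0x1000):
    # Pass 1: assign an address to every item and record each label
    symtab, addr, items = {}, base, []
    for line in lines:
        if line.endswith(":"):
            symtab[line[:-1]] = addr
        else:
            items.append((addr, line))
            addr += 4
    # Pass 2: encode, substituting each label's address where it is used
    image = {}
    for addr, line in items:
        op, *args = line.split()
        if op == "j":
            image[addr] = encode_j(symtab[args[0]])
        elif op == ".word":
            image[addr] = symtab[args[0]]
        else:                                   # nop
            image[addr] = 0
    return symtab, image

before = ["j hello", "nop", "hello:", "j hello", "nop", ".word hello"]
after = ["j hello", "nop", "nop", "hello:", "j hello", "nop", ".word hello"]

for lines in (before, after):
    symtab, image = build(lines)
    # nop encodes as 0, so the nonzero words are exactly the ones holding hello's address
    print(hex(symtab["hello"]), [f"{a:x}: {w:08x}" for a, w in image.items() if w])
# 0x1008 ['1000: 08000402', '1008: 08000402', '1010: 00001008']
# 0x100c ['1000: 08000403', '100c: 08000403', '1014: 0000100c']
```

One source line changed, yet all three words that carry hello's address changed with it; the symbol table is what lets the tools redo that automatically.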
Dumping the symbol table of the final binary:
mips-elf-readelf -s so.elf
Symbol table '.symtab' contains 14 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 00000000 0 NOTYPE LOCAL DEFAULT UND
1: 00001000 0 SECTION LOCAL DEFAULT 1
2: 00400000 0 SECTION LOCAL DEFAULT 2
3: 00400018 0 SECTION LOCAL DEFAULT 3
4: 00000000 0 SECTION LOCAL DEFAULT 4
5: 0000a010 0 NOTYPE LOCAL DEFAULT 2 _gp
6: 00002018 0 NOTYPE GLOBAL DEFAULT 4 _fdata
7: 0000100c 0 OBJECT GLOBAL DEFAULT 1 hello
8: 00001000 0 NOTYPE GLOBAL DEFAULT 1 _ftext
9: 00000000 0 NOTYPE GLOBAL DEFAULT UND _start
10: 00002018 0 NOTYPE GLOBAL DEFAULT 2 __bss_start
11: 00002018 0 NOTYPE GLOBAL DEFAULT 2 _edata
12: 00002018 0 NOTYPE GLOBAL DEFAULT 2 _end
13: 00002018 0 NOTYPE GLOBAL DEFAULT 2 _fbss
And here is the entry of interest:
7: 0000100c 0 OBJECT GLOBAL DEFAULT 1 hello
Once assembled and linked into a final binary, the label hello is equal to address 0x100C.
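If you want to pull that value out programmatically, a minimal Python sketch that splits one row of the readelf -s output into its columns. It assumes a row with all eight columns present (row 0 above, for example, has an empty Name) and a decimal Size as in this output; `parse_symtab_row` is a hypothetical helper, not part of any tool.

```python
# Splitting one `readelf -s` row into named columns (illustrative sketch).
# Assumptions: all eight columns are present and whitespace-separated,
# Value is hexadecimal without a 0x prefix, Size is decimal.

FIELDS = ("num", "value", "size", "type", "bind", "vis", "ndx", "name")

def parse_symtab_row(row):
    entry = dict(zip(FIELDS, row.split()))
    entry["num"] = int(entry["num"].rstrip(":"))
    entry["value"] = int(entry["value"], 16)   # the symbol's address
    entry["size"] = int(entry["size"])
    return entry

hello = parse_symtab_row("7: 0000100c 0 OBJECT GLOBAL DEFAULT 1 hello")
print(hello["name"], hex(hello["value"]))  # hello 0x100c
```

The Value column is the point of the entry: the symbol table maps the name hello to the address 0x100c, and that address is what gets substituted wherever hello is used.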
Source: https://stackoverflow.com/questions/57984652/what-is-the-purpose-of-the-assembler-and-symbol-table-what-is-at-a-symbols-add