Question
I have disassembled a C program with Radare2. Inside this program there are many calls to scanf like the following:
0x000011fe 488d4594 lea rax, [var_6ch]
0x00001202 4889c6 mov rsi, rax
0x00001205 488d3df35603. lea rdi, [0x000368ff] ; "%d" ; const char *format
0x0000120c b800000000 mov eax, 0
0x00001211 e86afeffff call sym.imp.__isoc99_scanf ; int scanf(const char *format)
0x00001216 8b4594 mov eax, dword [var_6ch]
0x00001219 83f801 cmp eax, 1 ; rsi ; "ELF\x02\x01\x01"
0x0000121c 740a je 0x1228
Here scanf has the address of the string "%d" passed to it by the line lea rdi, [0x000368ff]. I'm assuming 0x000368ff is the location of "%d" in the executable file, because if I restart Radare2 in debugging mode (r2 -d ./exec) then lea rdi, [0x000368ff] is replaced by lea rdi, [someMemoryAddress].
If lea rdi, [0x000368ff] is what's hard-coded in the file, then how does the instruction change to the actual memory address when run?
Answer 1:
Radare is tricking you: what you see is not the real instruction; it has been simplified for you.
The real instruction is:
0x00001205 488d3df3560300 lea rdi, qword [rip + 0x356f3]
0x0000120c b800000000 mov eax, 0
This is a typical position-independent lea. The string is stored in your binary at offset 0x000368ff, but since the executable is position independent, the real address has to be calculated at run time. While the lea executes, rip holds the address of the next instruction, which is at offset 0x0000120c, so no matter where the binary is loaded in memory, the address you want will be rip + (0x000368ff - 0x0000120c) = rip + 0x356f3, which is what you see above.
When doing static analysis, since Radare does not know the base address of the binary in memory, it simply calculates 0x0000120c + 0x356f3 = 0x000368ff. This makes reverse engineering easier, but can be confusing since the real instruction is different.
As an example, take the following program:
#include <stdio.h>
int main(void) {
    puts("Hello world!");
    return 0;
}
When compiled, it produces:
6b4: 48 8d 3d 99 00 00 00 lea rdi,[rip+0x99]
6bb: e8 a0 fe ff ff call 560 <puts@plt>
So rip + 0x99 = 0x6bb + 0x99 = 0x754, and if we take a look at offset 0x754 in the binary with hd:
$ hd -s 0x754 -n 16 a.out
00000754 48 65 6c 6c 6f 20 77 6f 72 6c 64 21 00 00 00 00 |Hello world!....|
00000764
Answer 2:
The full instruction is
48 8d 3d f3 56 03 00
This instruction is literally
lea rdi, [rip + 0x000356f3]
with a rip-relative addressing mode. The instruction pointer rip has the value 0x0000120c when the instruction is executed, so rdi receives the desired value 0x000368ff.
If this is not the address you see at run time, your program is probably a position-independent executable (PIE), which the loader can place at a different base address on each run. Since the address is encoded using a rip-relative addressing mode, the instruction itself needs no relocation: it yields the correct address regardless of where the binary is loaded.
Source: https://stackoverflow.com/questions/57748994/how-does-this-program-know-the-exact-location-where-this-string-is-stored