How does this program know the exact location where this string is stored?

问题

I have disassembled a C program with Radare2. Inside this program there are many calls to scanf like the following:

0x000011fe      488d4594       lea rax, [var_6ch]
0x00001202      4889c6         mov rsi, rax
0x00001205      488d3df35603.  lea rdi, [0x000368ff]       ; "%d" ; const char *format
0x0000120c      b800000000     mov eax, 0
0x00001211      e86afeffff     call sym.imp.__isoc99_scanf ; int scanf(const char *format)
0x00001216      8b4594         mov eax, dword [var_6ch]
0x00001219      83f801         cmp eax, 1                  ; rsi ; "ELF\x02\x01\x01"
0x0000121c      740a           je 0x1228

Here scanf has the address of the string "%d" passed to it from the line lea rdi, [0x000368ff]. I'm assuming 0x000368ff is the location of "%d" in the exectable file because if I restart Radare2 in debugging mode (r2 -d ./exec) then lea rdi, [0x000368ff] is replaced by lea rdi, [someMemoryAddress].

If lea rdi, [0x000368ff] is whats hard coded in the file then how does the instruction change to the actual memory address when run?

回答1:

Radare is tricking you, what you see is not the real instruction, it has been simplified for you.

The real instruction is:

0x00001205    488d3df3560300    lea rdi, qword [rip + 0x356f3]
0x0000120c    b800000000        mov eax, 0

This is a typical position independent lea. The string to use is stored in your binary at the offset 0x000368ff, but since the executable is position independent, the real address needs to be calculated at runtime. Since the next instruction is at offset 0x0000120c, you know that, no matter where the binary is loaded in memory, the address you want will be rip + (0x000368ff - 0x0000120c) = rip + 0x356f3, which is what you see above.

When doing static analysis, since Radare does not know the base address of the binary in memory, it simply calculates 0x0000120c + 0x356f3 = 0x000368ff. This makes reverse engineering easier, but can be confusing since the real instruction is different.

As an example, the following program:

int main(void) {
    puts("Hello world!");
}

When compiled produces:

  6b4:   48 8d 3d 99 00 00 00    lea    rdi,[rip+0x99] 
  6bb:   e8 a0 fe ff ff          call   560 <puts@plt>

So rip + 0x99 = 0x6bb + 0x99 = 0x754, and if we take a look at offset 0x754 in the binary with hd:

$ hd -s 0x754 -n 16 a.out
00000754  48 65 6c 6c 6f 20 77 6f  72 6c 64 21 00 00 00 00  |Hello world!....|
00000764

回答2:

The full instruction is

48 8d 3d f3 56 03 00

This instruction is literally

lea rdi, [rip + 0x000356f3]

with a rip relative addressing mode. The instruction pointer rip has the value 0x0000120c when the instruction is executed, thus rdi receives the desired value 0x000368ff.

If this is not the real address, it is possible that your program is a position-independent executable (PIE) which is subject to relocation. Since the address is encoded using a rip-relative addressing mode, no relocation is needed and the address is correct, regardless of where the binary is loaded.

来源：https://stackoverflow.com/questions/57748994/how-does-this-program-know-the-exact-location-where-this-string-is-stored

标签

assembly

x86-64

memory-address