Question
[bits 32]
global _start
section .data
str_hello db "HelloWorld", 0xa
str_hello_length db $-str_hello
section .text
_start:
mov ebx, 1 ; stdout file descriptor
mov ecx, str_hello ; pointer to string of characters that will be displayed
mov edx, [str_hello_length] ; count (gdb shows this line as RIP-relative addressing)
mov eax, 4 ; sys_write
int 0x80 ; linux kernel system call
mov ebx, 0 ; exit status zero
mov eax, 1 ; sys_exit
int 0x80 ; linux kernel system call
The fundamental thing here is that I need to have the length of the hello string to pass to linux's sys_write system call. Now, I'm well aware that I can just use EQU and it'll work fine, but I'm really trying to understand what's going on here.
So, basically when I use EQU it loads the value and that's fine.
str_hello_length equ $-str_hello
...
...
mov edx, str_hello_length
However, if I use this line with DB
str_hello_length db $-str_hello
...
...
mov edx, [str_hello_length] ; of course, without the brackets it'll load the address, which I don't want. I want the value stored at that address
instead of loading the value at that address like I expect it to, the assembler outputs RIP-relative addressing, as shown in the gdb debugger, and I'm simply wondering why.
mov 0x6000e5(%rip),%edx # 0xa001a5
Now, I've tried using the eax register instead (and then moving eax to edx), but then I get a different problem. I end up getting a segmentation fault, as noted in gdb:
movabs 0x4b8c289006000e5,%eax
So apparently, different registers produce different code. I guess I need to truncate the upper 32 bits somehow, but I don't know how to do that.
I did kind of find a 'solution', though, and it goes like this: load eax with str_hello_length's address, then load the contents of the address that eax points to, and everything is hunky dory.
mov eax, str_hello_length
mov edx, [eax] ; count
; gdb disassembly
mov $0x6000e5,%eax
mov (%rax),%edx
Apparently, trying to indirectly load a value from a memory address produces different code? I don't really know.
I just need help understanding the syntax and operation of these instructions, so I can better understand how to load effective addresses. Yeah, I guess I could've just switched to EQU and been on my merry way, but I really feel I can't go on until I understand what's going on with the DB declaration and loading from its address.
Answer 1:
The answer is that it isn't. x86-64 doesn't have RIP-relative addressing in 32-bit mode (this should be obvious, because RIP doesn't exist in 32-bit mode). What's happening is that nasm is assembling some lovely 32-bit opcodes for you, and you're trying to run them as 64-bit code. GDB is disassembling your 32-bit opcodes as 64-bit and telling you that, in 64-bit mode, those bytes mean a RIP-relative mov. 32-bit and 64-bit opcodes on x86-64 overlap a lot, to make use of common decoding logic in the silicon, and you're getting confused because the code that GDB is disassembling looks similar to the 32-bit code you wrote, but in reality you're just throwing garbage bytes at the processor.
This has nothing to do with nasm. You're using the wrong architecture for the process you're in. Either assemble 32-bit code and run it in a 32-bit process, or assemble your code for [BITS 64].
Answer 2:
You're asking the assembler to target 32-bit mode (with bits 32), but you're putting that 32-bit machine code into a 64-bit object file and then looking at what happens when you disassemble it as x86-64 machine code.
So you're seeing the differences between instruction encoding in x86-32 and x86-64. i.e. This is what happens when you decode 32-bit machine code as 64-bit.
mov 0x6000e5(%rip),%edx # 0xa001a5
The key one in this case being that 32-bit x86 has two redundant ways to encode a 32-bit absolute address (with no registers): with or without a SIB byte. 32-bit mode doesn't have RIP-relative (or EIP-relative) addressing.
x86-64 repurposed the shorter (ModR/M + disp32) form as the RIP-relative addressing mode, while 32-bit absolute addressing is still available with the longer ModR/M + SIB + disp32 encoding. (With a SIB byte that encodes no base register and no index register, of course).
Note that the offset from RIP is actually the absolute static address where your data is placed (in 64-bit code), 0x6000e5.
The comment is the disassembler showing you the effective absolute address; RIP-relative addressing counts from the byte after the instruction, i.e. the start of the next instruction.
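As a rough illustration of the two decodings, here is a minimal Python sketch (not a real disassembler). It takes the six bytes that a 32-bit mov edx, [0x6000e5] assembles to (8B 15 followed by the disp32) and interprets the ModR/M form with mod=00, r/m=101 under each mode's rules. The instruction address 0x4000ba is an assumption, inferred by working backwards from the "# 0xa001a5" comment in the GDB output; decode_mov_disp32 is a made-up helper name.

```python
import struct

def decode_mov_disp32(code: bytes, insn_addr: int, bits: int) -> int:
    """Return the address read by opcode 8B with ModR/M mod=00, r/m=101.

    Only this single encoding is handled; anything else raises ValueError.
    """
    opcode, modrm = code[0], code[1]
    mod, rm = modrm >> 6, modrm & 0b111
    if opcode != 0x8B or mod != 0b00 or rm != 0b101:
        raise ValueError("only 8B + ModR/M(mod=00, r/m=101) is handled")
    (disp32,) = struct.unpack_from("<i", code, 2)  # signed little-endian disp32
    if bits == 32:
        # 32-bit mode: the disp32 is the absolute address.
        return disp32 & 0xFFFFFFFF
    # 64-bit mode: the disp32 is added to the address of the NEXT instruction.
    next_insn = insn_addr + 6  # opcode + ModR/M + 4 displacement bytes
    return (next_insn + disp32) & 0xFFFFFFFFFFFFFFFF

code = bytes.fromhex("8b15e5006000")
INSN_ADDR = 0x4000BA  # assumed address of the instruction (see lead-in)

print(hex(decode_mov_disp32(code, INSN_ADDR, 32)))  # absolute address
print(hex(decode_mov_disp32(code, INSN_ADDR, 64)))  # RIP-relative target
```

Under these assumptions the 32-bit interpretation yields 0x6000e5 and the 64-bit interpretation yields 0xa001a5, the same value GDB prints in its comment.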
movabs 0x4b8c289006000e5,%eax
When the destination register is EAX, your assembler (in 32-bit mode) chooses the shorter mov encoding that loads eax from a 32-bit absolute address with no ModR/M byte, just A1 disp32. Intel's manual calls this a moffs (memory offset) instead of an effective address.
In x86-64 mode, that opcode takes a 64-bit absolute address. (And is unique in being able to load/store from a 64-bit absolute (not RIP-relative) address without getting the address into a register first). Thus, decoding consumes part of the next instruction as part of the 64-bit address, and that's where some of those high bytes in the address come from. The 0x6000e5 in the low 32 bits is correct, and is how it would decode as 32-bit machine code.
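A small Python sketch can make the "consumes part of the next instruction" point concrete. The byte string below is an assumption reconstructed from the question: mov eax, [str_hello_length] (A1 + disp32), followed by mov edx, eax (89 C2) and mov eax, 4 (B8 04 00 00 00). decode_mov_eax_moffs is a hypothetical helper that handles only opcode A1 with no prefixes.

```python
def decode_mov_eax_moffs(code: bytes, bits: int) -> tuple:
    """Decode opcode A1 (mov eax, moffs); return (address, instruction length)."""
    if code[0] != 0xA1:
        raise ValueError("only opcode A1 is handled")
    # Without an address-size prefix, the offset is 4 bytes in 32-bit mode
    # and 8 bytes in 64-bit mode.
    addr_size = 4 if bits == 32 else 8
    address = int.from_bytes(code[1:1 + addr_size], "little")
    return address, 1 + addr_size

# Assumed byte stream: mov eax,[0x6000e5] ; mov edx,eax ; mov eax,4
code = bytes.fromhex("a1e5006000" "89c2" "b804000000")

for bits in (32, 64):
    address, length = decode_mov_eax_moffs(code, bits)
    print(f"{bits}-bit: address={address:#x}, length={length} bytes")
```

Decoded as 64-bit, the address swallows 89 C2 B8 04, i.e. all of mov edx, eax and the first two bytes of mov eax, 4, which reproduces the 0x4b8c289006000e5 shown by GDB.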
Changed [bits 32] to [bits 64]
See "What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code?".
Better to build a 32-bit executable if you aren't going to use native 64-bit system calls. Use nasm -felf32, and link with gcc -m32 -nostdlib -static.
Answer 3:
The issue is probably that the offset of str_hello_length is greater than 32 bits. IA-32 doesn't support displacements of greater than 32 bits. The way around that is to use RIP-relative addressing, under the (often correct) assumption that the distance between the RIP and the address you're trying to reach fits in 32 bits. In this case, the base is RIP and the index is the instruction length, so if the instruction already has a base or an index, RIP-relative can't be used.
Let's examine your various attempts:
str_hello_length equ $-str_hello
...
...
mov edx, str_hello_length
There's no memory access here, only a simple move with an immediate, so there's no addressing at all.
Next:
mov eax, str_hello_length
mov edx, [eax] ; count
Now the first instruction is a move with an immediate, which is still not a memory access. The second instruction has a memory access, but it uses eax as a base, and there's no displacement. RIP-relative is only relevant when there's a displacement, so there's no RIP-relative here.
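To see the "base register, no displacement" case at the byte level, here is a minimal Python sketch that splits a ModR/M byte into its fields. It assumes opcode 8B (mov r32, r/m32) and uses the 32-bit register names; describe_modrm is a hypothetical helper, and SIB decoding is deliberately left out.

```python
REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def describe_modrm(modrm: int) -> str:
    """Describe the operands selected by a ModR/M byte following opcode 8B."""
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 0b111, modrm & 0b111
    if mod == 0b11:
        return f"{REGS[reg]}, {REGS[rm]}"          # register to register
    if rm == 0b100:
        return f"{REGS[reg]}, [SIB byte follows]"  # not decoded in this sketch
    if mod == 0b00 and rm == 0b101:
        # The form that is absolute in 32-bit mode, RIP-relative in 64-bit mode.
        return f"{REGS[reg]}, [disp32]"
    disp = {0b00: "", 0b01: " + disp8", 0b10: " + disp32"}[mod]
    return f"{REGS[reg]}, [{REGS[rm]}{disp}]"

print(describe_modrm(0x10))  # ModR/M of: mov edx, [eax]
print(describe_modrm(0x15))  # ModR/M of: mov edx, [str_hello_length]
```

The byte 0x10 selects a plain register base with no displacement, so it decodes to the same kind of access in both modes (GDB shows the base as %rax because 64-bit mode defaults to 64-bit addresses), whereas 0x15 is the displacement-only form.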
Finally:
str_hello_length db $-str_hello
...
...
mov edx, [str_hello_length] ; of course, without the brackets it'll load the address, which I don't want. I want the value stored at that address
Here you're using str_hello_length as your displacement. As I explained above, this will result in RIP-relative addressing.
Source: https://stackoverflow.com/questions/9989593/nasm-x86-64-assembly-in-32-bit-mode-why-does-this-instruction-produce-rip-relat