问题
I know that
int 0x80
is making interrupt in linux. But, I don't understand how this code works. Does it returning something?What
$ - msg
standing for?
global _start
section .data
msg db "Hello, world!", 0x0a
len equ $ - msg
section .text
_start:
mov eax, 4
mov ebx, 1
mov ecx, msg
mov edx, len
int 0x80 ;What is this?
mov eax, 1
mov ebx, 0
int 0x80 ;and what is this?
回答1:
How does $ work in NASM, exactly? explains how $ - msg
gets NASM to calculate the string length as an assemble-time constant for you, instead of hard-coding it.
I originally wrote the rest of this for SO Docs (topic ID: 1164, example ID: 19078), rewriting a basic less-well-commented example by @runner. This looks like a better place to put it than as part of my answer to another question where I had previously moved it after the SO docs experiment ended.
Making a system call is done by putting arguments into registers, then running int 0x80
(32-bit mode) or syscall
(64-bit mode). What are the calling conventions for UNIX & Linux system calls on i386 and x86-64 and The Definitive Guide to Linux System Calls.
Think of int 0x80
as a way to "call" into the kernel, across the user/kernel privilege boundary. The kernel does stuff according to the values that were in registers when int 0x80
executed, then eventually returns. The return value is in EAX.
When execution reaches the kernel's entry point, it looks at EAX and dispatches to the right system call based on the call number in EAX. Values from other registers are passed as function args to the kernel's handler for that system call. (e.g. eax=4 / int 0x80
will get the kernel to call its sys_write
kernel function, implementing the POSIX write
system call.)
And see also What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code? - that answer includes a look at the asm in the kernel entry point that is "called" by int 0x80
. (Also applies to 32-bit user-space, not just 64-bit where you shouldn't use int 0x80
).
If you don't already know low-level Unix systems programming, you might want to just write functions in asm that take args and return a value (or update arrays via a pointer arg) and call them from C or C++ programs. Then you can just worry about learning how to handle registers and memory, without also learning the POSIX system-call API and the ABI for using it. That also makes it very easy to compare your code with compiler output for a C implementation. Compilers usually do a pretty good job at making efficient code, but are rarely perfect.
libc provides wrapper functions for system calls, so compiler-generated code would call write
rather than invoking it directly with int 0x80
(or if you care about performance, sysenter
). (In x86-64 code, use syscall for the 64-bit ABI.) See also syscalls(2).
System calls are documented in section 2 manual pages, like write(2). See the NOTES section for differences between the libc wrapper function and the underlying Linux system call. Note that the wrapper for sys_exit
is _exit(2), not the exit(3) ISO C function that flushes stdio buffers and other cleanup first. There's also an exit_group
system call that ends all threads. exit(3)
actually uses that, because there's no downside in a single-threaded process.
This code makes 2 system calls:
- sys_write(1, "Hello, World!\n", sizeof(...));
- sys_exit(0);
I commented it heavily (to the point where it it's starting to obscure the actual code without color syntax highlighting). This is an attempt to point things out to total beginners, not how you should comment your code normally.
section .text ; Executable code goes in the .text section
global _start ; The linker looks for this symbol to set the process entry point, so execution start here
;;;a name followed by a colon defines a symbol. The global _start directive modifies it so it's a global symbol, not just one that we can CALL or JMP to from inside the asm.
;;; note that _start isn't really a "function". You can't return from it, and the kernel passes argc, argv, and env differently than main() would expect.
_start:
;;; write(1, msg, len);
; Start by moving the arguments into registers, where the kernel will look for them
mov edx,len ; 3rd arg goes in edx: buffer length
mov ecx,msg ; 2nd arg goes in ecx: pointer to the buffer
;Set output to stdout (goes to your terminal, or wherever you redirect or pipe)
mov ebx,1 ; 1st arg goes in ebx: Unix file descriptor. 1 = stdout, which is normally connected to the terminal.
mov eax,4 ; system call number (from SYS_write / __NR_write from unistd_32.h).
int 0x80 ; generate an interrupt, activating the kernel's system-call handling code. 64-bit code uses a different instruction, different registers, and different call numbers.
;; eax = return value, all other registers unchanged.
;;;Second, exit the process. There's nothing to return to, so we can't use a ret instruction (like we could if this was main() or any function with a caller)
;;; If we don't exit, execution continues into whatever bytes are next in the memory page,
;;; typically leading to a segmentation fault because the padding 00 00 decodes to add [eax],al.
;;; _exit(0);
xor ebx,ebx ; first arg = exit status = 0. (will be truncated to 8 bits). Zeroing registers is a special case on x86, and mov ebx,0 would be less efficient.
;; leaving out the zeroing of ebx would mean we exit(1), i.e. with an error status, since ebx still holds 1 from earlier.
mov eax,1 ; put __NR_exit into eax
int 0x80 ;Execute the Linux function
section .rodata ; Section for read-only constants
;; msg is a label, and in this context doesn't need to be msg:. It could be on a separate line.
;; db = Data Bytes: assemble some literal bytes into the output file.
msg db 'Hello, world!',0xa ; ASCII string constant plus a newline (0x10)
;; No terminating zero byte is needed, because we're using write(), which takes a buffer + length instead of an implicit-length string.
;; To make this a C string that we could pass to puts or strlen, we'd need a terminating 0 byte. (e.g. "...", 0x10, 0)
len equ $ - msg ; Define an assemble-time constant (not stored by itself in the output file, but will appear as an immediate operand in insns that use it)
; Calculate len = string length. subtract the address of the start
; of the string from the current position ($)
;; equivalently, we could have put a str_end: label after the string and done len equ str_end - str
Notice that we don't store the string length in data memory anywhere. It's an assemble-time constant, so it's more efficient to have it as an immediate operand than a load. We could also have pushed the string data onto the stack with three push imm32
instructions, but bloating the code-size too much isn't a good thing.
On Linux, you can save this file as Hello.asm
and build a 32-bit executable from it with these commands:
nasm -felf32 Hello.asm # assemble as 32-bit code. Add -Worphan-labels -g -Fdwarf for debug symbols and warnings
gcc -static -nostdlib -m32 Hello.o -o Hello # link without CRT startup code or libc, making a static binary
See this answer for more details on building assembly into 32 or 64-bit static or dynamically linked Linux executables, for NASM/YASM syntax or GNU AT&T syntax with GNU as
directives. (Key point: make sure to use -m32
or equivalent when building 32-bit code on a 64-bit host, or you will have confusing problems at run-time.)
You can trace its execution with strace
to see the system calls it makes:
$ strace ./Hello
execve("./Hello", ["./Hello"], [/* 72 vars */]) = 0
[ Process PID=4019 runs in 32 bit mode. ]
write(1, "Hello, world!\n", 14Hello, world!
) = 14
_exit(0) = ?
+++ exited with 0 +++
Compare this with the trace for a dynamically linked process (like gcc makes from hello.c, or from running strace /bin/ls
) to get an idea just how much stuff happens under the hood for dynamic linking and C library startup.
The trace on stderr and the regular output on stdout are both going to the terminal here, so they interfere in the line with the write
system call. Redirect or trace to a file if you care. Notice how this lets us easily see the syscall return values without having to add code to print them, and is actually even easier than using a regular debugger (like gdb) to single-step and look at eax
for this. See the bottom of the x86 tag wiki for gdb asm tips. (The rest of the tag wiki is full of links to good resources.)
The x86-64 version of this program would be extremely similar, passing the same args to the same system calls, just in different registers and with syscall
instead of int 0x80
. See the bottom of What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code? for a working example of writing a string and exiting in 64-bit code.
related: A Whirlwind Tutorial on Creating Really Teensy ELF Executables for Linux. The smallest binary file you can run that just makes an exit() system call. That is about minimizing the binary size, not the source size or even just the number of instructions that actually run.
来源:https://stackoverflow.com/questions/61519222/hello-world-in-assembly-language-with-linux-system-calls