Difference in ABI between x86_64 Linux functions and syscalls

Submitted by 人盡茶涼 on 2019-12-05 01:28:04
Shachar Shemesh

The syscall instruction is intended to provide a quicker method of entering Ring-0 in order to carry out a system call. This is meant to be an improvement over the old method, which was to raise a software interrupt (int 0x80 on Linux).

Part of the reason the instruction is faster is that it does not write to memory, and does not even switch rsp to point at a kernel stack. A software interrupt forces the CPU to let the OS resume the interrupted code without clobbering anything. With syscall, the CPU is allowed to assume that the software knows a system call is happening at this point.

In particular, syscall stores two parts of the user-space state in registers. The RIP to return to after the call is stored in rcx, and the flags are stored in r11 (they have to be saved because RFLAGS is masked with a kernel-supplied value before entry to the kernel). This means that both of those registers are clobbered by the instruction.
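
As a minimal sketch of what this means for user-space code, here is a zero-argument system call written with GNU C inline assembly. It assumes x86-64 Linux and GCC or Clang. The names raw_syscall0 and check_raw_syscall0 are made up for this illustration and are not part of any library. The clobber list is where the compiler is told that rcx and r11 do not survive the instruction.

```c
#include <assert.h>
#include <sys/syscall.h>   /* SYS_getpid */
#include <unistd.h>        /* getpid(), used only for comparison */

/* Hypothetical helper: issue a system call that takes no arguments. */
static long raw_syscall0(long nr)
{
    long ret;
    __asm__ volatile (
        "syscall"
        : "=a"(ret)               /* the kernel's return value comes back in rax */
        : "a"(nr)                 /* the system call number goes in rax */
        : "rcx", "r11", "memory"  /* rcx receives the return RIP, r11 the saved RFLAGS */
    );
    return ret;
}

/* Sanity check: the raw call should agree with libc's getpid(). */
static void check_raw_syscall0(void)
{
    assert(raw_syscall0(SYS_getpid) == (long)getpid());
}
```

Without "rcx" and "r11" in the clobber list, the compiler would be free to keep a live value in either register across the instruction and would silently read garbage afterwards.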

Since rcx is clobbered, it cannot carry an argument into the kernel. The syscall ABI therefore uses another register in its place, hence the use of r10 for the 4th argument.

r10 is a natural choice, since in the x86-64 SystemV ABI it's not used for passing function args, and functions don't need to preserve their caller's value of r10. So a syscall wrapper function can mov %rcx, %r10 without any save/restore. This wouldn't be possible with any other register: for 6-arg syscalls, every other register that the SysV function calling convention allows a function to clobber is already taken (rax holds the syscall number, r11 is overwritten by syscall itself, and the remaining ones carry arguments).
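
The register assignment for a full 6-argument call can be sketched as follows, again assuming x86-64 Linux with GCC or Clang. raw_syscall6 and check_raw_syscall6 are hypothetical names chosen for this example. It is similar in spirit to libc's generic syscall() function, but it is not glibc's actual implementation. The check uses mmap because its 4th argument (the flags) must arrive in r10 for the call to succeed.

```c
#define _GNU_SOURCE        /* for MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <sys/mman.h>      /* PROT_*, MAP_* */
#include <sys/syscall.h>   /* SYS_mmap, SYS_munmap */

/* Hypothetical helper: system call with six arguments.
 * Kernel convention: rax = number, args in rdi, rsi, rdx, r10, r8, r9. */
static long raw_syscall6(long nr, long a1, long a2, long a3,
                         long a4, long a5, long a6)
{
    long ret;
    /* Pin args 4..6 to the registers the kernel reads them from.
     * Note that arg 4 goes in r10, not rcx as it would for a function call. */
    register long arg4 __asm__("r10") = a4;
    register long arg5 __asm__("r8")  = a5;
    register long arg6 __asm__("r9")  = a6;
    __asm__ volatile (
        "syscall"
        : "=a"(ret)
        : "a"(nr), "D"(a1), "S"(a2), "d"(a3),
          "r"(arg4), "r"(arg5), "r"(arg6)
        : "rcx", "r11", "memory"
    );
    return ret;
}

/* Sanity check: map one anonymous page, touch it, unmap it. */
static void check_raw_syscall6(void)
{
    long p = raw_syscall6(SYS_mmap, 0, 4096, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    assert(p > 0);                    /* failures come back as -errno */
    ((volatile char *)p)[0] = 42;     /* the page must be writable */
    assert(raw_syscall6(SYS_munmap, p, 4096, 0, 0, 0, 0) == 0);
}
```

In a real libc wrapper such as mmap(), the 4th function argument arrives in rcx, and the wrapper only needs the single mov into r10 before executing syscall. Here the compiler does the equivalent shuffling, because the syscall number occupies the first parameter slot.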


BTW, the 32-bit system call ABI is also accessible with sysenter, which requires cooperation between user-space and kernel-space to allow returning to user-space afterwards (i.e. user-space has to store some state before running sysenter). This performs better than int 0x80, but is awkward to use. Still, glibc uses it, by jumping to user-space code in the vdso pages that the kernel maps into the address space of every process.
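
The vdso mapping itself can be observed from user-space. The sketch below assumes Linux with a libc that provides getauxval() (glibc 2.16 or later). It asks the kernel-supplied auxiliary vector where the vdso was mapped and checks that an ELF image starts there. vdso_looks_like_elf is a made-up name for this illustration. It only shows that the mapping exists and does not demonstrate the 32-bit sysenter path.

```c
#include <assert.h>
#include <string.h>     /* memcmp */
#include <sys/auxv.h>   /* getauxval, AT_SYSINFO_EHDR */

/* Hypothetical helper.
 * Returns  1 if a vdso is mapped and begins with the ELF magic bytes,
 *          0 if something is mapped there but it is not an ELF image,
 *         -1 if the kernel did not report a vdso for this process. */
static int vdso_looks_like_elf(void)
{
    unsigned long base = getauxval(AT_SYSINFO_EHDR);
    if (base == 0)
        return -1;
    return memcmp((const void *)base, "\177ELF", 4) == 0;
}
```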

AMD's syscall is another approach to the same idea as Intel's sysenter: to make entry/exit from the kernel less expensive by not preserving absolutely everything.

AMD's syscall clobbers the rcx register, thus r10 is used instead.
