OSDev: enabling the syscall/sysret and sysenter/sysexit instructions

拈花ヽ惹草 posted on 2019-11-29 08:59:21

See the OSdev wiki for details on sysenter, including a note about how to avoid a security/safety problem. Also see the Intel / AMD manuals for that. They go into a lot of the detail that OS developers need. See the tag wiki for links.


Overview of the various system-call instructions:

  • int: available since forever (8086)
  • Trapping by executing an invalid instruction: apparently this was the fastest way to enter the kernel on the 80386. (But that's not the case anymore.)
  • call gate (i.e. a far call). See the OSdev link for details on that and traps.
  • sysenter: (http://wiki.osdev.org/Sysenter) Introduced by Intel before x86-64 existed, adopted by AMD not long after (many years ago). Available on all modern x86 CPUs. Very minimalist design, requires user-space cooperation for the kernel to be able to return, because it doesn't save EIP, ESP, or EFLAGS anywhere.

    Linux supports it in 32 and 64-bit kernels for system calls from 32-bit processes only. IDK if you could design a kernel that used it for 64-bit system calls as well / instead. (I know that wasn't the question, but it's related.)

    Using sysenter requires user-space cooperation to provide the return address and save its own ESP and EFLAGS. In Linux, the kernel exports a page of code which has the user-space side of this dance. User-space is expected to call this code instead of using sysenter directly, but feel free to design your OS however you want. Looking at Linux's code for both sides of this dance will probably be useful, if you don't find an example somewhere else.

  • syscall from 64-bit user-space: available everywhere because Intel implemented it along with the rest of AMD64. Well-designed interface that masks RFLAGS (with a configurable mask) before entering the kernel, so the kernel can start running with interrupts already disabled. That avoids the race window you'd have if you had to disable interrupts manually with cli. Used with swapgs for the kernel to get access to its stack and so on.

    On mainstream x86 OSes (like Linux), syscall is the only way to make 64-bit system calls.

  • syscall from 32-bit user-space: A totally different instruction from long mode syscall, only available on AMD CPUs. The kernel-side interface is different for 32-bit kernels (legacy mode) vs. 64-bit kernels running 32-bit user-space (compat mode).

    The Linux kernel has some useful comments on it:

entry_64_compat.S 32-bit SYSCALL entry (32-bit syscall entry point into a 64-bit kernel)

 /* ...
 *  - Most programmers do not directly target AMD CPUs, and the 32-bit
 *    SYSCALL instruction does not exist on Intel CPUs.  Even on AMD
 *    CPUs, Linux disables the SYSCALL instruction on 32-bit kernels
 *    because the SYSCALL instruction in legacy/native 32-bit mode (as
 *    opposed to compat mode) is sufficiently poorly designed as to be
 *    essentially unusable.

Maybe a toy OS could use it without worrying about whatever problems make it unsuitable for Linux, IDK. But unless you're just plain curious, don't waste your time with it. OTOH, if you're interested in OS & CPU design, finding out what's wrong with the ISA design might be interesting.

BTW, when AMD was designing AMD64, they got some feedback from Linux kernel devs on the amd64 mailing list that improved the design of 64-bit syscall (to configurably mask RFLAGS) because their initial design would have been problematic for Linux. Links to those archived mailing list posts in this answer.
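To make the register effects of 64-bit syscall described above concrete, here is a small C model of what the instruction does to RIP, RCX, R11 and RFLAGS. It is only a sketch for illustration: `struct cpu_state` and `model_syscall_entry` are hypothetical names, the segment loads and privilege change are left out, and `lstar` / `fmask` stand for the contents of the IA32_LSTAR and IA32_FMASK MSRs.

```c
#include <stdint.h>

/* Hypothetical model of the registers touched by 64-bit syscall.
 * This is not a real kernel type, just an illustration. */
struct cpu_state {
    uint64_t rip;     /* on entry: address of the instruction after syscall */
    uint64_t rflags;
    uint64_t rcx;
    uint64_t r11;
};

#define RFLAGS_TF (1ULL << 8)   /* trap (single-step) flag */
#define RFLAGS_IF (1ULL << 9)   /* interrupt-enable flag */
#define RFLAGS_DF (1ULL << 10)  /* direction flag */

/* Apply the effects of syscall on the modelled registers.
 * CS/SS loading and the CPL change are deliberately omitted. */
static inline void model_syscall_entry(struct cpu_state *s,
                                       uint64_t lstar, uint64_t fmask)
{
    s->rcx = s->rip;       /* return address is saved in RCX (old RCX is lost) */
    s->r11 = s->rflags;    /* unmasked RFLAGS is saved in R11 (old R11 is lost) */
    s->rflags &= ~fmask;   /* every bit set in the mask is cleared in RFLAGS */
    s->rip = lstar;        /* execution continues at the kernel entry point */
}
```

With a mask that includes `RFLAGS_IF`, the kernel entry point starts with interrupts disabled without ever executing cli, which is the point of the configurable mask.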


Recommendation: Use sysenter for your 32-bit kernel. It should be usable everywhere, including on AMD CPUs for years now. Ancient CPUs that don't support it can use the int 0x80 ABI (or whatever number you picked for your OS), if you want to add a 2nd compatibility ABI.
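As a rough sketch of what the kernel has to set up for sysenter: the instruction takes its targets from three MSRs, and all four segment selectors are derived from the single value in IA32_SYSENTER_CS, so the GDT has to be laid out to match. The C below only computes those selectors; `struct sysenter_selectors` and `sysenter_selectors_from` are hypothetical names for this illustration, and actually writing the MSRs (with wrmsr, in ring 0) is left to your kernel.

```c
#include <stdint.h>

/* MSRs read by sysenter / sysexit. */
#define MSR_IA32_SYSENTER_CS  0x174u  /* base selector, see below */
#define MSR_IA32_SYSENTER_ESP 0x175u  /* kernel stack pointer loaded by sysenter */
#define MSR_IA32_SYSENTER_EIP 0x176u  /* kernel entry point loaded by sysenter */

/* Hypothetical helper type for this sketch. */
struct sysenter_selectors {
    uint16_t kernel_cs;  /* loaded by sysenter */
    uint16_t kernel_ss;  /* loaded by sysenter */
    uint16_t user_cs;    /* loaded by 32-bit sysexit */
    uint16_t user_ss;    /* loaded by 32-bit sysexit */
};

/* Derive the selectors the CPU will use from the IA32_SYSENTER_CS value.
 * The four GDT descriptors must therefore be consecutive:
 * kernel code, kernel data, user code, user data. */
static inline struct sysenter_selectors sysenter_selectors_from(uint16_t sysenter_cs)
{
    uint16_t base = sysenter_cs & 0xFFFCu;        /* RPL bits are ignored */
    struct sysenter_selectors sel;
    sel.kernel_cs = base;
    sel.kernel_ss = (uint16_t)(base + 8);
    sel.user_cs   = (uint16_t)((base + 16) | 3);  /* RPL forced to 3 */
    sel.user_ss   = (uint16_t)((base + 24) | 3);
    return sel;
}
```

For example, with IA32_SYSENTER_CS = 0x08 this gives kernel CS/SS of 0x08/0x10 and user CS/SS of 0x1B/0x23. Remember that sysexit takes the user EIP from EDX and the user ESP from ECX, which is why user-space has to hand those values to the kernel in the first place.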

The Linux kernel entry points are well commented and fairly readable. While writing "What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code?", I had an easy time figuring out what was going on in the entry points into a 64-bit kernel: syscall for native 64-bit system calls, and int 0x80 or sysenter for 32-bit system calls. (Those normally come from compat mode, but int 0x80 is also supported for 64-bit processes, where it still invokes the 32-bit ABI!) There's a bunch of complicated stuff going on in case various kinds of tracing / debugging are enabled, but the other parts are fairly easy to follow. See that answer for a walk-through of some of Linux's system-call handling internals.

In arch/x86/entry, these are the main files of interest:

  • entry_32.S: 32-bit kernel code for entry from user-space. (legacy mode)
  • entry_64_compat.S: 64-bit kernel code for entry from 32-bit user-space (compat mode -> long mode).
  • entry_64.S: 64-bit kernel code for entry from 64-bit user-space (long mode -> long mode).

You should be able to find Linux's VDSO code for the user-space side of the sysenter dance that passes the kernel the values it needs to return to user-space. Related: "What is better "int 0x80" or "syscall"?" and "The Definitive Guide to Linux System Calls" will give some useful info on the design choices Linux made.


Is it true that the sysret instruction isn't safe?

Intel and AMD both have separate bugs with non-canonical RIP when returning to 64-bit user-space. e.g. on Intel, Linux's entry_64.S describes it this way:

/*
 * On Intel CPUs, SYSRET with non-canonical RCX/RIP will #GP
 * in kernel space.  This essentially lets the user take over
 * the kernel, since userspace controls RSP.

That can happen if a ptrace system call (e.g. made by a debugger) changed the saved value of the process's RIP to a non-canonical address. Linux checks whether it can use sysret, and if not uses its iret return path. (The sysret path is fast enough that it's worth doing extra work to check that it's safe).

Note that if a system call blocks / sleeps, the "master copy" of user-space's integer register state is on its kernel stack, where the system call entry point pushed it. (In Linux. Other designs are possible!) But anyway, this is why it's possible to end up with weird saved state that user-space couldn't have run syscall with (because it would have faulted on jmp to a non-canonical address), or with saved_rcx != saved_RIP (64-bit syscall sets RCX=RIP, and R11=RFLAGS (before masking), so it clobbers RCX and R11 but allows the kernel to restore RIP and RFLAGS.)
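A simplified sketch of the kind of check a kernel can do before choosing the sysret return path, based on the conditions described above. This is not Linux's actual code (which also checks segment selectors and some RFLAGS bits); `struct saved_user_regs` and both function names are hypothetical, and 48-bit virtual addresses are assumed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical view of the user register state saved on the kernel stack. */
struct saved_user_regs {
    uint64_t rip;
    uint64_t rcx;
    uint64_t r11;
    uint64_t rflags;
};

/* Canonical for 48-bit virtual addresses (an assumption of this sketch):
 * bits 63..47 must be all zeros or all ones. */
static inline bool is_canonical_48(uint64_t addr)
{
    uint64_t top = addr >> 47;            /* the upper 17 bits */
    return top == 0 || top == 0x1FFFF;
}

/* sysret reloads RIP from RCX and RFLAGS from R11, so it can only reproduce
 * the saved state if those pairs match, and it is only safe if the target
 * RIP is canonical. Otherwise fall back to the iret path. */
static inline bool can_return_with_sysret(const struct saved_user_regs *r)
{
    return r->rcx == r->rip
        && r->r11 == r->rflags
        && is_canonical_48(r->rip);
}
```

If a debugger used ptrace to set the saved RIP to something like 0x0000800000000000, `can_return_with_sysret` returns false and the kernel would take the slower but safe iret path instead.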

I don't know how 32-bit syscall works, sorry I got off topic here. But I suspect that what you may have read about sysret being unsafe was talking about 64-bit kernels.

IDK if there are any similar bugs in 32-bit-kernel sysret, or 64-bit-kernel sysret-to-compat-mode.

syscall cannot be used on 32-bit x86, only on x86_64 (portably at least). That being said, on x86_64 the instructions are enabled by setting the SCE bit in IA32_EFER, loading the correct CS selectors for user-mode and kernel-mode into the IA32_STAR model-specific register, and then loading the address of whatever you want to call when syscall is executed into IA32_LSTAR. You also need to handle the execution context of these instructions carefully, as they clobber some registers (RCX and R11) and don't switch stacks for you.
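A minimal sketch of how the IA32_STAR value is put together, assuming a flat 64-bit kernel. The helper names (`make_star`, `sysret_cs64`, `sysret_ss`) are made up for this example; the code only computes values, and the actual wrmsr of IA32_EFER, IA32_STAR, IA32_LSTAR and IA32_FMASK has to happen in ring 0 in your kernel.

```c
#include <stdint.h>

/* MSRs involved in enabling syscall/sysret on x86_64. */
#define MSR_EFER   0xC0000080u  /* IA32_EFER: bit 0 (SCE) enables syscall/sysret */
#define MSR_STAR   0xC0000081u  /* IA32_STAR: segment selector bases */
#define MSR_LSTAR  0xC0000082u  /* IA32_LSTAR: 64-bit syscall entry point */
#define MSR_FMASK  0xC0000084u  /* IA32_FMASK: RFLAGS bits cleared on entry */
#define EFER_SCE   (1ULL << 0)

/* Build the IA32_STAR value.
 * Bits 47:32: kernel CS used by syscall (kernel SS = that + 8).
 * Bits 63:48: base used by sysret: 32-bit CS = base, SS = base + 8,
 *             64-bit CS = base + 16, all with RPL forced to 3. */
static inline uint64_t make_star(uint16_t kernel_cs, uint16_t sysret_base)
{
    return ((uint64_t)sysret_base << 48) | ((uint64_t)kernel_cs << 32);
}

/* Selectors the CPU loads on a sysret to 64-bit user-space. */
static inline uint16_t sysret_cs64(uint64_t star)
{
    return (uint16_t)(((star >> 48) + 16) | 3);
}

static inline uint16_t sysret_ss(uint64_t star)
{
    return (uint16_t)(((star >> 48) + 8) | 3);
}
```

For example, `make_star(0x10, 0x23)` yields 0x0023001000000000, for which sysret to 64-bit code loads CS = 0x33 and SS = 0x2B. Because of the fixed offsets, the GDT needs the user data descriptor directly before the 64-bit user code descriptor.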

I suggest reading up in the manuals - both the Intel manual itself and Volume 2 of the AMD64 manual are good places to start.

Is it true that the syscall instruction isn't supported in 32-bit mode by Intel processors, so I can't use it?

At least Wikipedia says this.

And more importantly: syscall doesn't seem to be supported by any 32-bit CPU at all (not even AMD's), only in the 32-bit mode of 64-bit AMD CPUs.

I am building a 32-bit OS in assembly.

So why do you want to use syscall or sysenter?

Nearly all 32-bit x86 OSs use either interrupts (e.g. Linux) or call gates (e.g. Solaris) to enter the kernel...
