Linux syscall, libc, VDSO and implementation dissection

半阙折子戏 2021-01-12 09:41

I am dissecting the syscall path in the latest libc:

git clone git://sourceware.org/git/glibc.git

And I have this code in sysdeps/unix/sysv/linux/

1 answer
  • 2021-01-12 10:37

    The macros involved in glibc's syscalls will expand to something like the following, for the example of the exit syscall.

    LOADREGS_1(args)
    asm volatile (
    "call *%%gs:%P2"
    : "=a" (resultvar)
    : "a" (__NR_exit), "i" (offsetof (tcbhead_t, sysinfo))
      ASMARGS_1(args) : "memory", "cc")
    

    LOADREGS_1(args) will expand to LOADREGS_0(), which will expand to nothing - LOADREGS_*(...) only need to adjust registers when more parameters are provided.

    ASMARGS_1(args) will expand to ASMARGS_0 (), "b" ((unsigned int) (arg1)), which will expand to , "b" ((unsigned int) (arg1)).

    __NR_exit is 1 on 32-bit x86 (i386).

    As such, the code will expand to something like:

    asm volatile (
    "call *%%gs:%P2"
    : "=a" (resultvar)
    : "a" (1), "i" (offsetof (tcbhead_t, sysinfo))
    , "b" ((unsigned int) (arg1)) : "memory", "cc")
    

    The ASMARGS_* macros don't execute code per se. They expand to operand constraints that tell gcc to place certain values (such as (unsigned int) (arg1)) in certain registers (such as b, i.e. ebx) before the asm statement runs. Taken together, the operands of asm volatile (which isn't a function, of course, but a gcc language extension) simply specify how gcc should prepare for the syscall and how it should continue after the syscall completes.

    Now, the generated assembly will look something like this:

    # set up other registers...
    movl $1, %eax
    call *%gs:0x10
    # tear down
    

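    %gs is a segment register that references thread-local storage. Specifically, glibc is reading a saved pointer to the VDSO's syscall entry point from the thread control block (the sysinfo field of tcbhead_t, at offset 0x10 here). It stored that pointer there at startup, after the kernel told it where the VDSO was through the ELF auxiliary vector (AT_SYSINFO).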

    Once the code enters the VDSO, we don't know exactly what happens - it varies depending on the kernel version - but we do know that it uses the most efficient available mechanism to run a syscall, such as the sysenter instruction or the int 0x80 instruction.

    So, yes, your diagram is accurate:

    write(1, "A", 1)  ----->   LIBC   ----->   VDSO   -----> KERNEL
                              load reg           ?   
                            jump to vdso 
    |---------------------------------------------------|--------------|
           user land                                       kernel land
    

    Here's a simpler example of code to call into the VDSO, specifically for one-parameter syscalls, from a library that I maintain called libsyscall:

    _lsc_syscall1:
        xchgl 8(%esp), %ebx
        movl 4(%esp), %eax
        call *_lsc_vdso_ptr(,1)
        movl 8(%esp), %ebx
        # pass %eax out
        ret
    

    This simply moves the parameters from the stack into registers (saving the caller's %ebx on the stack in the process), calls into the VDSO via a pointer loaded from memory, restores %ebx to its previous value, and returns the result of the syscall in %eax.
