x86_64: Is it possible to “in-line substitute” PLT/GOT references?

感情败类 2020-12-16 03:16

I'm not sure what a good subject line for this question is, but here we go ...

In order to force code locality / compactness for a critical section of code, I'm lo

3 Answers
  • 2020-12-16 04:01

    You can statically link the executable. Just add -static to the final link command, and all your indirect jumps will be replaced by direct calls.

  • 2020-12-16 04:10

    In order to inline the call you would need a code (.text) relocation whose result is the final address of the function in the dynamically loaded shared library. No such relocation exists (and modern static linkers don't allow them) on x86_64 using a GNU toolchain for GNU/Linux, therefore you cannot inline the entire call as you wish to do.

    The closest you can get is a direct call through the GOT (avoids PLT):

        .section    .rodata
    .LC0:
        .string "Hello, World!\n"
        .text
        .globl  main
        .type   main, @function
    main:
        pushq   %rbp
        movq    %rsp, %rbp
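        leaq    .LC0(%rip), %rdi        # format string, RIP-relative so it also links as PIE
        xorl    %eax, %eax              # variadic call: %al = number of vector registers used
        call    *printf@GOTPCREL(%rip)  # indirect call through the GOT slot, no PLT stub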
        nop
        popq    %rbp
        ret
        .size   main, .-main
    

    This should generate a R_X86_64_GLOB_DAT relocation against printf in the GOT to be used by the sequence above. You need to avoid C code because in general the compiler may use any number of caller-saved registers in the prologue and epilogue, and this forces you to save and restore all such registers around the asm function call or risk corrupting those registers for later use in the wrapper function. Therefore it is easier to write the wrapper in pure assembly.

    Another option is to compile with -Wl,-z,now -Wl,-z,relro which ensures the PLT and PLT-related GOT entries are resolved at startup to increase code locality and compactness. With full RELRO you'll only have to run code in the PLT and access data in the GOT, two things which should already be somewhere in the cache hierarchy of the logical core. If full RELRO is enough to meet your needs then you wouldn't need wrappers and you would have added security benefits.

    The best options are really static linking or LTO if they are available to you.

  • 2020-12-16 04:15

    This optimization has since been implemented in GCC. It can be enabled globally with the -fno-plt option or per function with the noplt function attribute. From the GCC documentation:

    Do not use the PLT for external function calls in position-independent code. Instead, load the callee address at call sites from the GOT and branch to it. This leads to more efficient code by eliminating PLT stubs and exposing GOT loads to optimizations. On architectures such as 32-bit x86 where PLT stubs expect the GOT pointer in a specific register, this gives more register allocation freedom to the compiler. Lazy binding requires use of the PLT; with -fno-plt all external symbols are resolved at load time.

    Alternatively, the function attribute noplt can be used to avoid calls through the PLT for specific external functions.

    In position-dependent code, a few targets also convert calls to functions that are marked to not use the PLT to use the GOT instead.
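
    As a rough illustration of the per-function variant, here is a minimal C sketch. The NOPLT macro and the say_hello function are hypothetical names made up for this example. The sketch assumes a GCC recent enough to know the noplt attribute, and falls back to a plain declaration otherwise.

    ```c
    /* Sketch only: NOPLT and say_hello are illustrative names, not part of
       any library. Assumed build line: gcc -O2 -fPIC -c example.c */

    /* Use the attribute only if the compiler reports support for it. */
    #ifdef __has_attribute
    #  if __has_attribute(noplt)
    #    define NOPLT __attribute__((noplt))
    #  endif
    #endif
    #ifndef NOPLT
    #  define NOPLT /* attribute unavailable: calls may still go through the PLT */
    #endif

    /* Declare the libc function ourselves so the attribute is attached
       to the declaration that the call site below sees. */
    extern int puts(const char *s) NOPLT;

    /* In position-independent code the call below is expected to load the
       address of puts from the GOT and branch to it, rather than going
       through a PLT stub. */
    int say_hello(void)
    {
        return puts("Hello, World!");
    }
    ```

    To check what your toolchain actually emitted, disassemble the object file (for example with objdump -dr) and look at the call site in say_hello. Passing -fno-plt on the command line should have the same effect for every external call without touching the declarations.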
