Inlining of vararg functions

前端未结

关注

 4  1956

不知归路 2021-01-17 08:38

While playing about with optimisation settings, I noticed an interesting phenomenon: functions taking a variable number of arguments (...) never seemed to get i

4条回答

傲寒 (楼主)

2021-01-17 08:52

At least on x86-64, the passing of var_args is quite complex (due to passing arguments in registers). Other architectures may not be quite so complex, but it is rarely trivial. In particular, having a stack-frame or frame pointer to refer to when getting each argument may be required. These sort of rules may well stop the compiler from inlining the function.

The code for x86-64 includes pushing all the integer arguments, and 8 sse registers onto the stack.

This is the function from the original code compiled with Clang:

test:                                   # @test
    subq    $200, %rsp
    testb   %al, %al
    je  .LBB1_2
# BB#1:                                 # %entry
    movaps  %xmm0, 48(%rsp)
    movaps  %xmm1, 64(%rsp)
    movaps  %xmm2, 80(%rsp)
    movaps  %xmm3, 96(%rsp)
    movaps  %xmm4, 112(%rsp)
    movaps  %xmm5, 128(%rsp)
    movaps  %xmm6, 144(%rsp)
    movaps  %xmm7, 160(%rsp)
.LBB1_2:                                # %entry
    movq    %r9, 40(%rsp)
    movq    %r8, 32(%rsp)
    movq    %rcx, 24(%rsp)
    movq    %rdx, 16(%rsp)
    movq    %rsi, 8(%rsp)
    leaq    (%rsp), %rax
    movq    %rax, 192(%rsp)
    leaq    208(%rsp), %rax
    movq    %rax, 184(%rsp)
    movl    $48, 180(%rsp)
    movl    $8, 176(%rsp)
    movq    stdout(%rip), %rdi
    leaq    176(%rsp), %rdx
    movl    $.L.str, %esi
    callq   vfprintf
    addq    $200, %rsp
    retq

and from gcc:

test.constprop.0:
    .cfi_startproc
    subq    $216, %rsp
    .cfi_def_cfa_offset 224
    testb   %al, %al
    movq    %rsi, 40(%rsp)
    movq    %rdx, 48(%rsp)
    movq    %rcx, 56(%rsp)
    movq    %r8, 64(%rsp)
    movq    %r9, 72(%rsp)
    je  .L2
    movaps  %xmm0, 80(%rsp)
    movaps  %xmm1, 96(%rsp)
    movaps  %xmm2, 112(%rsp)
    movaps  %xmm3, 128(%rsp)
    movaps  %xmm4, 144(%rsp)
    movaps  %xmm5, 160(%rsp)
    movaps  %xmm6, 176(%rsp)
    movaps  %xmm7, 192(%rsp)
.L2:
    leaq    224(%rsp), %rax
    leaq    8(%rsp), %rdx
    movl    $.LC0, %esi
    movq    stdout(%rip), %rdi
    movq    %rax, 16(%rsp)
    leaq    32(%rsp), %rax
    movl    $8, 8(%rsp)
    movl    $48, 12(%rsp)
    movq    %rax, 24(%rsp)
    call    vfprintf
    addq    $216, %rsp
    .cfi_def_cfa_offset 8
    ret
    .cfi_endproc

In clang for x86, it is much simpler:

test:                                   # @test
    subl    $28, %esp
    leal    36(%esp), %eax
    movl    %eax, 24(%esp)
    movl    stdout, %ecx
    movl    %eax, 8(%esp)
    movl    %ecx, (%esp)
    movl    $.L.str, 4(%esp)
    calll   vfprintf
    addl    $28, %esp
    retl

There's nothing really stopping any of the above code from being inlined as such, so it would appear that it is simply a policy decision on the compiler writer. Of course, for a call to something like printf, it's pretty meaningless to optimise away a call/return pair for the cost of the code expansion - after all, printf is NOT a small short function.

(A decent part of my work for most of the past year has been to implement printf in an OpenCL environment, so I know far more than most people will ever even look up about format specifiers and various other tricky parts of printf)

Edit: The OpenCL compiler we use WILL inline calls to var_args functions, so it is possible to implement such a thing. It won't do it for calls to printf, because it bloats the code very much, but by default, our compiler inlines EVERYTHING, all the time, no matter what it is... And it does work, but we found that having 2-3 copies of printf in the code makes it REALLY huge (with all sorts of other drawbacks, including final code generation taking a lot longer due to some bad choices of algorithms in the compiler backend), so we had to add code to STOP the compiler doing that...

0 讨论(0)

查看其它4个回答