Do C++ compilers perform compile-time optimizations on lambda closures?

粉色の甜心 2021-02-20 00:01

Suppose we have the following (nonsensical) code:

const int a = 0;
int c = 0;
for(int b = 0; b < 10000000; b++)
{
    if(a) c++;
    c += 7;
}
2 Answers
  • 2021-02-20 00:51

    With this simple code, neither gcc at -O3 nor MSVC2015 in Release mode optimizes the call away; the lambda is actually called:

    #include <functional>
    #include <iostream>
    
    int main()
    {
        int a = 0;    
        std::function<int()> lambda = [a]()
        {
            int c = 0;
            for(int b = 0; b < 10; b++)
            {
                if(a) c++;
                c += 7;
            }
            return c;
        };
    
        std::cout << lambda();
    
        return 0;
    }
    

    At -O3, this is what gcc generates for the lambda (code from godbolt):

    lambda:
        cmp DWORD PTR [rdi], 1
        sbb eax, eax
        and eax, -10
        add eax, 80
        ret
    

    This is a contrived and optimized way to express the following:

    • If a is 0, the comparison sets the carry flag (CF). sbb then sets all 32 bits of eax to 1 (i.e. -1); and'ing with -10 leaves -10 in eax, and adding 80 gives 70.

    • If a is nonzero, the comparison does not set the carry flag, so sbb sets eax to zero, the and has no effect, and adding 80 gives 80.
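
To check the reasoning in the two bullets above, here is a minimal C++ sketch that emulates the four instructions with unsigned arithmetic and sets them next to the original loop. The names emulate_gcc_sequence and reference_loop are made up for illustration; they do not appear in the compiler output.

```cpp
#include <cstdint>

// Hypothetical helper: emulates the cmp/sbb/and/add sequence from the gcc listing.
inline int emulate_gcc_sequence(int a) {
    // cmp DWORD PTR [rdi], 1 : CF is set when the unsigned value is below 1,
    // which only happens for a == 0.
    const std::uint32_t cf = (static_cast<std::uint32_t>(a) < 1u) ? 1u : 0u;
    // sbb eax, eax : eax = eax - eax - CF, so all ones if CF is set, else zero.
    std::uint32_t eax = 0u - cf;
    // and eax, -10
    eax &= static_cast<std::uint32_t>(-10);
    // add eax, 80 (wraps modulo 2^32, like the hardware add)
    eax += 80u;
    return static_cast<int>(eax);
}

// Hypothetical helper: the lambda body from the answer, written as a plain function.
inline int reference_loop(int a) {
    int c = 0;
    for (int b = 0; b < 10; b++) {
        if (a) c++;
        c += 7;
    }
    return c;
}
```

Both functions return 70 for a == 0 and 80 for any other value of a.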

    Note (thanks, Marc Glisse) that if the function is marked as cold (i.e. unlikely to be called), gcc does the right thing and optimizes the call away.

    MSVC generates more verbose code but the comparison isn't skipped.

    Clang is the only one that gets it right: the lambda's own code is not optimized any further than gcc's, but the lambda is never called:

    mov edi, std::cout
    mov esi, 70
    call    std::basic_ostream<char, std::char_traits<char> >::operator<<(int)
    

    Moral: Clang seems to get it right, but the optimization challenge is still open.

  • 2021-02-20 00:52

    Looking at the assembly generated by gcc5.2 -O2 shows that the optimization does not happen when using std::function:

    #include <functional>
    
    int main()
    {
        const int a = 0;    
        std::function<int()> lambda = [a]()
        {
            int c = 0;
            for(int b = 0; b < 10000000; b++)
            {
                if(a) c++;
                c += 7;
            }
            return c;
        };
    
        return lambda();
    }
    

    compiles to some boilerplate and

        movl    (%rdi), %ecx
        movl    $10000000, %edx
        xorl    %eax, %eax
        .p2align 4,,10
        .p2align 3
    .L3:
        cmpl    $1, %ecx
        sbbl    $-1, %eax
        addl    $7, %eax
        subl    $1, %edx
        jne .L3
        rep; ret
    

    which is the loop you wanted to see optimized away. (Live) But if you actually use a lambda (and not a std::function), the optimization does happen:

    int main()
    {
        const int a = 0;    
        auto lambda = [a]()
        {
            int c = 0;
            for(int b = 0; b < 10000000; b++)
            {
                if(a) c++;
                c += 7;
            }
            return c;
        };
    
        return lambda();
    }
    

    compiles to

    movl    $70000000, %eax
    ret
    

    i.e. the loop was removed completely. (Live)

    Afaik, you can expect a lambda to have zero overhead, but std::function is different and comes with a cost (at least with the current state of the optimizers, although people are apparently working on this), even if the code "inside the std::function" would otherwise have been optimized. (Take that with a grain of salt and test it if in doubt, since this will probably vary between compilers and versions. std::function's overhead can certainly be optimized away.)
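
One common way to keep the closure type visible to the optimizer is to take the callable as a template parameter instead of wrapping it in std::function. The sketch below contrasts the two; call_direct and call_erased are hypothetical names, and whether the call is actually inlined still depends on the compiler, version, and flags.

```cpp
#include <functional>

// Hypothetical helper: F is deduced as the closure type itself,
// so the compiler sees the lambda body at the call site.
template <typename F>
int call_direct(F f) {
    return f();
}

// Hypothetical helper: the callable is hidden behind std::function's type erasure,
// so the call goes through an indirection the optimizer has to see through.
inline int call_erased(const std::function<int()>& f) {
    return f();
}

// Both paths compute the same value; only the generated code may differ.
inline int demo_direct() {
    const int a = 0;
    return call_direct([a]() { return a ? 8 : 7; });
}

inline int demo_erased() {
    const int a = 0;
    return call_erased([a]() { return a ? 8 : 7; });
}
```

Comparing the assembly of demo_direct and demo_erased on your own compiler is the quickest way to see whether the type-erased call gets folded.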

    As @MarcGlisse correctly pointed out, clang3.6 performs the desired optimization (equivalent to the second case above) even with std::function. (Live)

    Bonus edit, thanks to @MarcGlisse again: If the function that contains the std::function is not called main, the optimization gcc5.2 performs lies somewhere between gcc+main and clang, i.e. the function gets reduced to return 70000000; plus some extra code. (Live)

    Bonus edit 2, this time mine: With -O3, gcc will (for some reason) optimize the body of the std::function, as explained in Marco's answer, to

    cmpl    $1, (%rdi)
    sbbl    %eax, %eax
    andl    $-10000000, %eax
    addl    $80000000, %eax
    ret
    

    and keep the rest as in the not_main case. So I guess the bottom line is that one will just have to measure when using std::function.
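
As a rough starting point for such a measurement, here is a minimal sketch using std::chrono::steady_clock. The helpers make_closure and time_call are hypothetical names, not part of either answer, and a single timed call is only indicative: the optimizer may fold the whole computation away, so inspecting the assembly remains the more reliable check.

```cpp
#include <chrono>
#include <functional>
#include <utility>

// Hypothetical helper: builds the closure from the question with a runtime value of a.
inline auto make_closure(int a) {
    return [a]() {
        int c = 0;
        for (int b = 0; b < 10000000; b++) {
            if (a) c++;
            c += 7;
        }
        return c;
    };
}

// Hypothetical helper: calls f once and returns {result, elapsed nanoseconds}.
template <typename F>
std::pair<int, long long> time_call(F&& f) {
    const auto start = std::chrono::steady_clock::now();
    const int result = f();
    const auto stop = std::chrono::steady_clock::now();
    const auto ns =
        std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
    return {result, static_cast<long long>(ns)};
}
```

A typical use would be to time the same closure once directly, via time_call(make_closure(0)), and once after assigning it to a std::function<int()>, then compare the two elapsed times over several runs.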
