Suppose we have the following (nonsensical) code:
const int a = 0;
int c = 0;
for(int b = 0; b < 10000000; b++)
{
if(a) c++;
c += 7;
}
Looking at the assembly generated by gcc5.2 -O2 shows that the optimization does not happen when using std::function
:
#include
int main()
{
const int a = 0;
std::function lambda = [a]()
{
int c = 0;
for(int b = 0; b < 10000000; b++)
{
if(a) c++;
c += 7;
}
return c;
};
return lambda();
}
compiles to some boilerplate and
movl (%rdi), %ecx
movl $10000000, %edx
xorl %eax, %eax
.p2align 4,,10
.p2align 3
.L3:
cmpl $1, %ecx
sbbl $-1, %eax
addl $7, %eax
subl $1, %edx
jne .L3
rep; ret
which is the loop you wanted to see optimized away. (Live) But if you actually use a lambda (and not an std::function
), the optimization does happen:
int main()
{
const int a = 0;
auto lambda = [a]()
{
int c = 0;
for(int b = 0; b < 10000000; b++)
{
if(a) c++;
c += 7;
}
return c;
};
return lambda();
}
compiles to
movl $70000000, %eax
ret
i.e. the loop was removed completely. (Live)
Afaik, you can expect a lambda to have zero overhead, but std::function
is different and comes with a cost (at least at the current state of the optimizers, although people apparently work on this), even if the code "inside the std::function
" would have been optimized. (Take that with a grain of salt and try if in doubt, since this will probably vary between compilers and versions. std::function
s overhead can certainly be optimized away.)
As @MarcGlisse correctly pointed out, clang3.6 performs the desired optimization (equivalent to the second case above) even with std::function
. (Live)
Bonus edit, thanks to @MarkGlisse again: If the function that contains the std::function
is not called main
, the optimization happening with gcc5.2 is somewhere between gcc+main and clang, i.e. the function gets reduced to return 70000000;
plus some extra code. (Live)
Bonus edit 2, this time mine: If you use -O3, gcc will, (for some reason) as explained in Marco's answer, optimize the std::function
to
cmpl $1, (%rdi)
sbbl %eax, %eax
andl $-10000000, %eax
addl $80000000, %eax
ret
and keep the rest as in the not_main
case. So I guess at the bottom of the line, one will just have to measure when using std::function
.