When a C++ lambda expression has a lot of captures by reference, the size of the unnamed function object becomes large


I do not understand why you seem surprised.

The C++ Standard gives a set of requirements, and every single implementation is free to pick any strategy that meets the requirements.

Why would an implementation optimize the size of the lambda object?

Specifically, do you realize how that would tie the generated code for this lambda to the generated code for the surrounding function?

It's easy to say "Hey! This could be optimized!", but it's much more difficult to actually optimize it and make sure it works in all edge cases. So, personally, I much prefer a simple, working implementation to a botched attempt at optimizing it...

... especially when the work-around is so easy:

#include <iostream>

struct S { int a, b, c, d, e, f, g; };

int main() {
    S s = {};
    // One by-reference capture of the whole struct instead of
    // seven separate by-reference captures:
    auto func = [&](){
        std::cout << s.a << s.b << s.c << s.d << s.e << s.f << s.g << "\n";
    };
    std::cout << sizeof(func) << "\n";
    return 0;
}

Look Ma: 4 bytes only!
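For comparison, here is a sketch of the situation the question describes, with the seven variables captured individually by reference. The exact sizes are implementation-defined, but on a typical 64-bit compiler each by-reference capture is stored as one pointer, so the closure commonly grows to 7 * sizeof(void*) = 56 bytes (the 4 bytes above would correspond to a single 32-bit pointer):

#include <iostream>

int main() {
    int a{}, b{}, c{}, d{}, e{}, f{}, g{};
    // Seven separate by-reference captures: typically one pointer each.
    auto big = [&a, &b, &c, &d, &e, &f, &g]() {
        std::cout << a << b << c << d << e << f << g << "\n";
    };
    std::cout << sizeof(big) << "\n";   // commonly 56 on 64-bit, 28 on 32-bit
    return 0;
}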

It is legal for a compiler to capture by reference via stack pointer. There is a slight downside (in that offsets have to be added to said stack pointer).

Under the current C++ standard with defects included, you also have to capture reference variables by a pseudo-pointer, because the lifetime of the binding must last as long as the referred-to data, not as long as the reference it directly binds to.
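A minimal sketch of that rule, using nothing beyond standard C++: a lambda that captures a reference parameter by reference must stay usable as long as the object the parameter refers to is alive, even after the parameter itself has gone out of scope.

#include <functional>
#include <iostream>

// 'r' is itself a reference. A closure that captures it by reference must
// remain valid for as long as the int that 'r' refers to is alive, not just
// for as long as the parameter 'r' exists, so the implementation effectively
// has to store a pointer to the referred-to int.
std::function<int()> make_reader(int& r) {
    return [&r] { return r; };
}

int main() {
    int value = 42;
    auto read = make_reader(value);
    // The parameter 'r' is gone, but 'value' is still alive,
    // so calling the closure here is well-defined.
    std::cout << read() << "\n";   // prints 42
    return 0;
}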

The simpler implementation, where each captured variable corresponds to a constructor argument and a class member variable, has the serious advantage that it lines up with "more normal" C++ code. Some work is needed for the magic this capture, but other than that the lambda closure is a bog-standard object instance with an inline operator(). Optimization strategies that work on "more normal" C++ code will work on it, its bugs will mostly be bugs shared with "more normal" code, etc.
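Roughly speaking, under that strategy the struct-capturing lambda from the earlier example lowers to something like the hand-written class below (a sketch, not the exact code any particular compiler generates):

#include <iostream>

struct S { int a, b, c, d, e, f, g; };

// One data member per captured entity, initialized by a constructor,
// plus an inline operator(): a bog-standard C++ class.
class Closure {
    S& s_;
public:
    explicit Closure(S& s) : s_(s) {}
    void operator()() const {
        std::cout << s_.a << s_.b << s_.c << s_.d
                  << s_.e << s_.f << s_.g << "\n";
    }
};

int main() {
    S s = {};
    Closure func{s};                       // behaves like the lambda above
    std::cout << sizeof(func) << "\n";     // one reference, i.e. one pointer
    func();
    return 0;
}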

Had the compiler writers gone with the stack-frame implementation, reference capture of references in that implementation would probably have failed to work the way it did in every other compiler. When the defect was resolved (in favor of it working), that code would have had to change again. In essence, compilers that used the simpler implementation would almost certainly have had fewer bugs and more working code than those that used a fancy implementation.

With the stack-frame capture, all optimization for a lambda would have to be customized for that lambda. It would be equivalent to a class that captured a void*, does pointer arithmetic on it, and casts the resulting data to typed pointers. That is going to be extremely hard to optimize, as pointer arithmetic tends to block optimization, especially pointer arithmetic between stack variables (which is usually undefined). What is worse is that such pointer arithmetic means that the optimization of stack variable state (eliminating variables, overlapping lifetime, registers) now has to interact with the optimization of lambdas in entangled ways.
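For illustration, a hand-written analogue of such a frame-pointer closure might look like the sketch below. This is purely hypothetical: a named struct stands in for the stack frame, since portable C++ cannot actually address a function's locals by offset, and the names (FrameClosure, Frame) are invented for the example.

#include <cstddef>
#include <iostream>

// Hypothetical hand-written analogue of "capture the whole frame by one
// pointer": a single void* plus byte offsets, with every access going
// through pointer arithmetic and a cast. The aliasing questions this
// raises are exactly what makes it hard for an optimizer to reason about.
struct FrameClosure {
    void* frame;
    std::size_t off_a;
    std::size_t off_b;

    int sum() const {
        auto* base = static_cast<std::byte*>(frame);
        int a = *reinterpret_cast<int*>(base + off_a);
        int b = *reinterpret_cast<int*>(base + off_b);
        return a + b;
    }
};

int main() {
    struct Frame { int a, b; } locals{2, 3};   // stand-in for a stack frame
    FrameClosure c{&locals, offsetof(Frame, a), offsetof(Frame, b)};
    std::cout << c.sum() << "\n";              // prints 5
    return 0;
}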

Working on such an optimization would be a good thing. As a bonus, because lambda types are tied to compilation units, messing with the implementation of a lambda will not break binary compatibility between compilation units. So you can do such changes relatively safely, once they are a proven stable improvement. However, if you do implement that optimization, you really really will want the ability to revert to the simpler proven one.

I encourage you to provide patches to your favorite open-source compiler to add this functionality.

Because that's how it's implemented. I don't know whether the standard says anything about how it should be implemented, but I would guess that how big a lambda object is in that situation is implementation-defined.

There would be nothing wrong with a compiler storing a single pointer and using offsets, as you suggest, as an optimization. Perhaps some compilers do that; I don't know.
