passing rvalue to non-ref parameter, why can't the compiler elide the copy?

爷,独闯天下 提交于 2019-12-05 14:38:31

You're right; this looks like a missed-optimization by the compiler. You can report this bug (https://bugs.llvm.org/) if there isn't already a duplicate.

Contrary to popular belief, compilers often don't make optimal code. It's often good enough, and modern CPUs are quite good at plowing through excess instructions when they don't lengthen dependency chains too much, especially the critical path dependency chain if there is one.

x86-64 SysV passes large structs by value on the stack if they don't fit packed into two 64-bit integer registers, and them returns via hidden pointer. The compiler can and should (but doesn't) plan ahead and reuse the return value temporary as the stack-args for the call to foo(Big).


gcc7.3, ICC18, and MSVC CL19 also miss this optimization. :/ I put your code up on the Godbolt compiler explorer with gcc/clang/ICC/MSVC. gcc uses 4x push qword [rsp+24] to copy, while ICC uses extra instructions to align the stack by 32.

Using 1x 32-byte load/store instead of 2x 16-byte probably doesn't justify the cost of the vzeroupper for MSVC / ICC / clang, for a function this small. vzeroupper is cheap on mainstream Intel CPUs (only 4 uops), and I did use -march=haswell to tune for that, not for AMD or KNL where it's more expensive.


Related: x86-64 Windows passes large structs by hidden pointer, as well as returning them that way. The callee owns the pointed-to memory. (What happens at assembly level when you have functions with large inputs)

This optimization would still be available by simply reserving space for the temporary + shadow-space before the first call to getStuff(), and allowing the callee to destroy the temporary because we don't need it later.

That's not actually what MSVC does here or in related cases, though, unfortunately.

See also @BeeOnRope's answer, and my comments onit, on Why isn't pass struct by reference a common optimization?. Making sure the copy-constructor can always run at a sane place for non-trivially-copyable objects is problematic if you're trying to design a calling convention that avoids copying by passing by hidden const-reference (caller owns the memory, callee can copy if needed).

But this is an example of a case where non-const reference (callee owns the memory) is best, because the caller wants to hand off the object to the callee.

There's a potential gotcha, though: if there are any pointers to this object, letting the callee use it directly could introduce bugs. Consider some other function that does global_pointer->a[4]=0;. If our callee calls that function, it will unexpectedly modify our callee's by-value arg.

So letting the callee destroy our copy of the object in the Windows x64 calling convention only works if escape analysis can prove that nothing else has a pointer to this object.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!