Question
Naturally, C++ compilers can inline function calls made from within a function template, when the inner function call is directly known in that scope (ref).
#include <iostream>

void holyheck()
{
    std::cout << "!\n";
}

template <typename F>
void bar(F foo)
{
    foo();
}

int main()
{
    bar(holyheck);
}
Now what if I'm passing holyheck into a class, which stores the function pointer (or equivalent) and later invokes it? Do I have any hope of getting this inlined? How?
template <typename F>
struct Foo
{
    Foo(F f) : f(f) {};
    void calledLater() { f(); }
private:
    F f;
};

void sendMonkeys();
void sendTissues();

int main()
{
    Foo<void(*)()> f(sendMonkeys);
    Foo<void(*)()> g(sendTissues);
    // lots of interaction with f and g, not shown here
    f.calledLater();
    g.calledLater();
}
My type Foo is intended to isolate a ton of logic; it will be instantiated a few times. The specific function invoked from calledLater is the only thing that differs between instances (though it never changes during the lifetime of a Foo), so half of the purpose of Foo is to abide by DRY. (The rest of its purpose is to keep this mechanism isolated from other code.)
But I don't want to introduce the overhead of an actual additional function call in doing so, because this is all taking place in a program bottleneck.
I don't speak ASM so analysing the compiled code isn't much use to me.
My instinct is that I have no chance of inlining here.
Answer 1:
If you don't really need to use a function pointer, then a functor should make the optimisation trivial:
struct CallSendMonkeys {
    void operator()() {
        sendMonkeys();
    }
};

struct CallSendTissues {
    void operator()() {
        sendTissues();
    }
};
(Of course, C++11 has lambdas, but you tagged your question C++03.)
By having different instantiations of Foo with these classes, and having no internal state in these classes, f() does not depend on how f was constructed, so it's not a problem if a compiler can't tell that it remains unmodified.
Answer 2:
Your example, after some fiddling to make it compile, looks like this:
template <typename F>
struct Foo
{
    Foo(F f) : f(f) {};
    void calledLater() { f(); }
private:
    F f;
};

void sendMonkeys();
void sendTissues();

int main()
{
    Foo<__typeof__(&sendMonkeys)> f(sendMonkeys);
    Foo<__typeof__(&sendTissues)> g(sendTissues);
    // lots of interaction with f and g, not shown here
    f.calledLater();
    g.calledLater();
}
clang++ 3.7 (as of a few weeks back, which means I'd expect clang++ 3.6 to do this too, as it's only a few weeks older in the source base) generates this code:
        .text
        .file   "calls.cpp"
        .globl  main
        .align  16, 0x90
        .type   main,@function
main:                                   # @main
        .cfi_startproc
# BB#0:                                 # %entry
        pushq   %rax
.Ltmp0:
        .cfi_def_cfa_offset 16
        callq   _Z11sendMonkeysv
        callq   _Z11sendTissuesv
        xorl    %eax, %eax
        popq    %rdx
        retq
.Ltmp1:
        .size   main, .Ltmp1-main
        .cfi_endproc
Of course, without a definition of sendMonkeys and sendTissues, we can't really inline any further.
If we implement them like this:
void request(const char *);
void sendMonkeys() { request("monkeys"); }
void sendTissues() { request("tissues"); }
the assembler code becomes:
main:                                   # @main
        .cfi_startproc
# BB#0:                                 # %entry
        pushq   %rax
.Ltmp2:
        .cfi_def_cfa_offset 16
        movl    $.L.str, %edi
        callq   _Z7requestPKc
        movl    $.L.str1, %edi
        callq   _Z7requestPKc
        xorl    %eax, %eax
        popq    %rdx
        retq
.L.str:
        .asciz  "monkeys"
        .size   .L.str, 8
        .type   .L.str1,@object         # @.str1
.L.str1:
        .asciz  "tissues"
        .size   .L.str1, 8
Which, if you can't read assembler code, is request("monkeys") and request("tissues") inlined, as expected.
I'm simply amazed that g++ 4.9.2 doesn't do the same thing (I got this far expecting to continue with "and g++ does the same, I'm not going to post the code for it"). [It does inline sendTissues and sendMonkeys, but doesn't go the next step and inline down to the calls to request as well.]
Of course, it's entirely possible to make tiny changes to this and NOT get the code inlined, for example by adding conditions that depend on variables whose values the compiler can't determine at compile time.
Edit:
I added a string and an integer to Foo and updated them with an external function, at which point the inlining went away for both clang and gcc. With JUST an integer and a call to an external function, the code is still inlined.
In other words, it really depends on what code goes in the section marked "// lots of interaction with f and g, not shown here". And I think you (Lightness) have been around here long enough to know that for 80%+ of questions, the code that isn't posted in the question is the most important part of the actual answer ;)
Answer 3:
To make your original approach work, use:
template< void(&Func)() >
struct Foo
{
    void calledLater() { Func(); }
};
In general, I've had better luck getting gcc to inline things by using function references rather than function pointers.
Source: https://stackoverflow.com/questions/28820978/can-i-persude-gcc-to-inline-a-deferred-call-through-a-stored-function-pointer