Can I persuade GCC to inline a deferred call through a stored function pointer?

Submitted by 纵然是瞬间 on 2019-12-12 14:05:28

Question


Naturally, C++ compilers can inline function calls made from within a function template, when the inner function call is directly known in that scope (ref).

#include <iostream>

void holyheck()
{
   std::cout << "!\n";
}

template <typename F>
void bar(F foo)
{
   foo();
}

int main()
{
   bar(holyheck);
}

Now what if I'm passing holyheck into a class, which stores the function pointer (or equivalent) and later invokes it? Do I have any hope of getting this inlined? How?

template <typename F>
struct Foo
{
   Foo(F f) : f(f) {}
   void calledLater() { f(); }

private:
   F f;
};

void sendMonkeys();
void sendTissues();

int main()
{
   Foo<void(*)()> f(sendMonkeys);
   Foo<void(*)()> g(sendTissues);
   // lots of interaction with f and g, not shown here
   f.calledLater();
   g.calledLater();
}

My type Foo is intended to isolate a ton of logic; it will be instantiated a few times. The specific function invoked from calledLater is the only thing that differs between instantiations (though it never changes during the lifetime of a Foo), so half of the purpose of Foo is to abide by DRY. (The rest of its purpose is to keep this mechanism isolated from other code.)

But I don't want to introduce the overhead of an actual additional function call in doing so, because this is all taking place in a program bottleneck.

I don't speak ASM so analysing the compiled code isn't much use to me.
My instinct is that I have no chance of inlining here.


Answer 1:


If you don't really need to use a function pointer, then a functor should make the optimisation trivial:

struct CallSendMonkeys {
  void operator()() {
    sendMonkeys();
  }
};
struct CallSendTissues {
  void operator()() {
    sendTissues();
  }
};

(Of course, C++11 has lambdas, but you tagged your question C++03.)

When Foo is instantiated with these classes, which have no internal state, the target of f() is fixed by the type and does not depend on how f was constructed, so it's not a problem if the compiler can't tell that f remains unmodified.
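As a minimal sketch of how these functors could plug into the question's Foo (C++03-compatible): the counter-bumping bodies of sendMonkeys and sendTissues, the functor definitions repeated here for completeness, and the helper demoFunctors are assumptions added for illustration, not part of the original code. Whether the calls are actually inlined still depends on the compiler and the optimisation level.

```cpp
#include <cassert>

// Hypothetical stand-ins for the question's functions; they bump
// counters so that the effect of each call is observable.
static int monkeysSent = 0;
static int tissuesSent = 0;
void sendMonkeys() { ++monkeysSent; }
void sendTissues() { ++tissuesSent; }

// Stateless functors: the call target is encoded in the type itself.
struct CallSendMonkeys {
  void operator()() { sendMonkeys(); }
};
struct CallSendTissues {
  void operator()() { sendTissues(); }
};

// Foo as in the question, unchanged apart from the template argument used.
template <typename F>
struct Foo
{
   Foo(F f) : f(f) {}
   void calledLater() { f(); }

private:
   F f;
};

// Hypothetical driver showing two distinct instantiations of Foo.
void demoFunctors()
{
   // The extra parentheses avoid the "most vexing parse".
   Foo<CallSendMonkeys> f((CallSendMonkeys()));
   Foo<CallSendTissues> g((CallSendTissues()));
   f.calledLater();
   g.calledLater();
   g.calledLater();
   assert(monkeysSent == 1 && tissuesSent == 2);
}
```

Because Foo<CallSendMonkeys> and Foo<CallSendTissues> are different types, each calledLater() has exactly one possible target, so nothing needs to be proven about the stored member.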




Answer 2:


With your example, which after some fiddling to make it compile looks like this:

template <typename F>
struct Foo
{
   Foo(F f) : f(f) {}
   void calledLater() { f(); }

private:
   F f;
};

void sendMonkeys();
void sendTissues();

int main()
{
   Foo<__typeof__(&sendMonkeys)> f(sendMonkeys);
   Foo<__typeof__(&sendTissues)> g(sendTissues);
   // lots of interaction with f and g, not shown here
   f.calledLater();
   g.calledLater();
}

clang++ (3.7, built from sources a few weeks old; I'd expect clang++ 3.6 to do the same, since its source base is only a few weeks older) generates this code:

    .text
    .file   "calls.cpp"
    .globl  main
    .align  16, 0x90
    .type   main,@function
main:                                   # @main
    .cfi_startproc
# BB#0:                                 # %entry
    pushq   %rax
.Ltmp0:
    .cfi_def_cfa_offset 16
    callq   _Z11sendMonkeysv
    callq   _Z11sendTissuesv
    xorl    %eax, %eax
    popq    %rdx
    retq
.Ltmp1:
    .size   main, .Ltmp1-main
    .cfi_endproc

Of course, without a definition of sendMonkeys and sendTissues, we can't really inline any further.

If we implement them like this:

void request(const char *);
void sendMonkeys() { request("monkeys"); }
void sendTissues() { request("tissues"); }

the assembler code becomes:

main:                                   # @main
    .cfi_startproc
# BB#0:                                 # %entry
    pushq   %rax
.Ltmp2:
    .cfi_def_cfa_offset 16
    movl    $.L.str, %edi
    callq   _Z7requestPKc
    movl    $.L.str1, %edi
    callq   _Z7requestPKc
    xorl    %eax, %eax
    popq    %rdx
    retq

.L.str:
    .asciz  "monkeys"
    .size   .L.str, 8

    .type   .L.str1,@object         # @.str1
.L.str1:
    .asciz  "tissues"
    .size   .L.str1, 8

Which, if you can't read assembler code, is request("monkeys") and request("tissues") inlined as expected.

I'm simply amazed that g++ 4.9.2 doesn't do the same thing (I got this far and expected to continue with "and g++ does the same, I'm not going to post the code for it"). [It does inline sendTissues and sendMonkeys, but doesn't go the next step to inline request as well.]

Of course, it's entirely possible to make tiny changes to this and NOT get the code inlined - such as adding some conditions that depend on variables that the compiler can't determine at compile-time.

Edit: I did add a string and an integer to Foo and updated these with an external function, at which point the inlining went away for both clang and gcc. Using JUST an integer and calling an external function, it does inline the code.

In other words, it really depends on what the code is in the section // lots of interaction with f and g, not shown here. And I think you (Lightness) have been around here long enough to know that for 80%+ of the questions, it's the code that isn't posted in the question that is the most important part for the actual answer ;)




Answer 3:


To make your original approach work, use

template< void(&Func)() >
struct Foo
{
    void calledLater() { Func(); }
};

In general, I've had better luck getting gcc to inline things by using function references rather than function pointers.
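A possible usage sketch for this variant, assuming the functions are defined with external linkage (which C++03 requires for a reference template argument). The counter-bumping bodies and the helper demoReferenceParameter are hypothetical additions for illustration only.

```cpp
#include <cassert>

// Hypothetical stand-ins for the question's functions; they must have
// external linkage to be usable as template arguments in C++03.
static int monkeysSent = 0;
static int tissuesSent = 0;
void sendMonkeys() { ++monkeysSent; }
void sendTissues() { ++tissuesSent; }

// The function is a compile-time template argument, so Foo stores nothing.
template< void(&Func)() >
struct Foo
{
    void calledLater() { Func(); }
};

// Hypothetical driver: each Foo names its target in its type.
void demoReferenceParameter()
{
    Foo<sendMonkeys> f;
    Foo<sendTissues> g;
    f.calledLater();
    g.calledLater();
    assert(monkeysSent == 1 && tissuesSent == 1);
}
```

The trade-off is that the function must be known where Foo is named, so it can no longer be chosen at run time; the question states that the target never changes during the lifetime of a Foo, so that may be acceptable here.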



Source: https://stackoverflow.com/questions/28820978/can-i-persude-gcc-to-inline-a-deferred-call-through-a-stored-function-pointer
