Link-time optimization and inline

情话喂你 2020-12-08 16:12

In my experience, there's a lot of code that explicitly uses inline functions, which comes with tradeoffs:

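  1. The code becomes less succinct and somewhat less maintainable.
  2. Sometimes, inlining can greatly increase run-time performance.
  3. Inlining is decided at a fixed point in time, maybe without a terribly good foreknowledge of its uses, or without considering all (future) surrounding circumstances.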
6 Answers
  • 2020-12-08 16:45

    Even with LTO, a compiler still has to use heuristics to decide whether to inline a function at each call site (note that the decision is made per call, not per function). The heuristics weigh factors such as whether the call is in a loop, whether that loop is unrolled, how big the function is, and how often it is called across the program. At compile time the compiler can never accurately determine how frequently code is executed, or whether the code expansion is likely to blow out the instruction/trace/loop/microcode caches of a particular CPU.

    Profile-guided optimization (PGO) is supposed to be a step towards addressing this, but if you've ever tried it, you have likely noticed that the swing in performance is on the order of 0-2%, and it can go in either direction! :-) It's still a work in progress.

    If performance is your ultimate goal, you really know what you are doing, and you have done a thorough analysis of your code, what you really need is a way to tell the compiler to inline or not inline on a per-call basis, rather than a per-function hint. In practice I have managed this by using compiler-specific "force_no_inline"-type hints for cases where I don't want inlining, and a separate "force_inline" copy (or a macro, in the rare case this fails) of the function for when I want it inlined. If anyone knows a cleaner way to do this with compiler-specific hints (for any C/C++ compiler), please let me know.
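
    The two-copy approach described above can be sketched roughly as follows. This assumes GCC or Clang, whose `always_inline` and `noinline` function attributes stand in for the "force_inline" and "force_no_inline" hints; the macro and function names are hypothetical and only for illustration.

```c
#include <stddef.h>

/* Assumption: GCC/Clang attribute syntax; other compilers spell these hints differently. */
#define FORCE_INLINE static inline __attribute__((always_inline))
#define NO_INLINE    __attribute__((noinline))

/* The logic lives in one place and is always expanded at its call site. */
FORCE_INLINE long sum_squares_inlined(const int *v, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; ++i)
        total += (long)v[i] * v[i];
    return total;
}

/* Out-of-line entry point for call sites where the code growth is not worth it.
   It wraps the forced-inline copy, so the body is still written only once. */
NO_INLINE long sum_squares_call(const int *v, size_t n)
{
    return sum_squares_inlined(v, n);
}
```

    A hot call site would call `sum_squares_inlined` directly, while a cold one would call `sum_squares_call`; which sites count as hot is exactly the judgment the answer says has to come from your own analysis.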

    To specifically address your points:

    1. The code becomes less succinct and somewhat less maintainable.

    Generally, no - it's just a keyword hint that controls how it is inlined. However if you jump through hoops like I described in the last paragraph, then yes.

    2. Sometimes, inlining can greatly increase run-time performance.

    When leaving the compiler to its own devices - yes, it certainly can, but mostly doesn't. The compiler has good heuristics that make good, although not always optimal, inlining decisions. Specifically for the keyword, compilers may ignore it entirely or treat it as a weak hint. In general they seem averse to inlining code that raises red flags in their heuristics (like inlining a 16k function into a loop unrolled 16x).

    3. Inlining is decided at a fixed point in time, maybe without a terribly good foreknowledge of its uses, or without considering all (future) surrounding circumstances.

    Yes, it uses static analysis. Dynamic analysis can come from your insight and you manually controlling inlining on a per-call basis, or theoretically from PGO (which still sucks).

  • 2020-12-08 16:51
    1. I don't think the inline keyword affects maintainability, and only barely the succinctness. (opinion)
    2. Sometimes inline can decrease run-time performance: http://www.parashift.com/c++-faq-lite/inline-functions.html#faq-9.3
    3. Compilers are quite smart about inlining. I've heard that Visual Studio ignores the inline keyword almost completely and makes its own inlining decisions.

    Does link-time optimization render manual inlining obsolete? Not at all: the optimizer that makes the inline keyword nigh-obsolete kicks in well before link time.

  • 2020-12-08 16:54

    GCC 9 Binutils 2.33 experiment to show that LTO can inline

    For those that are curious if ld inlines across object files or not, here is a quick experiment that confirms that it can:

    main.c

    int notmain(void);
    
    int main(void) {
        return notmain();
    }
    

    notmain.c

    int notmain(void) {
        return 42;
    }
    

    Compile with LTO and disassemble:

    gcc -O3 -flto -ggdb3 -std=c99 -Wall -Wextra -pedantic -c -o main.o main.c
    gcc -O3 -flto -ggdb3 -std=c99 -Wall -Wextra -pedantic -c -o notmain.o notmain.c
    gcc -O3 -flto -ggdb3 -std=c99 -Wall -Wextra -pedantic -o main.out notmain.o main.o
    gdb -batch -ex "disassemble/rs main" main.out
    

    Output:

       0x0000000000001040 <+0>:     b8 2a 00 00 00  mov    $0x2a,%eax
       0x0000000000001045 <+5>:     c3      retq 
    

    So yes, no callq, inlined.

    Without -flto:

       0x0000000000001040 <+0>:     f3 0f 1e fa     endbr64 
       0x0000000000001044 <+4>:     e9 f7 00 00 00  jmpq   0x1140 <notmain>
    

    So yes, if you are using -flto, you don't need to worry about putting definitions in headers so they can be inlined.

    The main downside of having definitions in headers is that they may slow down compilation. For C++ templates, you may also be interested in explicit template instantiation; see "Explicit template instantiation - when is it used?"

    Tested in Ubuntu 19.10.

  • 2020-12-08 17:01

    Item 33 of Scott Meyers' Effective C++ (2nd ed.) springs to mind.

    You must bear in mind the keyword static with respect to inline! Now there is a hornet's nest!
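
    One way to see why `static` matters next to `inline`, as a minimal sketch assuming the C99 rules the question mentions (GCC or Clang in their default C modes); the function names are hypothetical. A plain `inline` definition does not by itself provide an external symbol, so some translation unit has to add an `extern inline` declaration, whereas `static inline` gives each translation unit its own private copy.

```c
/* Plain C99 inline: only an "inline definition". On its own it provides no
   external symbol for calls the compiler chooses not to inline. */
inline int twice(int x) { return 2 * x; }

/* Exactly one translation unit adds this declaration, which turns the
   definition above into the external one that non-inlined calls link to. */
extern inline int twice(int x);

/* static inline: internal linkage, so no coordination across files is needed,
   at the cost of a possible out-of-line copy per translation unit. */
static inline int thrice(int x) { return 3 * x; }
```

    Leaving out the `extern inline` line is a common source of "undefined reference" errors in unoptimized C builds. C++ treats `inline` differently, so this sketch should not be read as describing the C++ rules.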

  • 2020-12-08 17:02

    The question is: does link-time optimization (e.g., in GCC) render manual inlining, e.g., declaring in C99 a function "inline" and providing an implementation, obsolete?

    This article would seem to answer "yes":

    Think for a minute: what turns a function into a good candidate for inlining? Apart from the size factor, the optimizer needs to know how often this function is called, where it is called from, how many other functions in the program are viable candidates for inlining and -- believe it or not -- whether the function is ever called. Optimizing (i.e. inlining) a function that isn't called even once is a waste of time and resources. But how can an optimizer know that a function is never called? Well, it cannot. Unless it has scanned the entire program. This is where [link-time optimization] becomes crucial.

  • 2020-12-08 17:03

    If link-time optimization were as fast as compile-time optimization, it would obviate the need for compiler hints. Unfortunately, it is generally slower than compile-time optimization, so it's a tradeoff between overall build speed and the overall quality of optimizations for that build.

    Also, you still need to use inline when defining functions in headers. Otherwise, you will get linker errors for multiple definitions of those functions if they are used in multiple translation units.
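
    As a rough illustration of the header case, here is what such a header might look like in C. The file name `clamp.h` and the function `clamp_int` are hypothetical. The sketch uses `static inline`, a common C idiom; in C++ a plain `inline` is enough to allow the same definition in several translation units.

```c
/* clamp.h (hypothetical header, included from several .c files) */
#ifndef CLAMP_H
#define CLAMP_H

/* Without "static inline", every .c file including this header would emit its
   own external definition of clamp_int and the link would fail with a
   multiple-definition error. */
static inline int clamp_int(int value, int lo, int hi)
{
    if (value < lo) return lo;
    if (value > hi) return hi;
    return value;
}

#endif /* CLAMP_H */
```

    The include guard only protects against double inclusion within one translation unit; it is the `static inline` that keeps the linker happy across translation units.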
