问题
I'm working on a C++ project with extensive compile-time computations. Long compilation time is slowing us down. How might I find out the slowest parts of our template meta-programs so I can optimize them? (When we have slow runtime computations, I have many profilers to choose from, e.g. valgrind's callgrind
tool. So I tried building a debug GCC and profiling it compiling our code, but I didn't learn much from that.)
I use GCC and Clang, but any suggestions are welcome.
I found profile_templates
on Boost's site, but it seems to be thinly documented and require the jam/bjam build system. If you show how to use it on a non-jam project1, I will upvote you. https://svn.boost.org/svn/boost/sandbox/tools/profile_templates/ appears to count number-of-instantiations, whereas counting time taken would be ideal.
1Our project uses CMake and is small enough that hacking together a Jamfile just for template profiling could be acceptable.
回答1:
I've been working since 2008 on a library that uses template metaprogramming heavily. There is a real need for better tools or approaches for understanding what consumes the most compile time.
The only technique I know of is a divide and conquer approach, either by separating code into different files, commenting out bodies of template definitions, or by wrapping your template instantiations in #define macros and temporarily redefining those macros to do nothing. Then you can recompile the project with and without various instantiations and narrow down.
Incidentally just separating the same code into more, smaller files may make it compile faster. I'm not just talking about opportunity for parallel compilation - even serially, I observed it to still be faster. I've observed this effect in gcc both when compiling my library, and when compiling Boost Spirit parsers. My theory is that some of the symbol resolution, overload resolution, SFINAE, or type inference code in gcc has an O(n log n) or even O(n^2) complexity with respect to the number of type definitions or symbols in play in the execution unit.
Ultimately what you need to do is carefully examine your templates and separate what really depends on the type information from what does not, and use type erasure and virtual functions whereever possible on the portion of the code that does not actually require the template types. You need to get stuff out of the headers and into cpp files if that part of the code can be moved. In a perfect world the compiler should be able to figure this out for itself - you shouldn't have to manually move this code to babysit it - but this is the state of the art with the compilers we have today.
回答2:
I know this is an old question, but there is a newer answer that I would like to give.
There is a clang-based set of projects that target this particular problem. The first component is an instrumentation onto the clang compiler which produces a complete trace of all the template instantiations that occurred during compilation, with timing values and optionally memory usage counts as well. That tool is called Templight, as is accessible here (currently needs to compile against a patched clang source tree):
https://github.com/mikael-s-persson/templight
A second component is a conversion tool that allows you to convert the templight traces into other formats, such as easily parsable text-based format (yaml, xml, text, etc.) and into formats that can more easily be visualized, such as graphviz / graphML, and more importantly a callgrind output that can be loaded into KCacheGrind to visualize and inspect the meta-call-graph of template instantiations and their compile-time costs, such as this screenshot of a template instantiation profile of a piece of code that creates a boost::container::vector
and sorts it with std::sort
:
Check it out here:
https://github.com/mikael-s-persson/templight-tools
Finally, there is also another related project that creates an interactive shell and debugger to be able to interactively walk up and down the template instantiation graph:
https://github.com/sabel83/metashell
回答3:
The classic book C++ Template Metaprogramming: Concepts, Tools, and Techniques from Boost and Beyond comes with a 20 page appendix on profiling compile-time costs. It has a companion CD with the code that generates the graphs in that appendix.
Another paper is http://aszt.inf.elte.hu/~gsd/s/cikkek/profiling/profile.pdf, perhaps that is of use to you.
Yet another, but more labor-intensive, way is to count the number of instantiations of each class from your compiler output.
回答4:
http://www.cs.uoregon.edu/research/tau/about.php might be something which can be of your interest as for templated entities, it shows the breakup of time spent for each instantiation. Other data includes how many times each function was called, how many profiled functions did each function invoke, and what the mean inclusive time per call was
来源:https://stackoverflow.com/questions/15818281/profiling-template-metaprogram-compilation-time