G++ optimization beyond -O3/-Ofast

攒了一身酷 2021-01-29 17:15

The Problem

We have a mid-sized program for a simulation task that we need to optimize. We have already done our best optimizing the source to the limi

8 answers
  •  遥遥无期
    2021-01-29 17:58

    relatively new hardware (Intel i5 or i7)

    Why not invest in a copy of the Intel compiler and high-performance libraries? It can outperform GCC on optimizations by a significant margin, typically 10% to 30% or more, and the gap tends to be larger for heavy number-crunching programs. Intel also provides a number of extensions and libraries for high-performance (parallel) number-crunching applications, if you can afford to integrate them into your code. It might pay off big if it ends up saving you months of running time.

    We have already done our best optimizing the source to the limit of our programming skills

    In my experience, the micro- and nano-optimizations that you typically do with the help of a profiler tend to give a poor return on the time invested compared to macro-optimizations (streamlining the structure of the code) and, most importantly and most often overlooked, memory access optimizations (e.g., locality of reference, in-order traversal, minimizing indirection, weeding out cache misses). The latter usually involves designing the memory structures to better reflect the way the memory is used (traversed). Sometimes it can be as simple as switching a container type and getting a huge performance boost from that. With profilers, you often get lost in the details of instruction-by-instruction optimization, and memory layout issues don't show up, so they are usually missed when you forget to look at the bigger picture. Working on memory layout is a much better way to invest your time, and the payoffs can be huge. For example, many O(log N) algorithms end up performing almost as slowly as O(N) ones just because of a poor memory layout; a linked list or linked tree is a typical culprit of huge performance problems compared to a contiguous storage strategy.
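
    To make the "switching a container type" point concrete, here is a minimal sketch, not taken from the question's code. It assumes a hypothetical set of integer keys that is built once and then queried many times. The function names (contains_tree, contains_sorted_vector, to_sorted_vector) are made up for illustration. Both lookups do O(log N) comparisons; the difference is only in where the keys live in memory.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <vector>

// Node-based lookup: std::set is typically a balanced tree, so each step
// of the search follows a pointer to a separately allocated node.
bool contains_tree(const std::set<int>& keys, int key) {
    return keys.find(key) != keys.end();
}

// Contiguous lookup: binary search over a sorted std::vector. The number
// of comparisons is still O(log N), but all keys sit in one block of memory.
bool contains_sorted_vector(const std::vector<int>& sorted_keys, int key) {
    // Precondition check; O(N) in debug builds, compiled out with -DNDEBUG.
    assert(std::is_sorted(sorted_keys.begin(), sorted_keys.end()));
    auto it = std::lower_bound(sorted_keys.begin(), sorted_keys.end(), key);
    return it != sorted_keys.end() && *it == key;
}

// Build the contiguous container once from the tree-based one.
// Iterating a std::set yields its keys in ascending order.
std::vector<int> to_sorted_vector(const std::set<int>& keys) {
    return std::vector<int>(keys.begin(), keys.end());
}
```

    Whether the vector version is actually faster depends on the data size, the access pattern, and how often the keys change (inserting into the middle of a sorted vector is O(N)), so treat it as something to measure on the real workload rather than a guaranteed win.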
