Question
Assuming that in some C or C++ code I have a function named T fma( T a, T b, T c ) that performs 1 multiplication and 1 addition, like so: ( a * b ) + c; how am I supposed to optimize multiple multiply and add steps?
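The function described in the question can be sketched as a plain template. This is a minimal sketch assuming floating-point T; the names naive_fma and fused_fma are hypothetical, chosen so they do not clash with the standard one. The question's version rounds twice, once after the product and once after the sum. The standard std::fma from <cmath> computes (a * b) + c as if with infinite precision and rounds once.

```cpp
#include <cmath>

// Hypothetical stand-in for the question's T fma(T a, T b, T c):
// the product is rounded to T before the addition, so there are two roundings.
template <typename T>
T naive_fma(T a, T b, T c) {
    // volatile forces the rounded product to be stored and reloaded,
    // so the compiler cannot contract a * b + c into one fused instruction.
    volatile T product = a * b;
    return product + c;
}

// Standard fused multiply-add: one rounding of the exact a * b + c.
// Without hardware FMA support it may be emulated in software and be slow.
template <typename T>
T fused_fma(T a, T b, T c) {
    return std::fma(a, b, c);
}
```

For example, with a = 1 + 2^-30, b = 1 - 2^-30 and c = -1, the exact result is -2^-60. naive_fma returns 0.0 because the product rounds to exactly 1.0, while fused_fma returns -2^-60.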
For example, my algorithm needs to be implemented with 3 or 4 fma operations chained and summed together. How can I write this in an efficient way, and which parts of the syntax or semantics should I pay particular attention to?
I would also like some hints on the critical part: avoiding any change to the CPU's rounding mode, so as not to flush the CPU pipeline. I'm quite sure that just using the + operation between multiple calls to fma shouldn't change that. I say "quite sure" because I don't have many CPUs to test this on; I'm just following some logical steps.
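The assumption that + between the calls leaves the rounding mode alone can at least be checked on a given machine. A minimal sketch, assuming double data stored in three parallel arrays (the function rounding_mode_unchanged is hypothetical): it reads the current mode with std::fegetround from <cfenv> before and after summing the fused multiply-adds. It only confirms that no mode change happens; it says nothing about pipeline cost.

```cpp
#include <cfenv>
#include <cmath>

// Computes fma(a[0], b[0], c[0]) + ... + fma(a[n-1], b[n-1], c[n-1]) into
// *result and reports whether the rounding mode was the same before and after.
// Neither std::fma nor the built-in + changes the rounding mode; they only use it.
bool rounding_mode_unchanged(const double* a, const double* b, const double* c,
                             int n, double* result) {
    const int mode_before = std::fegetround();
    double sum = 0.0;
    for (int i = 0; i < n; ++i) {
        sum += std::fma(a[i], b[i], c[i]);
    }
    *result = sum;
    return std::fegetround() == mode_before;
}
```

At program startup the floating-point environment is the default one, which on IEEE 754 systems means round-to-nearest (FE_TONEAREST), and ordinary arithmetic keeps it that way.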
My algorithm is something like the sum of multiple fma calls:
fma ( triplet 1 ) + fma ( triplet 2 ) + fma ( triplet 3 )
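Written with the standard std::fma, the expression above can take two forms. This is a sketch assuming double values and a hypothetical Triplet struct holding {a, b, c}. sum_independent mirrors the question: three independent fused multiply-adds followed by two ordinary additions. sum_chained is one possible rewrite, not something the question or answer prescribes. It adds the c terms first and then folds each product into a running total with a dependent std::fma. The two are equal in exact arithmetic but can round differently. The chained form makes each step wait for the previous one, whereas the independent form lets an out-of-order CPU overlap the three multiply-adds.

```cpp
#include <cmath>

// Hypothetical container for one a * b + c step.
struct Triplet { double a, b, c; };

// fma(t1) + fma(t2) + fma(t3) as written in the question:
// three independent fused multiply-adds, then two ordinary additions.
double sum_independent(const Triplet& t1, const Triplet& t2, const Triplet& t3) {
    return std::fma(t1.a, t1.b, t1.c)
         + std::fma(t2.a, t2.b, t2.c)
         + std::fma(t3.a, t3.b, t3.c);
}

// Same value in exact arithmetic: (c1 + c2 + c3) + a1*b1 + a2*b2 + a3*b3,
// with each product folded into the running total by a dependent fma.
double sum_chained(const Triplet& t1, const Triplet& t2, const Triplet& t3) {
    double acc = t1.c + t2.c + t3.c;
    acc = std::fma(t1.a, t1.b, acc);
    acc = std::fma(t2.a, t2.b, acc);
    acc = std::fma(t3.a, t3.b, acc);
    return acc;
}
```

With small integer inputs both forms are exact. For example, {1, 2, 3}, {4, 5, 6}, {7, 8, 9} gives 96 either way.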
Answer 1:
Recently, at Build 2014, Eric Brumer gave a very nice talk on this topic. The bottom line of the talk was that
Using Fused Multiply Accumulate (aka FMA) everywhere hurts performance.
On Intel CPUs an FMA instruction has a latency of 5 cycles, whereas a multiplication (5 cycles) followed by an addition (3 cycles) takes 8 cycles. With FMA you get two operations for the price of one (see the picture below).
However, FMA does not seem to be the holy grail of instructions. As you can see in the picture below, in certain situations FMA can hurt performance.
In the same fashion, your case fma(triplet1) + fma(triplet2) + fma(triplet3) costs 21 cycles, whereas doing the same operations without FMA would cost 30 cycles. That's a 30% reduction in cycles.
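The 21 and 30 cycle figures follow from the per-instruction numbers given earlier in the answer (FMA 5, multiplication 5, addition 3 cycles), assuming every operation runs back to back with no overlap. The three triplets contribute three multiply-adds, and combining their results takes two more additions:

```latex
\begin{aligned}
\text{with FMA:}    &\quad 3 \times 5 + 2 \times 3 = 15 + 6 = 21 \text{ cycles} \\
\text{without FMA:} &\quad 3 \times (5 + 3) + 2 \times 3 = 24 + 6 = 30 \text{ cycles} \\
\text{saving:}      &\quad \frac{30 - 21}{30} = 0.30 \quad (30\%\ \text{fewer cycles, a } 30/21 \approx 1.43\times \text{ speedup})
\end{aligned}
```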
Using FMA explicitly in your code would demand compiler intrinsics (or the standard fma function from C99's <math.h> / C++11's <cmath>). In my humble opinion, though, FMA and the like are not something you should worry about unless you are writing a C++ compiler. If you are not, let the compiler's optimizer take care of these technicalities. Generally, concerns of this kind are the root of all evil (i.e., premature optimization), to paraphrase one of the greats, Donald Knuth.
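For readers who do want explicit control, here is a minimal sketch of the intrinsic route. It assumes GCC or Clang on x86, which define __FMA__ when FMA code generation is enabled (for example with -mfma or -march=haswell). _mm_fmadd_sd from <immintrin.h> fuses one scalar double multiply-add, and the code falls back to the portable std::fma otherwise. The names explicit_fma and compiler_choice are hypothetical. compiler_choice is the alternative the answer recommends: write a * b + c and let the compiler decide. Whether it gets fused depends on the target and on contraction settings such as -ffp-contract.

```cpp
#include <cmath>
#if defined(__FMA__)
#include <immintrin.h>
#endif

// Explicit fused multiply-add: the FMA3 intrinsic when the compiler targets it,
// otherwise the standard library's correctly rounded std::fma.
inline double explicit_fma(double a, double b, double c) {
#if defined(__FMA__)
    // Operates on the low lane only: a[0] * b[0] + c[0], rounded once.
    __m128d r = _mm_fmadd_sd(_mm_set_sd(a), _mm_set_sd(b), _mm_set_sd(c));
    return _mm_cvtsd_f64(r);
#else
    return std::fma(a, b, c);
#endif
}

// Leave it to the compiler: it may or may not emit a single fused instruction,
// depending on the target and on floating-point contraction settings.
inline double compiler_choice(double a, double b, double c) {
    return a * b + c;
}
```

Both paths of explicit_fma round once, so they agree bit for bit. compiler_choice can differ from them in the last bit when the compiler does not fuse it.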
Source: https://stackoverflow.com/questions/23710356/how-to-chain-multiple-fma-operations-together-for-performance