(n - Multiplication) vs (n/2 - multiplication + 2 additions) which is better?

Submitted by 主宰稳场 on 2019-12-10 16:14:42

Question


I have a C program that performs n multiplications (a single multiplication in each of n iterations), and I found an alternative that takes n/2 iterations of (1 multiplication + 2 additions). I know that both are O(n) in complexity, but in terms of CPU cycles, which is faster?


Answer 1:


First of all, follow Dietrich Epp's first piece of advice: measuring is (at least for complex optimization problems) the only way to be sure.
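As a rough illustration of what "measuring" could look like, here is a minimal C sketch. The question does not show its loops, so `scaled_sum_a` and `scaled_sum_b` are hypothetical stand-ins (a scaled sum computed with n multiplications, and the same sum computed in n/2 iterations of one multiplication and two additions). `time_variant` is likewise an assumed helper name, not something from the original post.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical variant A: n iterations, one multiplication each. */
uint64_t scaled_sum_a(const uint64_t *a, size_t n, uint64_t c) {
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[i] * c;
    return sum;
}

/* Hypothetical variant B: n/2 iterations, one multiplication and two
 * additions each. Assumes n is even; a trailing odd element is ignored. */
uint64_t scaled_sum_b(const uint64_t *a, size_t n, uint64_t c) {
    uint64_t sum = 0;
    for (size_t i = 0; i + 1 < n; i += 2)
        sum += (a[i] + a[i + 1]) * c;
    return sum;
}

/* Returns the CPU time in seconds spent on `reps` calls of `f`.
 * The scale factor changes on every repetition and the results go
 * through a volatile accumulator, to make it harder for the optimizer
 * to drop or hoist the calls (it may still transform the loops). */
double time_variant(uint64_t (*f)(const uint64_t *, size_t, uint64_t),
                    const uint64_t *a, size_t n, uint64_t c, int reps,
                    uint64_t *result_out) {
    volatile uint64_t sink = 0;
    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        sink = sink + f(a, n, c + (uint64_t)r);
    clock_t end = clock();
    *result_out = sink;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling `time_variant(scaled_sum_a, ...)` and `time_variant(scaled_sum_b, ...)` on the same large array, with the same compiler flags you ship with, gives two numbers to compare. Check that both calls write the same value to `result_out` before trusting the timings.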

Now if you want to figure out why one is faster than the other, we can try. There are two different important performance measures: Latency and reciprocal throughput. A short summary of the two:

Latency: This is the delay that the instruction generates in a dependency chain. The numbers are minimum values. Cache misses, misalignment, and exceptions may increase the clock counts considerably. Where hyperthreading is enabled, the use of the same execution units in the other thread leads to inferior performance. Denormal numbers, NAN’s and infinity do not increase the latency. The time unit used is core clock cycles, not the reference clock cycles given by the time stamp counter.

Reciprocal throughput: The average number of core clock cycles per instruction for a series of independent instructions of the same kind in the same thread.

For Sandy Bridge, the reciprocal throughput of an add r, r/i (in what follows, r = register, i = immediate, m = memory) is 0.33, while the latency is 1.

An imul r, r has a latency of 3 and a rec. throughput of 1.

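So, as you can see, it depends entirely on your specific algorithm. If you can simply replace one imul with two independent adds, this particular part of your algorithm could get a theoretical speedup of 50% in throughput terms (1 cycle versus 2 × 0.33 ≈ 0.67 cycles), and in the best case, where the imul was sitting on a latency-bound chain (3 cycles versus ≈ 0.67), a speedup of about 350%. On the other hand, if your adds introduce a problematic dependency, one imul could be just as fast as one add (a dependent add costs its full 1-cycle latency, the same as the imul's reciprocal throughput).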
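To make the difference between latency and throughput concrete, here is a small C sketch. The function names are made up for illustration. Both functions perform the same number of multiplications and return the same value (unsigned arithmetic wraps modulo 2^64, so the order of the multiplications does not matter), but the first forms one long dependency chain while the second forms two chains that the CPU is free to overlap.

```c
#include <stddef.h>
#include <stdint.h>

/* One accumulator: every multiply has to wait for the previous result,
 * so the loop is limited by the multiply latency. */
uint64_t product_one_chain(const uint64_t *a, size_t n) {
    uint64_t p = 1;
    for (size_t i = 0; i < n; i++)
        p *= a[i];
    return p;
}

/* Two accumulators: p0 and p1 do not depend on each other, so their
 * multiplies can be in flight at the same time. */
uint64_t product_two_chains(const uint64_t *a, size_t n) {
    uint64_t p0 = 1, p1 = 1;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        p0 *= a[i];
        p1 *= a[i + 1];
    }
    if (i < n)          /* leftover element when n is odd */
        p0 *= a[i];
    return p0 * p1;
}
```

Whether the second version is actually faster depends on the compiler and the CPU, so it is still something to measure rather than assume.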

Also note that we've ignored all the additional complications like memory and cache behavior (things which will generally have a much, MUCH larger influence on the execution time) or intricate stuff like µop fusion and whatnot. In general the only people that should care about this stuff are compiler writers - it's much simpler to just measure the result of their efforts ;)

Anyway, if you want a good listing of these numbers, see this document (the description of latency and reciprocal throughput above is also taken from it).




Answer 2:


Test on your computer. Or, look at the specs for your processor and guess.

The old logic no longer applies: on modern processors, an integer multiplication can be very cheap. On some newish Intel processors it takes 3 clock cycles, while an addition takes 1 cycle on the same processors. However, in a modern pipelined processor, the stalls created by data dependencies might cause additions to take longer.

My guess is that N additions + N/2 multiplications is slower than N multiplications if you are doing a fold-type operation, and I would guess the reverse for a map-type operation. But this is only a guess.
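For readers unfamiliar with the terms, the two shapes might look like this in C. This is only a sketch with hypothetical function names. Both loops do one multiplication and one addition per element, but in the fold each iteration needs the accumulator from the previous one, while in the map every element is computed independently.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold: the accumulator carries a dependency from one iteration
 * to the next, so the operations cannot overlap freely. */
uint64_t fold_mul_add(const uint64_t *a, size_t n, uint64_t c) {
    uint64_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc = acc * c + a[i];
    return acc;
}

/* Map: each output depends only on its own input, so iterations
 * are independent of each other. */
void map_mul_add(const uint64_t *a, uint64_t *out, size_t n,
                 uint64_t c, uint64_t d) {
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] * c + d;
}
```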

Test if you want the truth.

However: Most algorithms this simple are memory-bound, and both will be the same speed.



Source: https://stackoverflow.com/questions/7072097/n-multiplication-vs-n-2-multiplication-2-additions-which-is-better
