Relative performance of x86 inc vs. add instruction

无人共我 2020-12-01 17:05

Quick question, assuming beforehand

mov eax, 0

which is more efficient?

inc eax
inc eax

or

add eax, 2

4 Answers
  • 2020-12-01 17:38

    From the Intel manual that you can find here, it looks like the ADD/SUB instructions are half a cycle cheaper than DEC/INC on one particular architecture (the 0F_2H latency column below: 0.5 vs. 1 cycle). But remember that Intel uses an out-of-order execution model for its (recent) processors. This primarily means that performance bottlenecks show up wherever the processor has to wait for data to come in (e.g. it ran out of things to do during an L1/L2/L3/RAM data fetch). So if your profiler tells you INC might be the problem, look at it from a data-throughput point of view instead of looking at raw cycle counts.

    Instruction    Latency           Throughput        Execution Unit
    CPUID          0F_3H    0F_2H    0F_3H    0F_2H    0F_2H

    ADD/SUB        1        0.5      0.5      0.5      ALU
    [...]
    DEC/INC        1        1        0.5      0.5      ALU
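
One rough way to read the table: for instructions that each depend on the previous result, the latencies add up. The sketch below only models that dependency chain. The helper name `chain_latency` and the constant names are made up for illustration, the numbers are simply the latency entries from the table, and overlap with surrounding code (which, as noted above, usually dominates on an out-of-order core) is ignored.

```c
/* Lower bound, in cycles, for `count` instructions that each depend on the
 * result of the previous one: the per-instruction latencies add up.
 * Hypothetical helper; it models nothing but the dependency chain. */
double chain_latency(double latency_cycles, int count)
{
    return latency_cycles * count;
}

/* Latency entries copied from the table above (cycles). */
const double add_latency_0f_3h = 1.0;
const double add_latency_0f_2h = 0.5;
const double inc_latency_0f_3h = 1.0;
const double inc_latency_0f_2h = 1.0;
```

With these entries, `chain_latency(inc_latency_0f_2h, 2)` gives 2 cycles for two dependent INCs and `chain_latency(add_latency_0f_2h, 1)` gives 0.5 cycles for the single ADD; for 0F_3H the figures are 2 and 1.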
    
  • 2020-12-01 17:39

    If you ever want to know the raw performance stats of x86 instructions, see Dr. Agner Fog's listings (volume 4, to be exact). As for the part about compilers, that depends on the compiler's code generator, and it is not something you should rely on too much.

    On a side note: I find it funny/ironic that, in a question about performance, you used MOV EAX,0 to zero a register instead of XOR EAX,EAX :P (and if MOV EAX,0 was done beforehand, the fastest variant would be to remove the INCs and ADDs and just use MOV EAX,2).

  • 2020-12-01 17:46

    For all practical purposes, it probably doesn't matter. But take into account that inc uses fewer bytes.
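
To make the size difference concrete, here is a small sketch listing the machine-code bytes of the instructions from the question. The array names and the `code_size` helper are invented for this example. Note that the one-byte form of inc eax is specific to 32-bit mode; in 64-bit mode that opcode byte is reused as a REX prefix, so inc eax takes two bytes there.

```c
#include <stddef.h>

/* inc eax, 32-bit mode short form: opcode 0x40 + register number (eax = 0). */
const unsigned char inc_eax_short[] = { 0x40 };

/* inc eax, ModRM form (the form used in 64-bit mode): FF /0. */
const unsigned char inc_eax_modrm[] = { 0xFF, 0xC0 };

/* add eax, 2 with a sign-extended 8-bit immediate: 83 /0 ib. */
const unsigned char add_eax_2[] = { 0x83, 0xC0, 0x02 };

/* Total size in bytes of `count` copies of an `insn_size`-byte instruction. */
size_t code_size(size_t insn_size, size_t count)
{
    return insn_size * count;
}
```

In 32-bit code, two incs come to `code_size(sizeof inc_eax_short, 2)`, i.e. 2 bytes, against 3 bytes for the add. In 64-bit code the two incs take 4 bytes, so the add is the smaller option there.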

    Consider the following code:

    int x = 0;
    x += 2;
    

    Without using any optimization flags, GCC compiles this code into:

    80483ed:       c7 44 24 1c 00 00 00    movl   $0x0,0x1c(%esp)
    80483f4:       00 
    80483f5:       83 44 24 1c 02          addl   $0x2,0x1c(%esp)
    

    Using -O1 and -O2, it becomes:

    c7 44 24 08 02 00 00    movl   $0x2,0x8(%esp)
    

    Funny, isn't it?
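
For anyone who wants to reproduce this, a minimal self-contained version of the snippet might look like the following. The function name `add_two_from_zero` and the file name in the comment are placeholders, `-m32` assumes a toolchain with 32-bit support installed, and the exact addresses and stack offsets will differ from the listing above depending on the GCC version and target.

```c
/* One possible way to inspect the generated code (file name is a placeholder):
 *   gcc -m32 -O0 -c demo.c && objdump -d demo.o
 *   gcc -m32 -O2 -c demo.c && objdump -d demo.o
 */
int add_two_from_zero(void)
{
    int x = 0;
    x += 2;     /* unoptimized: typically a store of 0, then an add of 2 */
    return x;   /* optimized: the body is expected to fold to the constant 2 */
}
```

Because this version returns x instead of keeping it on the stack, the optimized output should be a move of 2 into the return register rather than the stack store shown above.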

  • 2020-12-01 17:47

    Two inc instructions on the same register (or, more generally speaking, two read-modify-write instructions) always form a dependency chain of at least two cycles. This assumes a one-clock latency for an inc, which has been the case since the 486. That means that if the surrounding instructions can't be interleaved with the two inc instructions to hide those latencies, the code will execute more slowly.

    But no optimizing compiler will emit the instruction sequence you propose anyway (mov eax,0 will be replaced by xor eax,eax; see "What is the purpose of XORing a register with itself?").

    mov eax,0
    inc eax
    inc eax
    

    This will be optimized to:

    mov eax,2
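
The same reasoning applies when the starting value is not a known constant. As a sketch (the function names are made up here), the two forms from the question correspond to the C below. Compiled literally, `two_incs` is a two-step dependency chain while `one_add` is a single step, but an optimizing compiler is free to merge the two increments, so both typically end up as the same machine code.

```c
/* Two dependent read-modify-write steps on the same value. */
unsigned two_incs(unsigned x)
{
    x += 1;   /* the second increment needs the result of the first */
    x += 1;
    return x;
}

/* The same result in a single step. */
unsigned one_add(unsigned x)
{
    x += 2;
    return x;
}
```

Comparing the output of `gcc -O2 -S` for the two functions is a quick way to check what a particular compiler does with them.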
    