Is < faster than <=?

孤城傲影 2020-11-22 13:43

Is if( a < 901 ) faster than if( a <= 900 )?

Not exactly as in this simple example, but there are slight performance changes on loop

14 answers
  • 2020-11-22 13:55

    You should not be able to notice the difference even if there is any. Besides, in practice, you'll have to do an additional a + 1 or a - 1 to make the condition stand unless you're going to use some magic constants, which is a very bad practice by all means.

  • 2020-11-22 13:56

    They have the same speed. Maybe on some special architecture what he/she said is right, but in the x86 family, at least, I know they are the same. To do this comparison the CPU performs a subtraction (a - b) and then checks the flags in the flags register. Two bits of that register are called ZF (zero flag) and SF (sign flag), and the check is done in one cycle, because it is done with one mask operation.

  • 2020-11-22 13:57

    For floating point code, the <= comparison may indeed be slower (by one instruction) even on modern architectures. Here's the first function:

    int compare_strict(double a, double b) { return a < b; }
    

    On PowerPC, first this performs a floating point comparison (which updates cr, the condition register), then moves the condition register to a GPR, shifts the "compared less than" bit into place, and then returns. It takes four instructions.

    Now consider this function instead:

    int compare_loose(double a, double b) { return a <= b; }
    

    This requires the same work as compare_strict above, but now there are two bits of interest: "was less than" and "was equal to". Combining these two bits into one requires an extra instruction (cror, condition register bitwise OR). So compare_loose requires five instructions, while compare_strict requires four.

    You might think that the compiler could optimize the second function like so:

    int compare_loose(double a, double b) { return ! (a > b); }
    

    However, this would handle NaNs incorrectly: NaN1 <= NaN2 and NaN1 > NaN2 must both evaluate to false.
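    The NaN point above can be checked directly. The following is a minimal sketch, assuming an IEEE 754 platform and a compiler that is not run with relaxed floating-point options such as -ffast-math; compare_rewritten is a hypothetical name for the "optimized" variant, not something from the answer.

```c
#include <math.h>   /* NAN macro */

/* The original comparison: false whenever either operand is NaN. */
int compare_loose(double a, double b) { return a <= b; }

/* Hypothetical name for the tempting rewrite: !(a > b).
   With a NaN operand, (a > b) is false, so this returns 1. */
int compare_rewritten(double a, double b) { return !(a > b); }
```

    For ordinary numbers the two functions agree; they differ only when at least one operand is NaN, which is why a compiler cannot substitute one for the other.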

  • 2020-11-22 13:57

    You could say that line is correct in most scripting languages, since the extra character results in slightly slower code processing. However, as the top answer pointed out, it should have no effect in C++, and anything being done with a scripting language probably isn't that concerned about optimization.

  • 2020-11-22 14:00

    No, it will not be faster on most architectures. You didn't specify, but on x86, all of the integral comparisons will typically be implemented in two machine instructions:

    • A test or cmp instruction, which sets EFLAGS
    • And a Jcc (jump) instruction, depending on the comparison type (and code layout):
      • jne - Jump if not equal --> ZF = 0
      • jz - Jump if zero (equal) --> ZF = 1
      • jg - Jump if greater --> ZF = 0 and SF = OF
      • (etc...)

    Example (edited for brevity), compiled with $ gcc -m32 -S -masm=intel test.c:

        if (a < b) {
            // Do something 1
        }
    

    Compiles to:

        mov     eax, DWORD PTR [esp+24]      ; a
        cmp     eax, DWORD PTR [esp+28]      ; b
        jge     .L2                          ; jump if a is >= b
        ; Do something 1
    .L2:
    

    And

        if (a <= b) {
            // Do something 2
        }
    

    Compiles to:

        mov     eax, DWORD PTR [esp+24]      ; a
        cmp     eax, DWORD PTR [esp+28]      ; b
        jg      .L5                          ; jump if a is > b
        ; Do something 2
    .L5:
    

    So the only difference between the two is a jg versus a jge instruction. The two will take the same amount of time.
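    To repeat the experiment, something like the following could be used. This is only a sketch: the function names are hypothetical and this is not the test.c from the answer. Compiling it with the gcc command shown above lets you compare the instructions generated for each function yourself.

```c
/* Hypothetical helpers, one comparison per function so that each one
   can be found easily in the generated assembly listing. */
int less_than(int a, int b)     { return a < b; }
int less_or_equal(int a, int b) { return a <= b; }

/* The two forms from the question. For int operands they accept
   exactly the same set of values. */
int below_901(int a)   { return a < 901; }
int at_most_900(int a) { return a <= 900; }
```

    Because below_901 and at_most_900 are equivalent for every int, the choice between them can be made purely on readability.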


    I'd like to address the comment that nothing indicates that the different jump instructions take the same amount of time. This one is a little tricky to answer, but here's what I can give: in the Intel Instruction Set Reference, they are all grouped together under one common instruction, Jcc (Jump if condition is met). The same grouping is used in the Optimization Reference Manual, in Appendix C, Latency and Throughput.

    Latency — The number of clock cycles that are required for the execution core to complete the execution of all of the μops that form an instruction.

    Throughput — The number of clock cycles required to wait before the issue ports are free to accept the same instruction again. For many instructions, the throughput of an instruction can be significantly less than its latency.

    The values for Jcc are:

          Latency   Throughput
    Jcc     N/A        0.5
    

    with the following footnote on Jcc:

    7) Selection of conditional jump instructions should be based on the recommendation of Section 3.4.1, “Branch Prediction Optimization,” to improve the predictability of branches. When branches are predicted successfully, the latency of jcc is effectively zero.

    So, nothing in the Intel docs ever treats one Jcc instruction any differently from the others.

    If one thinks about the actual circuitry used to implement the instructions, one can assume that there would be simple AND/OR gates on the different bits in EFLAGS to determine whether the conditions are met. There is, then, no reason that an instruction testing two bits should take any more or less time than one testing only one (ignoring gate propagation delay, which is much less than the clock period).
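    The flag logic described above can be modelled in software. This is a rough sketch, not how the hardware is actually built: struct flags, cmp_flags and the cond_* helpers are hypothetical names, and only ZF, SF and OF are modelled for a 32-bit signed cmp.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of three of the flags set by `cmp a, b` (a - b). */
struct flags { bool zf, sf, of; };

static struct flags cmp_flags(int32_t a, int32_t b) {
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
    uint32_t diff = ua - ub;              /* wraps modulo 2^32 */
    struct flags f;
    f.zf = (diff == 0);                   /* result is zero */
    f.sf = (diff >> 31) != 0;             /* sign bit of the result */
    /* Signed overflow: operands differ in sign and the result's
       sign differs from the sign of a. */
    f.of = (((ua ^ ub) & (ua ^ diff)) >> 31) != 0;
    return f;
}

/* Conditions tested by the signed conditional jumps: each one is a
   small boolean expression over the same flag bits. */
static bool cond_jl(struct flags f)  { return f.sf != f.of; }
static bool cond_jle(struct flags f) { return f.zf || (f.sf != f.of); }
static bool cond_jg(struct flags f)  { return !f.zf && (f.sf == f.of); }
static bool cond_jge(struct flags f) { return f.sf == f.of; }
```

    In this model jle differs from jl only by one extra OR with ZF, which matches the argument that testing one more bit is not a meaningful cost.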


    Edit: Floating Point

    This holds true for x87 floating point as well (pretty much the same code as above, but with double instead of int):

            fld     QWORD PTR [esp+32]
            fld     QWORD PTR [esp+40]
            fucomip st, st(1)              ; Compare ST(0) and ST(1), and set CF, PF, ZF in EFLAGS
            fstp    st(0)
            seta    al                     ; Set al if above (CF=0 and ZF=0).
            test    al, al
            je      .L2
            ; Do something 1
    .L2:
    
            fld     QWORD PTR [esp+32]
            fld     QWORD PTR [esp+40]
            fucomip st, st(1)              ; (same thing as above)
            fstp    st(0)
            setae   al                     ; Set al if above or equal (CF=0).
            test    al, al
            je      .L5
            ; Do something 2
    .L5:
            leave
            ret
    
  • 2020-11-22 14:02

    This would be highly dependent on the underlying architecture that the C code is compiled for. Some processors and architectures might have explicit instructions for "equal to" or "less than or equal to" that execute in different numbers of cycles.

    That would be pretty unusual though, as the compiler could work around it, making it irrelevant.
