How are Mathematical Equality Operators Handled at the Machine-Code Level

Question

So I wanted to ask a rather existential question today, and it's one that I feel as though most programmers skip over and just accept as something that works, without really asking the question of "how" it works. The question is rather simple: how is the >= operator compiled down to machine code, and what does that machine code look like? Down at the very bottom, it must be a greater than test, mixed with an "is equal" test. But how is this actually implemented? Thinking about it seems rather paradoxical, because at the very bottom there cannot be a > or == test. There needs to be something else. I want to know what this is.

How do computers test for equality and greater than at the fundamental level?

Answer 1:

Indeed, there is no > or == test as such. Instead, the lowest-level comparison in assembly works by binary subtraction. On x86, the instruction for integer comparisons is CMP. It is really the one instruction to rule them all. How it works is described, for example, in the 80386 Programmer's Reference Manual:

CMP subtracts the second operand from the first but, unlike the SUB instruction, does not store the result; only the flags are changed.

CMP is typically used in conjunction with conditional jumps and the SETcc instruction. (Refer to Appendix D for the list of signed and unsigned flag tests provided.) If an operand greater than one byte is compared to an immediate byte, the byte value is first sign-extended.

Basically, CMP A, B (in Intel operand ordering) calculates A - B and then discards the result. However, in an x86 ALU, arithmetic operations set condition flags in the CPU's flags register based on the result of the operation. The flags relevant to arithmetic operations are:

Bit  Name   Function

 0   CF     Carry Flag -- Set on high-order bit carry or borrow; cleared
            otherwise.
 6   ZF     Zero Flag -- Set if result is zero; cleared otherwise.
 7   SF     Sign Flag -- Set equal to high-order bit of result (0 is
            positive, 1 if negative).
11   OF     Overflow Flag -- Set if result is too large a positive number
            or too small a negative number (excluding sign-bit) to fit in
            destination operand; cleared otherwise.

For example, if the result of the calculation is zero, the Zero Flag ZF is set. CMP A, B executes A - B and discards the result. The result of the subtraction is 0 iff A == B. Thus ZF is set only when the operands are equal and cleared otherwise.

The Carry Flag CF is set iff the unsigned subtraction needs a borrow, i.e. iff A < B when A and B are treated as unsigned numbers, so that the true difference A - B would be negative.

The Sign Flag SF is set whenever the most significant bit of the result is set, which means the result, read as a signed 2's complement number, is negative. However, consider the 8-bit subtraction 01111111 (127) - 10000000 (-128): the result is 11111111, which as an 8-bit signed 2's complement number is -1, even though 127 - (-128) should be 255. A signed integer overflow happened. So the sign flag alone does not tell which of the signed quantities was greater; the Overflow Flag OF tells whether a signed overflow happened in the previous arithmetic operation.
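The flag rules above can be modelled in plain C. This is a minimal software sketch, not how the hardware is actually wired: the hypothetical helper `cmp8` performs an 8-bit subtraction, throws the difference away and keeps only CF, ZF, SF and OF. It assumes the usual rule that A - B overflows as a signed operation when A and B have different signs and the result's sign differs from A's.

```c
#include <stdbool.h>
#include <stdint.h>

/* Software model of the four flags an 8-bit CMP A, B leaves behind.
   The names cmp8 and struct flags are illustrative, not a real API. */
struct flags {
    bool cf; /* carry: unsigned borrow occurred           */
    bool zf; /* zero: result was 0                         */
    bool sf; /* sign: high-order bit of the result         */
    bool of; /* overflow: signed result did not fit 8 bits */
};

static struct flags cmp8(uint8_t a, uint8_t b)
{
    uint8_t r = (uint8_t)(a - b); /* A - B modulo 256; like CMP, only used for flags */
    struct flags f;
    f.cf = a < b;                 /* borrow iff A < B as unsigned numbers */
    f.zf = (r == 0);              /* zero iff A == B */
    f.sf = (r & 0x80) != 0;       /* MSB of the 8-bit result */
    /* Signed overflow: A and B differ in sign and the result's sign differs from A's. */
    f.of = ((a ^ b) & (a ^ r) & 0x80) != 0;
    return f;
}
```

For the example above, `cmp8(0x7F, 0x80)` (127 compared with -128) gives a result byte of 0xFF with SF = 1, but also OF = 1 and CF = 1, while `cmp8(5, 5)` sets only ZF.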

Now, depending on where the comparison is used, either a Byte Set on Condition (SETcc) or a Jump if Condition Is Met (Jcc) instruction decodes the flags and acts on them. If the boolean value is used to set a variable, a clever compiler would use SETcc; Jcc is a better match for an if...else.

Now, there are 2 choices for >=: either we want a signed comparison or an unsigned comparison.

int a, b;
bool r1, r2;
unsigned int c, d;
r1 = a >= b; // signed
r2 = c >= d; // unsigned

In Intel assembly, the condition names for unsigned inequality use the words above and below; those for signed inequality use the words greater and less. Thus for r2 the compiler could use Set Byte if Above or Equal, i.e. SETAE, which sets the target byte to 1 if CF = 0. For r1 the flags would be decoded by SETGE (Set Byte if Greater or Equal), whose condition is SF = OF, i.e. the result of the subtraction, interpreted as a 2's complement number, is non-negative without overflow, or negative with an overflow having happened.
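The two decodings can be checked in software. The sketch below, with the hypothetical helpers `above_or_equal`, `greater_or_equal` and `decodings_match`, recomputes CF, SF and OF for an 8-bit CMP A, B, applies the SETAE rule (CF = 0) and the SETGE rule (SF = OF), and compares both against C's own >= for every pair of byte values. It is a model under those assumptions, not a description of the CPU's internals.

```c
#include <stdbool.h>
#include <stdint.h>

/* SETAE / JAE: unsigned A >= B  <=>  CF = 0 */
static bool above_or_equal(uint8_t a, uint8_t b)
{
    /* A borrow out of the 8-bit subtraction shows up as bit 8 of the wider difference. */
    bool cf = (((unsigned)a - (unsigned)b) & 0x100u) != 0;
    return !cf;
}

/* SETGE / JGE: signed A >= B  <=>  SF = OF */
static bool greater_or_equal(uint8_t a, uint8_t b)
{
    uint8_t r = (uint8_t)(a - b);              /* wrapped 8-bit difference */
    bool sf = (r & 0x80) != 0;                 /* sign of the truncated result */
    bool of = ((a ^ b) & (a ^ r) & 0x80) != 0; /* signed overflow in A - B */
    return sf == of;
}

/* Compare both flag-based answers with C's >= on all 256 * 256 byte pairs. */
static bool decodings_match(void)
{
    for (int a = 0; a < 256; a++) {
        for (int b = 0; b < 256; b++) {
            int sa = a >= 128 ? a - 256 : a; /* same byte read as signed */
            int sb = b >= 128 ? b - 256 : b;
            if (above_or_equal((uint8_t)a, (uint8_t)b) != (a >= b))
                return false;
            if (greater_or_equal((uint8_t)a, (uint8_t)b) != (sa >= sb))
                return false;
        }
    }
    return true;
}
```

With the bytes 0x7F and 0x80, the unsigned test says 127 >= 128 is false, while the signed test says 127 >= -128 is true, even though both start from the same subtraction.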

Finally, an example:

#include <stdbool.h>
bool gte_unsigned(unsigned int a, unsigned int b) {
    return a >= b;
}

The resulting optimized code on x86-64 Linux is:

 cmp     edi, esi
 setae   al
 ret

Likewise for the signed comparison:

bool gte_signed(int a, int b) {
    return a >= b;
}

The resulting assembly is:

 cmp     edi, esi
 setge   al
 ret
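The two listings differ only in the condition code, and that choice matters because the same bits compare differently. A small sketch, redefining the answer's two functions so it stands alone (the helper `same_bits_differ` is hypothetical), feeds the all-ones bit pattern to both:

```c
#include <stdbool.h>

/* Same functions as in the answer; per its listings, cmp + setae and cmp + setge. */
bool gte_unsigned(unsigned int a, unsigned int b) { return a >= b; }
bool gte_signed(int a, int b) { return a >= b; }

/* The bit pattern of -1 (all ones) is UINT_MAX when read as unsigned,
   so the unsigned and signed comparisons disagree on the same inputs. */
static bool same_bits_differ(void)
{
    return gte_unsigned((unsigned int)-1, 1u) && !gte_signed(-1, 1);
}
```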

Answer 2:

Here's a simple C function:

#include <stdbool.h>

bool lt_or_eq(int a, int b)
{
    return (a <= b);
}

On x86-64, GCC without optimization (the default -O0) compiles this to:

    .file   "lt_or_eq.c"
    .text
    .globl  lt_or_eq
    .type   lt_or_eq, @function
lt_or_eq:
.LFB0:
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    movl    %edi, -4(%rbp)
    movl    %esi, -8(%rbp)
    movl    -4(%rbp), %eax
    cmpl    -8(%rbp), %eax
    setle   %al
    popq    %rbp
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE0:
    .size   lt_or_eq, .-lt_or_eq

The important part is the cmpl -8(%rbp), %eax; setle %al sequence. The cmp instruction compares the two arguments numerically by subtracting them, setting the zero, sign and overflow flags (among others) based on that comparison. Then setle sets the %al register to 0 or 1 depending on the state of those flags: it writes 1 if ZF = 1 or SF != OF. The caller gets the return value from the %al register.
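The SETLE condition (ZF = 1 or SF != OF) can be checked exhaustively on 8-bit values. This is a minimal software model, assuming two's complement bytes; `setle8` and `setle8_matches` are hypothetical names, and the subtraction is A - B as in cmpl -8(%rbp), %eax, which computes %eax minus the memory operand.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of SETLE after an 8-bit CMP A, B: signed A <= B  <=>  ZF = 1 or SF != OF. */
static bool setle8(uint8_t a, uint8_t b)
{
    uint8_t r = (uint8_t)(a - b);              /* discarded difference */
    bool zf = (r == 0);                        /* A == B */
    bool sf = (r & 0x80) != 0;                 /* sign of truncated result */
    bool of = ((a ^ b) & (a ^ r) & 0x80) != 0; /* signed overflow in A - B */
    return zf || (sf != of);
}

/* Compare the flag-based answer with C's <= on every signed 8-bit pair. */
static bool setle8_matches(void)
{
    for (int a = -128; a <= 127; a++)
        for (int b = -128; b <= 127; b++)
            if (setle8((uint8_t)a, (uint8_t)b) != (a <= b))
                return false;
    return true;
}
```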

Answer 3:

First the computer needs to figure out the type of the data. In a language like C this happens at compile time, whereas Python would dispatch to different type-specific tests at run time. Assuming we are coming from a compiled language, and that we know the values we are comparing are integers, the compiler would make sure that the values are in registers and then issue:

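SUBS  r1, r2      @ r1 = r1 - r2, setting the N, Z, C and V flags
BGE   target      @ branch if N == V, i.e. if the original r1 >= r2 (signed)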

This subtracts the registers and then checks the resulting condition flags (for BGE, whether the negative and overflow flags agree). These instructions are built-in operations of the CPU. (I'm assuming an ARM-like CPU here; there are many variations.)
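A rough software model of what SUBS leaves in the ARM flags, assuming 32-bit registers. One detail differs from x86: on ARM the C flag after a subtraction is set when there is no borrow, so unsigned "higher or same" (BHS/BCS) tests C = 1, while BGE tests N = V. The names `subs32`, `bge_taken` and `bhs_taken` are hypothetical, and this models the architectural flags, not the real datapath.

```c
#include <stdbool.h>
#include <stdint.h>

/* N, Z, C, V flags as left by SUBS/CMP on 32-bit values. */
struct nzcv { bool n, z, c, v; };

static struct nzcv subs32(uint32_t a, uint32_t b)
{
    uint32_t r = a - b;                     /* wraps modulo 2^32 */
    struct nzcv f;
    f.n = (r >> 31) != 0;                   /* negative: bit 31 of the result */
    f.z = (r == 0);                         /* zero */
    f.c = (a >= b);                         /* ARM: carry = NOT borrow */
    f.v = (((a ^ b) & (a ^ r)) >> 31) != 0; /* signed overflow */
    return f;
}

/* BGE: taken when N == V (signed a >= b). */
static bool bge_taken(uint32_t a, uint32_t b)
{
    struct nzcv f = subs32(a, b);
    return f.n == f.v;
}

/* BHS/BCS: taken when C == 1 (unsigned a >= b). */
static bool bhs_taken(uint32_t a, uint32_t b)
{
    return subs32(a, b).c;
}
```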

Source: https://stackoverflow.com/questions/40878840/how-are-mathematical-equality-operators-handled-at-the-machine-code-level

Tags

machine-code