Conflicting signs in x86 assembly: movsx then unsigned compare/branch?

情歌与酒 2021-01-19 13:53

I am confused by the following snippet:

movsx   ecx, [ebp+var_8] ; signed move
cmp     ecx, [ebp+arg_0]
jnb     short loc_401027 ; unsigned jump
3 Answers
  • 2021-01-19 14:14

    As noted by Jester, unsigned comparison can be used to do range checks for signed numbers. For example, a common C expression that checks whether an index is between 0 and some limit:

    short idx = ...;
    int limit = ...; // actually, it's called "arg_0" - is it a function's argument?
    if (idx >= 0 && idx < limit)
    {
        // do stuff
    }
    

    Here idx, after sign-extension, is a signed 32-bit number (int). The idea is, when comparing it with limit as if it were unsigned, it does both comparisons at once.

    1. If idx is non-negative, then "signed" or "unsigned" doesn't matter, so unsigned comparison gives the correct answer.
    2. If idx is negative, then interpreting it as an unsigned number yields a very big number (greater than 2^31 - 1), which is never below limit, so in this case unsigned comparison also gives the correct answer.

    So one unsigned comparison does the work of two signed comparisons. This works only when limit is non-negative. If the compiler can prove that it is non-negative, it can generate such optimized code.
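
    The equivalence can be checked by brute force. The following is a minimal sketch, not the asker's code: the function names are made up for illustration, and it assumes a 32-bit int with two's complement representation, as on x86.

```c
#include <assert.h>
#include <limits.h>

/* Two signed comparisons, as written in the C source. */
static int in_range_signed(int idx, int limit)
{
    return idx >= 0 && idx < limit;
}

/* One unsigned comparison: a negative idx turns into a value above
   INT_MAX, so it can never be below a non-negative limit. */
static int in_range_unsigned(int idx, int limit)
{
    return (unsigned)idx < (unsigned)limit;
}

/* Returns 1 if both forms agree for every possible short idx.
   The limit must be non-negative (hypothetical helper). */
static int forms_agree(int limit)
{
    assert(limit >= 0);
    for (int i = SHRT_MIN; i <= SHRT_MAX; ++i) {
        short idx = (short)i;
        if (in_range_signed(idx, limit) != in_range_unsigned(idx, limit))
            return 0;
    }
    return 1;
}
```

    With a negative limit the two forms disagree, for example for idx = 3 and limit = -5, which is why the compiler needs to know that limit is non-negative.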


    Another possibility is that the original C code is buggy and compares a signed value with an unsigned one. A somewhat surprising feature of C is that when a signed int (or a narrower type such as short) is compared with an unsigned int, the signed operand is converted to unsigned first, so the comparison is unsigned.

    short x = ...;
    unsigned y = ...;
    
    // Buggy code!
    if (x < y) // has surprising behavior for e.g. x = -1
    {
        // do stuff
    }
    
    if (x < (int)y) // better; still buggy if the casting could overflow
    {
        // do stuff
    }
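
    To make the surprise concrete, here is a small sketch with hypothetical helper names. It assumes a 32-bit unsigned, so that long long can hold every value of both operands.

```c
#include <limits.h>

/* x is promoted to int and then converted to unsigned before the
   comparison, so x = -1 is compared as UINT_MAX. */
static int less_mixed(short x, unsigned y)
{
    return x < y;
}

/* Widening both sides to a signed 64-bit type keeps the mathematical
   values of both operands (assumes unsigned is 32 bits wide). */
static int less_widened(short x, unsigned y)
{
    return (long long)x < (long long)y;
}
```

    For x = -1 and y = 5 the mixed comparison is false while the widened one is true. A compiler would typically implement the first with an unsigned branch such as jb/jnb after a movsx, which matches the pattern in the question.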
    
  • 2021-01-19 14:14

    Addendum to anatolyg's answer:

    In principle, there's no clash at the assembly level.

    Information in a computer is encoded in bits (one bit = zero or one), and ecx is 32 bits of information, nothing else.

    Whether you interpret the top bit as a sign or not is up to the code that follows, i.e. at the assembly level it's perfectly legal to use movsx to extend the value (in a signed way), even if you later interpret it as a bit mask or an unsigned int.
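
    The same idea can be mirrored in C. This is only an illustrative sketch with made-up function names, assuming an 8-bit source operand (the disassembly does not say whether it is 8 or 16 bits).

```c
#include <stdint.h>

/* Sign-extend an 8-bit value to 32 bits, as movsx does for a byte operand. */
static int32_t sign_extend8(int8_t v)
{
    return (int32_t)v;
}

/* Read the same 32 bits as an unsigned number; no bits change. */
static uint32_t as_unsigned(int32_t v)
{
    return (uint32_t)v;
}

/* The condition under which jnb is taken after "cmp a, b": a >= b, unsigned. */
static int jnb_taken(uint32_t a, uint32_t b)
{
    return a >= b;
}
```

    For example, -1 sign-extends to 0xFFFFFFFF, which as an unsigned number is not below any 32-bit limit, so jnb is taken.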

    Whether there's a clash at the logical level depends on the functionality the author intended. If the author wanted the test against arg_0 to treat a "negative" var_8 as out of range (jnb is then taken, because the sign-extended value is at least 2^31 as an unsigned number) whenever arg_0 < 2^31, then the code is correct.

    BTW, the disassembly is missing the size of the memory operand in the first movsx, so the output of the tool that produced it is confusing (is it otherwise good? Be cautious).

    So, is var_8 signed or unsigned? And what about arg_0?

    var_8 is first and foremost a memory address, from which either 8 or 16 bits of information are read (your disassembly doesn't show which) in a "signed" way. It's difficult to say more about var_8 without exploring the full code; var_8 may even be a 32-bit unsigned int "variable" of which, for some reason, the author decided to use only the sign-extended low 16 bits in that first movsx. arg_0 is then used as an unsigned 32-bit integer by the cmp/jnb pair.

    In assembly the question is not so much whether var_8 is signed or unsigned; the question is how many bits of information you have, where they are, and how the code that follows interprets them.

    There's a lot more freedom in this than in C or other high-level programming languages. For example, if you have four byte-sized counters in memory, each of which you know is less than 200, and you want to increment the first and the last of them, you can do this:

    .data
    counter1: db 13
    counter2: db 6
    counter3: db 34
    counter4: db 17
    
    .text
        ...
        ; increment first and last counter in one instruction
        ; overflow not expected/handled, counters should be < 200
        add  dword [counter1],0x01000001
    

    Now imagine how you would interpret this when disassembling such code without the original comments from the source above. It gets tricky if you don't understand from the rest of the code that counter1-4 are used as separate byte counters, and that this is a speed optimization that increments two of them in a single instruction.
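
    For readers more at home in C, the same trick can be sketched as follows. The function name is made up for illustration, and the code assumes that neither affected counter is at 255, so no carry crosses into a neighboring byte.

```c
#include <stdint.h>
#include <string.h>

/* Adds 1 to counters[0] and counters[3] with a single 32-bit addition.
   Assumes neither byte wraps past 255 (the source keeps them below 200). */
static void bump_first_and_last(uint8_t counters[4])
{
    uint32_t word;
    memcpy(&word, counters, sizeof word);  /* read the four bytes as one dword */
    word += UINT32_C(0x01000001);          /* +1 in the lowest and highest byte */
    memcpy(counters, &word, sizeof word);  /* write the four bytes back */
}
```

    Because the constant 0x01000001 has the same byte pattern read forwards and backwards, this particular sketch touches the first and last counter regardless of byte order; the assembly version above relies on x86 being little-endian only in the sense that counter1 is the low byte.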

  • 2021-01-19 14:32

    This can be the result of a range check like the following, where the lower bound is not limited to 0 but can be any integer value:

    int8_t var_8 = ...;
    if (LOWER_BOUND <= var_8 && var_8 <= UPPER_BOUND)
    

    The above expression can be optimized into

    unsigned arg_0 = UPPER_BOUND - LOWER_BOUND;
    if ((unsigned)(var_8 - LOWER_BOUND) <= arg_0)
    

    Here arg_0 is an unsigned 32-bit value (uint32_t) holding UPPER_BOUND - LOWER_BOUND.

    This is a trick to determine if an integer is between two integers (inclusive) with known sets of values.

    Most modern compilers already know how to do this optimization when the bounds are constants like this. For example, gcc emits the instructions below for the first snippet above:

        add     edi, -LOWER_BOUND 
        cmp     dil, UPPER_BOUND - LOWER_BOUND
        jbe     .L5
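
    The subtract-and-compare form can be checked exhaustively for an 8-bit input. This is a minimal sketch; the bound values -5 and 20 are arbitrary choices for illustration, and the function names are hypothetical.

```c
#include <stdint.h>

/* Arbitrary example bounds; any LOWER_BOUND <= UPPER_BOUND works. */
enum { LOWER_BOUND = -5, UPPER_BOUND = 20 };

/* The range check as written in the source: two signed comparisons. */
static int in_range_two_compares(int8_t v)
{
    return LOWER_BOUND <= v && v <= UPPER_BOUND;
}

/* Shift the range so it starts at 0, then do one unsigned comparison.
   Values below LOWER_BOUND become negative and wrap to huge unsigned numbers. */
static int in_range_one_compare(int8_t v)
{
    return (unsigned)(v - LOWER_BOUND) <= (unsigned)(UPPER_BOUND - LOWER_BOUND);
}

/* Counts the int8_t inputs for which the two forms disagree. */
static int count_mismatches(void)
{
    int mismatches = 0;
    for (int i = INT8_MIN; i <= INT8_MAX; ++i) {
        if (in_range_two_compares((int8_t)i) != in_range_one_compare((int8_t)i))
            ++mismatches;
    }
    return mismatches;
}
```

    With these bounds count_mismatches() returns 0, i.e. the two forms agree on all 256 inputs.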
    