I am confused by the following snippet:
movsx ecx, [ebp+var_8] ; signed move
cmp ecx, [ebp+arg_0]
jnb short loc_401027 ; unsigned jump
As noted by Jester, unsigned comparison can be used to do range checks for signed numbers. For example, a common C expression that checks whether an index is between 0 and some limit:
short idx = ...;
int limit = ...; // actually, it's called "arg_0" - is it a function's argument?
if (idx >= 0 && idx < limit)
{
// do stuff
}
Here idx, after sign-extension, is a signed 32-bit number (int). The idea is that comparing it with limit as if both were unsigned does both comparisons at once:
If idx is non-negative, then "signed" or "unsigned" doesn't matter, so the unsigned comparison gives the correct answer.
If idx is negative, then interpreting it as an unsigned number yields a very big number (greater than 2^31 - 1, and therefore greater than any non-negative limit), so the unsigned comparison also gives the correct answer.
So one unsigned comparison does the work of two signed comparisons. This only works when limit is signed and non-negative. If the compiler can prove it's non-negative, it can generate such optimized code.
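A minimal C sketch of this equivalence, assuming limit is non-negative. The function names are hypothetical, chosen only for illustration; the check function compares the two-comparison and one-comparison forms for every short value:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* The C condition from the snippet above: two signed comparisons. */
static bool in_range_signed(short idx, int limit)
{
    return idx >= 0 && idx < limit;
}

/* The same test with one unsigned comparison. A negative idx converts
   to a value above INT_MAX, so it can never be below a limit that is
   non-negative. Assumes limit >= 0. */
static bool in_range_unsigned(short idx, int limit)
{
    return (unsigned)idx < (unsigned)limit;
}

/* Checks that both versions agree for every short value. */
static void check_range_trick(int limit)
{
    assert(limit >= 0);
    for (int i = SHRT_MIN; i <= SHRT_MAX; i++)
        assert(in_range_signed((short)i, limit) ==
               in_range_unsigned((short)i, limit));
}
```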
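Another possibility is that the original C code is buggy and compares a signed value with an unsigned one. A somewhat surprising feature of C is that when a signed int (or a narrower type such as short, which is first promoted to int) is compared with an unsigned int, the signed operand is converted to unsigned, so the comparison is unsigned.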
short x = ...;
unsigned y = ...;
// Buggy code!
if (x < y) // has surprising behavior for e.g. x = -1
{
// do stuff
}
if (x < (int)y) // better; still buggy if y > INT_MAX, since that conversion is implementation-defined
{
// do stuff
}
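One way to avoid both problems is to handle the negative case separately before converting. A small C sketch with hypothetical function names, showing the surprising result for x = -1 next to a version that is correct for every input:

```c
#include <assert.h>
#include <stdbool.h>

/* Buggy: x is promoted to int, then converted to unsigned because y is
   unsigned, so x = -1 becomes UINT_MAX and "x < y" is false. */
static bool less_buggy(short x, unsigned y)
{
    return x < y;
}

/* Correct for every input: a negative x is below any unsigned y;
   otherwise x is non-negative and converts to unsigned unchanged. */
static bool less_correct(short x, unsigned y)
{
    return x < 0 || (unsigned)x < y;
}

/* The x = -1 case mentioned in the comment above. */
static void check_minus_one(void)
{
    assert(!less_buggy(-1, 1));   /* surprising: -1 is "not less than" 1 */
    assert(less_correct(-1, 1));
}
```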
Addendum to anatolyg's answer:
In principle, there's no clash at the assembly level.
Information in a computer is encoded in bits (one bit = zero or one), and ecx is 32 bits of information, nothing else.
Whether you interpret the top bit as a sign or not is up to the code that follows; at the assembly level it's perfectly legal to use movsx to extend the value (in a signed way), even if you later interpret it as a bit mask or an unsigned int.
Whether there's a clash at the logical level depends on what the author intended. If the author wanted the test against arg_0 to jump to loc_401027 whenever var_8 holds a "negative" value and arg_0 < 2^31, then the code is correct: a sign-extended negative value is at least 0x80000000 when read as unsigned, so jnb is taken.
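To see what the snippet itself does, here is a hypothetical C model of the three instructions. It assumes var_8 is a 16-bit value (the disassembly doesn't show the operand size) and uses the fact that jnb (the same instruction as jae) jumps when ecx >= arg_0 as unsigned 32-bit numbers. The function names are made up for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of:
       movsx ecx, [ebp+var_8]   ; assuming a 16-bit source operand
       cmp   ecx, [ebp+arg_0]
       jnb   short loc_401027   ; taken when ecx >= arg_0 (unsigned)  */
static bool jnb_taken(int16_t var_8, uint32_t arg_0)
{
    uint32_t ecx = (uint32_t)(int32_t)var_8;  /* movsx: same 32 bits, read as unsigned */
    return ecx >= arg_0;                       /* jnb == jae: carry flag clear */
}

/* Every negative var_8 takes the jump when arg_0 < 2^31. */
static void check_jnb(void)
{
    for (int v = INT16_MIN; v < 0; v++)
        assert(jnb_taken((int16_t)v, 0x7FFFFFFFu));
    assert(!jnb_taken(3, 10));
    assert(jnb_taken(10, 10));
}
```

If arg_0 is 2^31 or more, a negative var_8 may no longer take the jump, which is where the intended meaning of the code starts to matter.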
BTW, the disassembly is missing the size of the source operand of movsx (byte or word), so the tool that produced it is confusing here (is it otherwise reliable? Be cautious).
So, is var_8 signed or unsigned? And what about arg_0?
var_8 is first and foremost a memory address, and either 8 or 16 bits of information are read from it (your disassembly doesn't show which) and used in a "signed" way. But it's difficult to tell more about var_8 without exploring the full code; var_8 may even be a 32-bit unsigned int "variable", of which the author, for some reason, uses only the sign-extended low 16 bits in that movsx. arg_0 is then used as an unsigned 32-bit integer: cmp itself doesn't care about signedness, but the jnb that follows is an unsigned condition.
In assembly, the question is not so much whether var_8 is signed or unsigned; it's how many bits of information you have, where they are, and how the code that follows interprets them.
There's a lot more freedom in this than in C or other high-level languages. For example, if you have four byte-sized counters in memory, each of which you know is less than 200, and you want to increment the first and the last one, you can do this:
.data
counter1: db 13
counter2: db 6
counter3: db 34
counter4: db 17
.text
...
; increment the first and last counter in one instruction
; overflow is not expected or handled: counters must be < 200
add dword [counter1],0x01000001
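The same trick as a portable C sketch (function names hypothetical). It assembles the four bytes into a 32-bit word the way a little-endian x86 CPU reads them from memory, adds 0x01000001, and writes the bytes back:

```c
#include <assert.h>
#include <stdint.h>

/* Model of: add dword [counter1], 0x01000001 on little-endian x86,
   where c[0] is the least significant byte of the dword. */
static void add_packed(uint8_t c[4])
{
    uint32_t word = (uint32_t)c[0]
                  | (uint32_t)c[1] << 8
                  | (uint32_t)c[2] << 16
                  | (uint32_t)c[3] << 24;
    word += 0x01000001u;               /* +1 to byte 0 and +1 to byte 3 */
    for (int i = 0; i < 4; i++)
        c[i] = (uint8_t)(word >> (8 * i));
}

/* The counters from the assembly example above. */
static void check_add_packed(void)
{
    uint8_t counters[4] = {13, 6, 34, 17};
    add_packed(counters);
    assert(counters[0] == 14 && counters[1] == 6 &&
           counters[2] == 34 && counters[3] == 18);
}
```

This is only correct while no byte overflows: a carry out of the first counter would silently spill into the second, which is why the counters must stay small.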
Now imagine how you would interpret this when disassembling such code, without the original comments from the source above. It gets tricky if you can't tell from the rest of the code that counter1-4 are used as separate byte counters and that this is a speed optimization to increment two of them in a single instruction.
This can also be the result of a range check like the following, where the lower bound isn't limited to 0 but can be any integer value:
int8_t var_8 = ...;
if (LOWER_BOUND <= var_8 && var_8 <= UPPER_BOUND)
The above expression can be optimized into:
unsigned arg_0 = UPPER_BOUND - LOWER_BOUND;
if ((unsigned)(var_8 - LOWER_BOUND) <= arg_0)
where arg_0 is an unsigned 32-bit value (uint32_t) holding UPPER_BOUND - LOWER_BOUND.
This is a trick to determine whether an integer lies between two integers (inclusive) with a single unsigned comparison; it requires LOWER_BOUND <= UPPER_BOUND.
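A C sketch comparing both forms for every int8_t value. The bound values -20 and 50 are arbitrary, hypothetical choices; any LOWER_BOUND <= UPPER_BOUND whose difference fits in an int should behave the same way:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bounds, for illustration only. */
#define LOWER_BOUND (-20)
#define UPPER_BOUND 50

static bool in_bounds_two_cmp(int8_t v)
{
    return LOWER_BOUND <= v && v <= UPPER_BOUND;
}

/* Shift the range so LOWER_BOUND maps to 0. Values below LOWER_BOUND
   become negative, wrap to huge unsigned numbers, and fail the test. */
static bool in_bounds_one_cmp(int8_t v)
{
    unsigned arg_0 = UPPER_BOUND - LOWER_BOUND;
    return (unsigned)(v - LOWER_BOUND) <= arg_0;
}

/* Both forms must agree for every possible int8_t value. */
static void check_bounds(void)
{
    for (int i = INT8_MIN; i <= INT8_MAX; i++)
        assert(in_bounds_two_cmp((int8_t)i) == in_bounds_one_cmp((int8_t)i));
}
```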
Most modern compilers already know how to do this optimization when the bounds are constants, as here. For example, gcc can emit the following instructions for the first snippet above (var_8 is in edi):
add edi, -LOWER_BOUND
cmp dil, UPPER_BOUND - LOWER_BOUND
jbe .L5
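The gcc output goes one step further and compares only the low byte (dil), presumably because var_8 is only 8 bits wide: taken modulo 256, v - LOWER_BOUND maps the 256 possible int8_t values one-to-one, sending [LOWER_BOUND, UPPER_BOUND] onto [0, UPPER_BOUND - LOWER_BOUND]. A hypothetical C model of those three instructions, again with arbitrary bounds -20 and 50, which must lie within the int8_t range:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bounds; both must fit in int8_t, with LOWER_BOUND <= UPPER_BOUND. */
#define LOWER_BOUND (-20)
#define UPPER_BOUND 50

/* Model of:  add edi, -LOWER_BOUND
              cmp dil, UPPER_BOUND - LOWER_BOUND
              jbe .L5                              */
static bool in_bounds_8bit(int8_t v)
{
    uint8_t dil = (uint8_t)(v - LOWER_BOUND);            /* low byte after the add */
    return dil <= (uint8_t)(UPPER_BOUND - LOWER_BOUND);  /* jbe: unsigned <= */
}

/* The byte-wide test must match the original condition for every int8_t. */
static void check_8bit(void)
{
    for (int i = INT8_MIN; i <= INT8_MAX; i++)
        assert(in_bounds_8bit((int8_t)i) == (LOWER_BOUND <= i && i <= UPPER_BOUND));
}
```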