I'm writing a C subroutine in assembler that needs to find the 2 largest values out of 4 values passed in and multiply them together. I'm working on finding the largest val
A naive beginner's way to find the two largest numbers (I hope this gets you unstuck on the reasoning of how to get the second highest: you simply search for the second highest at the same time as you search for the highest):
```asm
    push bp
    mov  bp,sp
    mov  ax,[bp+4]      ; temporary max1 = first argument
    mov  bx,8000h       ; temporary max2 = INT16_MIN
                        ; max2 <= max1
    mov  dx,[bp+6]
    call updateMax1Max2
    mov  dx,[bp+8]
    call updateMax1Max2
    mov  dx,[bp+10]
    call updateMax1Max2
    ; here ax and bx contain max1 and max2
    imul bx             ; signed multiplication, all arguments are signed
                        ; dx:ax = max1 * max2
                        ; "mul" would produce a wrong result for input data like -1, -2, -3, -4
    pop  bp
    ret

updateMax1Max2:
    ; dx is the new number, [ax, bx] are the current [max1, max2] (max2 <= max1)
    cmp  bx,dx          ; compare new value to the lesser max2
    jge  updateMax1Max2_end   ; new value <= max2: discard it
    mov  bx,dx          ; new max2
    cmp  ax,dx          ; compare new value to the greater max1
    jge  updateMax1Max2_end   ; new max2 is already <= max1
    xchg ax,bx          ; new value promoted to new max1, old max1 is now max2
updateMax1Max2_end:
    ret
```
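The following small C model of the two 16-bit multiplies shows why `imul` is needed rather than `mul`. The function names `imul16` and `mul16` are hypothetical and chosen only for illustration. The model relies on `int16_t` being two's complement, which the C exact-width types guarantee.

```c
#include <stdint.h>

/* Hypothetical model of x86 "imul r/m16": signed 16x16 -> 32-bit product (DX:AX). */
int32_t imul16(int16_t a, int16_t b)
{
    return (int32_t)a * (int32_t)b;
}

/* Hypothetical model of x86 "mul r/m16": the same bit patterns treated as unsigned.
   Both operands are widened first, because uint16_t * uint16_t would promote
   to (signed) int and could overflow. */
uint32_t mul16(uint16_t a, uint16_t b)
{
    return (uint32_t)a * (uint32_t)b;
}
```

Take max1 = -1 and max2 = -2. `imul16(-1, -2)` gives 2 (DX:AX = 0000:0002). `mul16(0xFFFF, 0xFFFE)` gives 0xFFFD0002 (DX:AX = FFFD:0002). The low word in AX is identical in both cases, but the high word in DX is not, so the 32-bit result in DX:AX would be wrong with `mul`.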
The code keeps two temporary max values at the same time, at the price of a slightly more complex update (each new value is tested not only against a single max, but also against the second one).
It is then somewhat optimized by keeping the two temporaries in a specified order (max2 <= max1), so when a new value is lower than or equal to max2, it is discarded immediately, without testing it against max1.
That more complex "is the new value bigger than the already kept max1/max2" code is put into a separate sub-routine, so it can be reused several times.
And finally, the initial state of [max1, max2] is set to [first_argument, INT16_MIN], so the sub-routine can be applied to the remaining three arguments in a simple way (bringing the code complexity back down by reusing the code a lot).
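Here is the same algorithm transcribed into C, as a sketch for checking the logic on a desktop machine. The names `maxpair`, `update_max1_max2` and `product_of_two_largest` are hypothetical. Each statement is annotated with the instruction it mirrors.

```c
#include <stdint.h>

/* Hypothetical C transcription of the asm routine; invariant: max2 <= max1. */
typedef struct { int16_t max1, max2; } maxpair;

static void update_max1_max2(maxpair *m, int16_t v)
{
    if (m->max2 >= v)          /* cmp bx,dx / jge: not bigger than max2, discard */
        return;
    m->max2 = v;               /* mov bx,dx: v becomes the new max2 */
    if (m->max1 >= v)          /* cmp ax,dx / jge: order is already correct */
        return;
    int16_t old = m->max1;     /* xchg ax,bx: v promoted to max1, old max1 demoted */
    m->max1 = m->max2;
    m->max2 = old;
}

int32_t product_of_two_largest(int16_t a, int16_t b, int16_t c, int16_t d)
{
    maxpair m = { a, INT16_MIN };   /* initial state [first_argument, INT16_MIN] */
    update_max1_max2(&m, b);
    update_max1_max2(&m, c);
    update_max1_max2(&m, d);
    return (int32_t)m.max1 * m.max2; /* like imul: signed 16x16 -> 32 */
}
```

For example, `product_of_two_largest(3, 7, 1, 5)` ends with [max1, max2] = [7, 5] and returns 35.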
Peter's and Terje's suggestions provide great insight into advanced possibilities, but they also nicely demonstrate how tricky performance-oriented asm coding can be (they both had to add errata to their original ideas).
When stuck or in doubt, go for the most straightforward solution available (the way you would solve it as a human). Just try to keep the number of instructions low (write it in a generic way, reusing any bigger part of the code in sub-routines when possible), so it's easy to debug and comprehend.
Then feed it several possible inputs, including corner cases ([some example values], [INT16_MIN, INT16_MIN, INT16_MIN, INT16_MIN], [INT16_MAX, INT16_MAX, INT16_MAX, INT16_MAX], [-1, -2, -3, -4], [-2, -1, 0, INT16_MAX], etc.), and verify that the results are correct (ideally in code as well, so you can rerun all the tests after the next change to the routine).
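Below is a minimal sketch of such a repeatable test. It assumes a C99 compiler and a routine that can be called from C with the signature `int32_t fn(int16_t, int16_t, int16_t, int16_t)`. The names `max2prod_fn`, `max2prod_reference` and `run_max2prod_tests` are hypothetical. The first three rows ([3, 7, 1, 5] and its variants) are invented stand-ins for "some example values". The stand-in `max2prod_reference` just sorts a copy of the inputs; you would pass your assembler routine instead.

```c
#include <stdint.h>
#include <stdio.h>

typedef int32_t (*max2prod_fn)(int16_t, int16_t, int16_t, int16_t);

/* Hypothetical stand-in: sort a copy descending, multiply the two largest. */
static int32_t max2prod_reference(int16_t a, int16_t b, int16_t c, int16_t d)
{
    int16_t v[4] = { a, b, c, d };
    for (int i = 1; i < 4; i++) {          /* insertion sort, descending */
        int16_t x = v[i];
        int j = i - 1;
        while (j >= 0 && v[j] < x) {
            v[j + 1] = v[j];
            j--;
        }
        v[j + 1] = x;
    }
    return (int32_t)v[0] * v[1];
}

struct test_case { int16_t in[4]; int32_t expected; };

static const struct test_case cases[] = {
    { { 3, 7, 1, 5 }, 35 },                                    /* example values */
    { { 5, 1, 7, 3 }, 35 },                                    /* same, other order */
    { { 7, 7, 1, 1 }, 49 },                                    /* duplicated max */
    { { INT16_MIN, INT16_MIN, INT16_MIN, INT16_MIN }, 1073741824 },
    { { INT16_MAX, INT16_MAX, INT16_MAX, INT16_MAX }, 1073676289 },
    { { -1, -2, -3, -4 }, 2 },                                 /* catches mul vs imul */
    { { -2, -1, 0, INT16_MAX }, 0 },
};

/* Runs every case through fn, prints each mismatch, returns the failure count. */
int run_max2prod_tests(max2prod_fn fn)
{
    int failures = 0;
    for (size_t i = 0; i < sizeof cases / sizeof cases[0]; i++) {
        const struct test_case *t = &cases[i];
        int32_t got = fn(t->in[0], t->in[1], t->in[2], t->in[3]);
        if (got != t->expected) {
            printf("case %zu: got %ld, expected %ld\n",
                   i, (long)got, (long)t->expected);
            failures++;
        }
    }
    return failures;
}
```

Keeping the expected values in a table, rather than comparing against another implementation, means the table itself documents the corner cases and can be rerun unchanged after every edit to the routine.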
Testing is the crucial step: it will catch your wrong original assumptions and any overlooked corner cases. Ideally, don't even run your code directly; go straight into the debugger and single-step through each of those test cases, validating not only the result but also that the internal state during the calculation evolves as expected.
After that you may try some "code golfing": exploit all the properties of the situation to reduce the workload (simplify the algorithm) and/or the number of instructions, and replace performance-hurting code with a faster alternative approach.