x86 Assembly - 2 largest values out of 4 given numbers

隐瞒了意图╮ 2021-01-29 02:28

I'm writing a C subroutine in assembler that needs to find the 2 largest values out of 4 values passed in and multiply them together. I'm working on finding the largest val

3 answers
  •  温柔的废话
    2021-01-29 02:44

    A naive beginner's way to find the two largest numbers. I hope it gets you unstuck on how to get the second highest: you simply track the second highest while searching for the highest:

        push    bp
        mov     bp,sp
        mov     ax,[bp+4]   ; temporary max1 = first argument
        mov     bx,8000h    ; temporary max2 = INT16_MIN
        ; max2 <= max1
        mov     dx,[bp+6]
        call    updateMax1Max2
        mov     dx,[bp+8]
        call    updateMax1Max2
        mov     dx,[bp+10]
        call    updateMax1Max2
    
        ; ax and bx now contain max1 and max2
        imul    bx            ; signed multiplication, all arguments are signed
        ; dx:ax = max1 * max2
    
        ; "mul" would produce wrong result for input data like -1, -2, -3, -4
    
        pop     bp
        ret
    
    updateMax1Max2:
        ; dx is new number, [ax, bx] are current [max1, max2] (max2 <= max1)
        cmp     bx,dx       ; compare new value to lesser max2
        jge     updateMax1Max2_end
        mov     bx,dx       ; new max2
        cmp     ax,dx       ; compare new value to greater max1
        jge     updateMax1Max2_end  ; new max2 is already <= max1
        xchg    ax,bx       ; new value promoted to new max1, old max1 is now max2
    updateMax1Max2_end:
        ret
    

    It keeps two temporary max values at the same time, at the price of a slightly more complex update: each new value is tested not only against a single max, but also against the second one.

    It is then somewhat optimized by keeping the two temporaries in a fixed order (max2 <= max1), so when a new value is not greater than max2, it is discarded immediately without being tested against max1.

    That more complex "is the new value bigger than the already kept max1/max2" code is put into a separate subroutine, so it can be reused several times.

    Finally, the initial state of [max1, max2] is set to [first_argument, INT16_MIN], so that the subroutine can be applied to the remaining three arguments in the same simple way (winning back some of the code complexity by reusing the code a lot).
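
    If it helps to see the same logic outside of assembly, here is a minimal C sketch of it. The names `update_max1_max2` and `product_of_two_largest` are made up for illustration (they are not from the question), and the sketch assumes 16-bit signed arguments like the asm above:

    ```c
    #include <stdint.h>

    /* Hypothetical C model of updateMax1Max2: keeps *max2 <= *max1. */
    static void update_max1_max2(int16_t *max1, int16_t *max2, int16_t value)
    {
        if (*max2 >= value)
            return;                 /* not above the lesser max: discard */
        *max2 = value;              /* value becomes the new max2 */
        if (*max1 >= value)
            return;                 /* new max2 is already <= max1 */
        int16_t tmp = *max1;        /* value beats max1: swap, like xchg ax,bx */
        *max1 = *max2;
        *max2 = tmp;
    }

    /* Hypothetical C model of the whole routine. */
    int32_t product_of_two_largest(int16_t a, int16_t b, int16_t c, int16_t d)
    {
        int16_t max1 = a;           /* temporary max1 = first argument */
        int16_t max2 = INT16_MIN;   /* temporary max2 = INT16_MIN */

        update_max1_max2(&max1, &max2, b);
        update_max1_max2(&max1, &max2, c);
        update_max1_max2(&max1, &max2, d);

        /* Widen before multiplying, like imul leaving the result in dx:ax. */
        return (int32_t)max1 * (int32_t)max2;
    }
    ```

    For the inputs -1, -2, -3, -4 this returns 2, the signed product of -1 and -2, which is the case the `imul` comment in the asm is about.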


    Peter's and Terje's suggestions provide great insight into advanced possibilities, but they also nicely demonstrate how performance asm coding can be tricky (as they both had to add errata to their original ideas).

    When stuck or in doubt, try the most straightforward solution available (the way you would solve it as a human). Just try to keep the number of instructions low (write it in a generic way and move any bigger reusable part of the code into subroutines when possible), so it's easy to debug and comprehend.

    Then feed it several possible inputs, including corner cases ([some example values], [INT16_MIN, INT16_MIN, INT16_MIN, INT16_MIN], [INT16_MAX, INT16_MAX, INT16_MAX, INT16_MAX], [-1, -2, -3, -4], [-2, -1, 0, INT16_MAX], etc...), and verify the results are correct (ideally in code too, so you can rerun all the tests after the next change to the routine).
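
    A rough sketch of what such a rerunnable check could look like in C. Everything named here is an assumption for illustration: `two_largest_product` is only a C stand-in for the routine under test (in a real setup you would declare and call your asm routine instead), `reference_product` is a deliberately dumb sort-based cross-check, and the first test row is arbitrary:

    ```c
    #include <stddef.h>
    #include <stdint.h>
    #include <stdlib.h>

    /* Stand-in for the routine under test (hypothetical name). */
    static int32_t two_largest_product(const int16_t v[4])
    {
        int16_t max1 = v[0], max2 = INT16_MIN;
        for (int i = 1; i < 4; ++i) {
            if (v[i] <= max2)
                continue;               /* not above max2: discard */
            max2 = v[i];
            if (max2 > max1) {          /* restore max2 <= max1 */
                int16_t tmp = max1;
                max1 = max2;
                max2 = tmp;
            }
        }
        return (int32_t)max1 * max2;
    }

    /* qsort comparator for descending order. */
    static int cmp_desc(const void *p, const void *q)
    {
        int16_t a = *(const int16_t *)p, b = *(const int16_t *)q;
        return (a < b) - (a > b);
    }

    /* Independent reference: sort a copy, multiply the first two. */
    static int32_t reference_product(const int16_t v[4])
    {
        int16_t s[4] = { v[0], v[1], v[2], v[3] };
        qsort(s, 4, sizeof s[0], cmp_desc);
        return (int32_t)s[0] * s[1];
    }

    /* Returns the number of test rows where the two disagree. */
    int count_failures(void)
    {
        static const int16_t cases[][4] = {
            { 3, 9, -5, 7 },            /* arbitrary example values */
            { INT16_MIN, INT16_MIN, INT16_MIN, INT16_MIN },
            { INT16_MAX, INT16_MAX, INT16_MAX, INT16_MAX },
            { -1, -2, -3, -4 },
            { -2, -1, 0, INT16_MAX },
        };
        int failures = 0;
        for (size_t i = 0; i < sizeof cases / sizeof cases[0]; ++i)
            if (two_largest_product(cases[i]) != reference_product(cases[i]))
                ++failures;
        return failures;
    }
    ```

    Calling `count_failures()` from a small `main` and treating any non-zero result as a failure gives you something you can rerun after every change.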

    This is the crucial step, which will save you from wrong initial assumptions and overlooked corner cases. Ideally, don't even run your code directly: go straight into the debugger and single-step each of those test cases, to validate not only the result but also that the internal state during the calculation is what you expect.

    After that you may try some "code golfing": exploit all the properties of the situation to lower the workload (simplifying the algorithm) and/or the number of instructions, and replace performance-hurting code with a faster alternative.
