Multiplying two 32-bit numbers on the 8086 microprocessor

前端未结


I have a code example that multiplies two 16-bit numbers on the 8086, and I am trying to adapt it to multiply two 32-bit numbers.

start:
 MOV AX,0002h ; 16 bit multipli


                      
3 answers

后悔当初
2021-01-23 07:23

Solution no. 2 does not seem to work if the product is wider than 32 bits.
Furthermore, its shift instructions are wrong.
This solution works correctly:

Procedure _PosLongIMul2; Assembler;

{INPUT:

 DX:AX-> First factor (destroyed).
 BX:CX-> Second factor (destroyed).

 OUTPUT:

 BX:CX:DX:AX-> Multiplication result.

 TEMP:

 BP, Di, Si}

Asm

     Jmp   @Go

 @VR:DD    0      {COPY of RESULT     (LOW)}
     DD    0      {COPY of RESULT    (HIGH)}

 @Go:Push  BP

     Mov   BP,20H {32 Bit Op.}

     XOr   DI,DI  {Shifted first op., bits 32..47}
     XOr   SI,SI  {Shifted first op., bits 48..63}

     Mov   [CS:OffSet @VR  ],Word(0)
     Mov   [CS:OffSet @VR+2],Word(0)
     Mov   [CS:OffSet @VR+4],Word(0)
     Mov   [CS:OffSet @VR+6],Word(0)

 @01:ShR   BX,1
     RCR   CX,1

     JAE   @00

     Add   [CS:OffSet @VR  ],AX
     AdC   [CS:OffSet @VR+2],DX
     AdC   [CS:OffSet @VR+4],DI
     AdC   [CS:OffSet @VR+6],SI

 @00:ShL   AX,1
     RCL   DX,1
     RCL   DI,1
     RCL   SI,1

     Dec   BP
     JNE   @01

     Mov   AX,[CS:OffSet @VR]
     Mov   DX,[CS:OffSet @VR+2]
     Mov   CX,[CS:OffSet @VR+4]
     Mov   BX,[CS:OffSet @VR+6]

     Pop   BP

End;


This works for two unsigned integers.
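
For readers who want to check the routine's logic without an 8086 at hand, here is a minimal C sketch of the same shift-and-add loop. The function name `shift_add_mul32` is hypothetical, and a 64-bit C integer stands in for each register group, so this is a model of the idea rather than a translation of the assembly.

```c
#include <stdint.h>

/* Hypothetical model of the shift-and-add loop:
   'a' plays DX:AX (extended upward by SI:DI), 'b' plays BX:CX,
   and 'acc' plays the 64-bit accumulator stored at @VR. */
uint64_t shift_add_mul32(uint32_t a, uint32_t b)
{
    uint64_t multiplicand = a;  /* SI:DI:DX:AX, upper half starts at zero */
    uint64_t acc = 0;           /* running sum of the partial products */

    for (int i = 0; i < 32; i++) {  /* BP = 20h iterations */
        if (b & 1u)                 /* the bit that SHR/RCR moves into CF */
            acc += multiplicand;    /* the ADD/ADC chain into @VR */
        b >>= 1;                    /* SHR BX,1 then RCR CX,1 */
        multiplicand <<= 1;         /* SHL AX,1 then RCL DX, DI, SI */
    }
    return acc;
}
```

Calling `shift_add_mul32(x, y)` should give the same value as `(uint64_t)x * y` for any pair of 32-bit inputs.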

If you want to multiply a 32-bit unsigned integer by a 16-bit unsigned integer, you can use the Mul instruction as follows:

Function Mul32Bit(M1:LongInt;M2:Word):LongInt; Assembler;

Asm
 LEA   SI,M1
 Mov   AX,[SS:SI]
 Mov   CX,[SS:SI+2]
{CX:AX contains the 32-bit multiplicand}
 Mov   BX,M2
{BX contains the 16-bit multiplier}
 Mul   BX
 XChG  AX,CX
 Mov   SI,DX
 Mul   BX
 Add   AX,SI
 AdC   DX,0
{DX:AX:CX contains the full 48-bit product}
 Mov   DX,AX
 Mov   AX,CX
{DX:AX contains the low 32 bits of the product, which is the function's result}
End;
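
The arithmetic behind this function can be sketched in C, assuming each `uint32_t` product below corresponds to one Mul BX. The name `mul32x16_low32` is hypothetical. The sketch shows why only the low word of the second product survives in a 32-bit result.

```c
#include <stdint.h>

/* Hypothetical model of the 32 x 16 routine: two 16x16 => 32 multiplies. */
uint32_t mul32x16_low32(uint32_t m1, uint16_t m2)
{
    uint16_t lo = (uint16_t)m1;          /* AX after the loads */
    uint16_t hi = (uint16_t)(m1 >> 16);  /* CX after the loads */

    uint32_t p_lo = (uint32_t)lo * m2;   /* first Mul BX, result in DX:AX */
    uint32_t p_hi = (uint32_t)hi * m2;   /* second Mul BX, result in DX:AX */

    /* The full product is p_lo + p_hi * 2^16, which is 48 bits wide.
       Keeping 32 bits means the upper word of p_hi falls off the top. */
    return p_lo + (p_hi << 16);
}
```

For example, `mul32x16_low32(0x00012345, 0x0010)` shifts the value left by one hex digit.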

                                                                        
                                                        
            
            
              
                
孤街浪徒
2021-01-23 07:26

Give a man a fish and blah-blah-blah…

It’s good that you have a code example. But do you understand the algorithm?

Okay, let’s go through it step by step on a simplified example: multiplying two 8-bit registers in AL and AH, and storing the result in DX.

BTW, you can use any registers you like unless an instruction requires a particular register, as SHL reg, CL does, for example.

But before we actually start, there are a couple of optimizations for the algorithm you provided. Assembly is all about optimization, you know, either for speed or for size. Otherwise you would be writing bloatware in C# or something else.

MOV DI,AX
AND DI,01h
XOR DI,01h
JZ ADD


All this part does is check whether the lowest bit (bit #0) of AX is set.
You could simply do

TEST AX, 1
JNZ ADD


But you only need to test one bit, thus TEST AL, 1 instead of TEST AX, 1 saves you one byte.

Next,

RCR DX,1


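Here the rotation is actually needed: RCR DX, 1 shifts the carry out of ADD DX, BX back into the top bit of DX, which a plain SHR DX, 1 would drop. When the addition is skipped, TEST (like the original XOR) has already cleared the carry flag, so a zero is shifted in. Both instructions take the same time to execute and both are two bytes long, so keeping RCR costs nothing.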

Next,

DEC SI
CMP SI,0
JNZ LOOP


Never ever compare with zero after DEC. It’s bad form! DEC already sets the zero flag, so simply do

DEC SI
JNZ LOOP


Next, the unnecessary loop split:

JZ ADD
CONT:
. . .
JMP END
ADD:
ADD DX, BX
JMP CONT
END:
. . .


Should be

JNZ CONT
ADD DX, BX
CONT:
. . .
END:
. . .


Here is a slightly optimized version of your routine:

LOOP:
 TEST AL, 1
 JZ SHORT CONT
 ADD DX, BX
CONT:
 RCR DX, 1
 RCR CX, 1
 SHR AX, 1
 DEC SI
 JNZ LOOP
END:


That’s it. Now back (or forward?) to what this little piece of code actually does. The following code sample fully mimics your example, but for 8-bit registers.

 MOV AL,12h   ; 8 bit multiplicand
 MOV AH,34h   ; 8 bit multiplier
 XOR DX, DX   ; result
 MOV CX, 8    ; loop for 8 times

LOOP:
 TEST AL, 1
 JZ SHORT CONT
 ADD DH, AH
CONT:
 RCR DX, 1    ; shift right, bringing the carry from ADD DH, AH back in
 SHR AL, 1
 DEC CX
 JNZ LOOP
END:


This is the long multiplication algorithm:

 12h = 00010010
               x
 34h = 00110100
       --------
       00000000
      00110100
     00000000
    00000000
   00110100
  00000000
 00000000
00000000


Add the shifted 34h two times (once shifted left by 1 bit, once by 4 bits):

0000000001101000
+
0000001101000000
----------------
0000001110101000 = 03A8h
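
The 8-bit loop can also be checked in C. Below is a minimal sketch, with the hypothetical name `mul8x8`, that models DX as a 16-bit value and keeps an explicit carry bit. It assumes the right shift of DX rotates through carry, as RCR DX, 1 does, so that a carry out of the add into DH is not lost. With 12h and 34h no carry ever occurs, but with FFh and FFh it does.

```c
#include <stdint.h>

/* Hypothetical model of the 8-bit loop: 'dx' is DX, its top byte is DH. */
uint16_t mul8x8(uint8_t al, uint8_t ah)
{
    uint16_t dx = 0;                          /* XOR DX, DX */

    for (int i = 0; i < 8; i++) {             /* MOV CX, 8 */
        unsigned carry = 0;                   /* TEST clears CF */
        if (al & 1u) {                        /* TEST AL, 1 */
            unsigned dh = (dx >> 8) + ah;     /* ADD DH, AH */
            carry = dh >> 8;                  /* CF = carry out of DH */
            dx = (uint16_t)(((dh & 0xFFu) << 8) | (dx & 0xFFu));
        }
        dx = (uint16_t)((carry << 15) | (dx >> 1));  /* RCR DX, 1 */
        al >>= 1;                             /* SHR AL, 1 */
    }
    return dx;
}
```

`mul8x8(0x12, 0x34)` returns 0x03A8, matching the sum of the two shifted copies of 34h.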


That’s it!
To handle more digits, you use the same approach. Below is an implementation in fasm syntax. The result is stored in DX:CX:BX:AX.

Num1    dd 0x12345678
Num2    dd 0x9abcdef0

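 mov si, word [Num1]        ; DI:SI = multiplier, tested bit by bit
 mov di, word [Num1 + 2]
 xor ax, ax                 ; DX:CX:BX:AX = 64-bit result, starts at zero
 xor bx, bx
 xor cx, cx
 xor dx, dx
 mov bp, 32                 ; loop 32 times

_loop:
 test si, 1                 ; lowest multiplier bit set? (also clears CF)
 jz short _cont
 add cx, word [Num2]        ; add Num2 into the high half of the result
 adc dx, word [Num2 + 2]
_cont:
 rcr dx, 1                  ; shift CF and the result right by one bit
 rcr cx, 1
 rcr bx, 1
 rcr ax, 1
 rcr di, 1                  ; shift the multiplier right in the same chain
 rcr si, 1
 dec bp
 jnz short _loop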
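
To see how the rotate-through-carry chain behaves, here is a rough C model of the loop above. The helper `rcr16` and the function `mul32_rcr_chain` are hypothetical names, and the carry flag is an ordinary variable. It is only a sketch for checking the register logic, not something you would ship.

```c
#include <stdint.h>

/* Hypothetical helper: rotate right through carry by one, like RCR reg, 1. */
static uint16_t rcr16(uint16_t reg, unsigned *cf)
{
    unsigned out = reg & 1u;                      /* bit that leaves the register */
    reg = (uint16_t)((*cf << 15) | (reg >> 1));   /* old CF enters at the top */
    *cf = out;
    return reg;
}

uint64_t mul32_rcr_chain(uint32_t num1, uint32_t num2)
{
    uint16_t si = (uint16_t)num1, di = (uint16_t)(num1 >> 16);
    uint16_t ax = 0, bx = 0, cx = 0, dx = 0;

    for (int bp = 32; bp != 0; bp--) {
        unsigned cf = 0;                          /* TEST clears CF */
        if (si & 1u) {
            uint32_t lo = (uint32_t)cx + (uint16_t)num2;             /* add cx, [Num2] */
            uint32_t hi = (uint32_t)dx + (num2 >> 16) + (lo >> 16);  /* adc dx, [Num2 + 2] */
            cx = (uint16_t)lo;
            dx = (uint16_t)hi;
            cf = hi >> 16;                        /* carry out of the 32-bit add */
        }
        dx = rcr16(dx, &cf);
        cx = rcr16(cx, &cf);
        bx = rcr16(bx, &cf);
        ax = rcr16(ax, &cf);
        di = rcr16(di, &cf);
        si = rcr16(si, &cf);
    }
    /* Assemble DX:CX:BX:AX into one 64-bit value. */
    return ((uint64_t)dx << 48) | ((uint64_t)cx << 32) | ((uint64_t)bx << 16) | ax;
}
```

The bits rotated from AX into DI never reach bit 0 of SI within 32 iterations, which is why sharing one chain for the result and the multiplier is safe here.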


Cheers ;)
                                                                        
                                                        
            
            
              
                
萌比男神i
2021-01-23 07:28

For the record, 8086 has a mul instruction that makes this much easier (and more efficient on later CPUs with fast mul).  On original 8086 it was really slow, but running an RCL multi-precision shift loop 32 times sucks a lot on all CPUs!  The mul version also has a smaller static code size, which is nice.

You only need three mul instructions to get the low * low, low * high, and high * low products.  (And if you wanted the full 64-bit result, another one for the high * high product).
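
As a sketch of the full 64-bit case mentioned in parentheses, the four partial products can be combined as below. The names `mul16` and `mul32_full` are hypothetical, and `mul16` stands for what a single 16-bit mul leaves in DX:AX. In assembly the additions would be an add/adc chain. Here a 64-bit C integer absorbs the carries.

```c
#include <stdint.h>

/* Hypothetical helper: one 16x16 => 32 multiply, like a single MUL r/m16. */
static uint32_t mul16(uint16_t a, uint16_t b)
{
    return (uint32_t)a * b;
}

uint64_t mul32_full(uint32_t x, uint32_t y)
{
    uint16_t x_lo = (uint16_t)x, x_hi = (uint16_t)(x >> 16);
    uint16_t y_lo = (uint16_t)y, y_hi = (uint16_t)(y >> 16);

    uint64_t ll = mul16(x_lo, y_lo);   /* weight 2^0  */
    uint64_t lh = mul16(x_lo, y_hi);   /* weight 2^16 */
    uint64_t hl = mul16(x_hi, y_lo);   /* weight 2^16 */
    uint64_t hh = mul16(x_hi, y_hi);   /* weight 2^32 */

    /* Same layout as pencil-and-paper multiplication with 16-bit "digits". */
    return ll + ((lh + hl) << 16) + (hh << 32);
}
```

Dropping `hh` and the upper bits of the cross products gives the three-multiply, 32-bit version.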

8086 is missing the efficient imul reg, reg form that doesn't need DX:AX as an implicit output, and that doesn't put the high half anywhere.  So unfortunately we need more register shuffling than a compiler would for a 64x64 => 64 multiply in 32-bit mode, but otherwise this is exactly the same problem.  (See https://godbolt.org/z/ozSkt_)

x_lo, x_hi, y_lo, and y_hi can be memory relative to bp as locals or function args, or labels.  Or some of those could be in registers that this function doesn't use, if you change the syntax so they're not addressing modes.

;; untested
;; inputs: uint32_t x, y in memory
;; clobbers: CX, SI, DI

    mov     ax, [y_lo]
    mov     cx, ax
    mul     word ptr [x_hi]
    mov     si, ax            ; save  y_lo * x_hi

    mov     ax, [x_lo]
    mov     di, ax
    mul     word ptr [y_hi]
    add     si, ax            ; sum of the cross products

    mov     ax, di
    mul     cx                ; DX:AX = y_lo * x_lo
    add     dx, si            ; add the cross products into the high half
;; Result: uint32_t DX:AX = X * Y


To use fewer tmp registers, you could just reload x_lo and y_lo from memory twice each instead of saving them in DI and CX.

Note that we don't save the high-half DX results of either lo * hi product because we only want a 32-bit result, not a full 32x32 => 64-bit result.  The low 16 bits of those products add into the top half of our final 32-bit product.  (And we don't need carry-out from them into the top-most 16-bit word of a 64-bit result, so we can add them before the last mul.)
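
The same three-multiply scheme can be written in C as a quick sanity check. This is a minimal sketch with the hypothetical name `mul32_low`, and the variable `cross` plays the role of SI in the assembly.

```c
#include <stdint.h>

/* Hypothetical model of the three-mul routine: 32 x 32 => low 32 bits. */
uint32_t mul32_low(uint32_t x, uint32_t y)
{
    uint16_t x_lo = (uint16_t)x, x_hi = (uint16_t)(x >> 16);
    uint16_t y_lo = (uint16_t)y, y_hi = (uint16_t)(y >> 16);

    /* SI: sum of the low words of both cross products.
       Their high words (DX after each mul) are simply discarded. */
    uint16_t cross = (uint16_t)((uint32_t)y_lo * x_hi + (uint32_t)x_lo * y_hi);

    uint32_t low = (uint32_t)x_lo * y_lo;    /* DX:AX = y_lo * x_lo */
    return low + ((uint32_t)cross << 16);    /* add dx, si */
}
```

The result should match an ordinary truncating `uint32_t` multiplication of `x` and `y`.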

A 16 * 32 => 32-bit multiply would be even easier, just two mul and one add (plus a bunch of mov to get data into the right places).  See for example a factorial loop that does this: "multiply two consecutive times in assembly language program".  (That answer also shows how extended-precision multiply math works, the same way you add terms for the paper & pencil algorithm for doing multiplication on numbers of multiple decimal digits.)
                                                                        
                                                        
            
            
              
                