I want to add two 12-byte numbers and store the result in a 16-byte variable. How can I do this?
section .data
big_num1 dd 0x11111111, 0x22222222,
Elementary school:
  1234
+ 5678
========
Start filling it in:
    1
  1234
+ 5678
========
     2
4+8 = 12, so 2 carry the one.
In a computer you would do:
add a = 4 + 8
adc b = 3 + 7
adc c = 2 + 6
adc d = 1 + 5
Then d, c, b, a contain your result, and it scales as wide as you want. d, c, b and a can be 8, 16, 32 or 64 bits wide depending on the instruction set. Most instruction sets that have flags have add and adc. On the ones that don't have flags you can synthesize the carry in various ways that are not difficult at all: break your operands into 16-bit quantities held in 32-bit registers/memory, do the 32-bit add, and bit 16 of the result is your carry out, which you add into the next 16-bit chunk. It takes some shifting and masking, but it all works the same. Since you probably have adc, you don't need to do any of that: just do the trivial add, adc, adc, adc... until done.
If you clear the flag before you start you can use adc in a loop.
Now if your variables do not line up with the adder in the processor then you do have to synthesize it in some way.
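Grade school math for the same problem, except now you have to do the columns separately.
  4
+ 8
====
 12
and you have to manually mask and shift the result: masking off the low digit (12 % 10 = 2) gives the digit for this column, and shifting right by one digit (12 / 10 = 1) gives the carry, in base 10.
  1
  3
+ 7
====
 11
then
  1
  2
+ 6
====
  9
This one carries a zero:
  0
  1
+ 5
====
  6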
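The same column-by-column idea, applied to the 16-bit chunks mentioned earlier, might look like the C sketch below. The name add_chunks16 is hypothetical, and the least-significant-chunk-first layout is an assumption. Each column is added in a 32-bit variable, the low 16 bits are masked off as the digit, and bit 16 is shifted down to become the carry.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: adds two numbers stored as n 16-bit chunks,
   least significant chunk first, using only 32-bit additions.
   Returns the final carry out (0 or 1). */
uint32_t add_chunks16(uint16_t *sum, const uint16_t *a, const uint16_t *b, size_t n)
{
    uint32_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t t = (uint32_t)a[i] + (uint32_t)b[i] + carry; /* at most 0x1FFFF */
        sum[i] = (uint16_t)(t & 0xFFFFu); /* mask: the digit that stays in this column */
        carry = t >> 16;                  /* shift: bit 16 is the carry into the next column */
    }
    return carry;
}
```

This is only worth doing when the hardware offers no add-with-carry; with adc available, the flag does the same job for free.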
big_num1 dd 0x11111111, 0x22222222, 0x33333333
big_num2 dd 0xffffffff, 0x22222222, 0x33333333
Because x86 is a little-endian architecture, the lowest part of a number is stored in memory at the lowest address. For big_num1 the first defined dword (value 0x11111111) is at the lowest address and is thus the lowest part of the number. In the normal number representation this is what goes at the right-hand side:
big_num1 == 0x333333332222222211111111
big_num2 == 0x3333333322222222FFFFFFFF
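To check that reading order, a small C sketch can print the dwords from the highest address down to the lowest. The function name limbs_to_hex is made up for this example, and the array is assumed to hold the dwords in the same order as the dd lines.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: writes an n-dword number (dword 0 least significant)
   as hexadecimal text, most significant dword first.
   out must have room for at least 8*n + 1 bytes. */
void limbs_to_hex(char *out, const uint32_t *limbs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* Walk from the highest index (highest address) down to index 0. */
        sprintf(out + 8 * i, "%08lX", (unsigned long)limbs[n - 1 - i]);
    }
    out[8 * n] = '\0'; /* terminates the string even when n is 0 */
}
```

For the three dwords of big_num1 this yields the 24 hex digits 333333332222222211111111, which matches the representation above.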
You add corresponding digits going from right to left, just like everybody has learned at school.
In the hexadecimal representation of these numbers there are 24 digits to consider. However since the architecture is 32-bit, we can nicely make 3 groups of 8 digits.
For the 1st group we simply use ADD:
mov eax, [big_num1] ; 0x11111111
add eax, [big_num2] ; + 0xFFFFFFFF <-- This produces a carry
mov [result_4dword], eax ; 0x00000000
For the 2nd group we use ADC to pick up a possible carry from the previous addition:
mov eax, [big_num1 + 4] ; 0x22222222
adc eax, [big_num2 + 4] ; + 0x22222222 + CF=1 <-- No new carry
mov [result_4dword + 4], eax ; 0x44444445
For the 3rd group we use ADC to pick up a possible carry from the previous addition:
mov eax, [big_num1 + 8] ; 0x33333333
adc eax, [big_num2 + 8] ; + 0x33333333 + CF=0 <-- No new carry
mov [result_4dword + 8], eax ; 0x66666666
Key here is that we can also use ADC for the 1st group if we expressly clear the carry flag beforehand:
clc
mov eax, [big_num1] ; 0x11111111
adc eax, [big_num2] ; + 0xFFFFFFFF + CF=0 <-- This produces a carry
mov [result_4dword], eax ; 0x00000000
Now we can write a loop with 3 iterations, but we have to be careful not to change the carry flag inadvertently. That's why I use LEA instead of ADD to advance the offset. DEC is also an instruction that does not destroy the carry flag. I've preferred the combo DEC ECX JNZ ... because it is generally faster than LOOP ...:
mov ecx, 3
xor ebx, ebx ; This additionally clears the carry flag
Again:
mov eax, [big_num1 + ebx]
adc eax, [big_num2 + ebx] ; Can produce a new carry flag
mov [result_4dword + ebx], eax
lea ebx, [ebx + 4] ; This does not clobber the carry flag
dec ecx ; This does not clobber the carry flag
jnz Again
If the carry flag is still set after these 3 additions, you'll have to write a 1 in the 4th dword of result_4dword, else you'll have to write a 0 there. Because result_4dword is in the .bss section, you should not count on any preset value like zero!
setc cl ; ECX is 0 after the loop, so now ECX=[0,1]
mov [result_4dword + ebx], ecx ; EBX is 12 here, so this writes the 4th dword
Please note that I've changed result_4word into result_4dword. Makes more sense...