I am studying for my exam and I am so confused by this assembly code. It is a program in which the user first enters a string, then that string gets encoded and printed, then decoded.
You need to understand the structure of memory: how the string is stored.
The teacher's code has no comments at all, so either figuring it out was your task (and you failed), or... I will not comment any further on your teacher, for diplomatic reasons.
The string buffer has the structure that MS-DOS uses for function 0Ah of int 21h (buffered keyboard input):
MyString:
db string_maximum_size ; maximum characters to store into buffer
db character_actually_read ; characters read by INT 21h: 0Ah function
db string_maximum_size DUP (0) ; the string characters
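So by entering the string "hello", the memory at address MyString will be set to:
33, 5, 104 ('h'), 101 ('e'), 108 ('l'), 108 ('l'), 111 ('o'), 13 (CR)
followed by the remaining zeroes (the result of DUP (0)). The 13 is the carriage return that DOS stores after the characters; it is not included in the count 5.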
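To see where each of those bytes comes from, here is a rough Python model of what function 0Ah does to the buffer. It is only a sketch of the documented behavior (at most max - 1 characters are kept, and a CR is stored after them but not counted); it ignores editing keys like Backspace, and the names dos_buffered_input and MAX are made up for this illustration, with MAX = 33 simply mirroring the example above.

```python
CR = 0x0D  # carriage return that DOS stores after the typed characters

def dos_buffered_input(buf: bytearray, typed: bytes) -> None:
    """Rough model of INT 21h / AH=0Ah filling a DOS input buffer.

    buf[0]   maximum number of bytes for the character area, including the CR
    buf[1]   set to the number of characters actually read, excluding the CR
    buf[2:]  the characters, followed by the CR
    """
    max_bytes = buf[0]
    assert max_bytes >= 1, "this sketch assumes a non-zero maximum"
    accepted = typed[: max_bytes - 1]   # DOS beeps and ignores keys past max - 1 characters
    for i, ch in enumerate(accepted):
        buf[2 + i] = ch                 # characters start at offset 2
    buf[2 + len(accepted)] = CR         # the CR is stored, but not counted
    buf[1] = len(accepted)

# Same shape as the MyString definition: db MAX / db 0 / db MAX DUP (0)
MAX = 33
buf = bytearray([MAX, 0]) + bytearray(MAX)
dos_buffered_input(buf, b"hello")
print(list(buf[:8]))  # [33, 5, 104, 101, 108, 108, 111, 13]
```

Everything after the CR keeps its DUP (0) value, because DOS only writes the bytes it needs.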
Actually, I think your code has a bug: it sets the maximum size to the total buffer size, BUFF_LENGTH EQU STR_LENGTH + 3, while from the interrupt description I would expect the first byte to contain only STR_LENGTH. You can verify this by typing as long a string as DOS will accept and then checking in the debugger whether the memory after the MyString buffer gets overwritten. Also, the +3 doesn't make much sense, as only 2 extra bytes are used, for the maximum size and the actual size (unless the third one is meant for the carriage return that DOS stores after the characters).
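The possible overrun can also be checked with plain arithmetic. This is Python rather than assembly; the helper worst_case_overrun is hypothetical, and STR_LENGTH = 30 is an assumed value (the teacher's actual constant isn't shown here). It assumes, as the DOS documentation describes, that the maximum in the first byte counts the final CR, so DOS may write up to that many bytes starting at offset 2.

```python
def worst_case_overrun(total_size: int, max_byte: int) -> int:
    """Number of bytes DOS function 0Ah could write past the end of the buffer.

    Up to max_byte bytes (characters plus the final CR) are written starting
    at offset 2, so the last byte written is at offset 2 + max_byte - 1.
    """
    last_written = 2 + max_byte - 1
    return max(0, last_written + 1 - total_size)

STR_LENGTH = 30                 # hypothetical value, consistent with the 33 in the example
BUFF_LENGTH = STR_LENGTH + 3    # total buffer size, as in the teacher's code

print(worst_case_overrun(BUFF_LENGTH, BUFF_LENGTH))  # maximum = BUFF_LENGTH -> 2
print(worst_case_overrun(BUFF_LENGTH, STR_LENGTH))   # maximum = STR_LENGTH  -> 0
```

With a maximum of STR_LENGTH + 1 the buffer still fits exactly, which may be what the extra byte was meant for; only using BUFF_LENGTH itself as the maximum lets DOS write past the end.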
Now, this is what happens in the code:
LEA bx,[MyString] ; bx = address of first byte of buffer (contains maximum size)
INC bx ; bx now points to actual size
; instead LEA bx,[MyString+1] could have been used, skipping one INC bx
MOV cl,[bx] ; cl = actual string size
XOR ch,ch ; ch = 0 (extending 8 bit value in cl to unsigned 16 bit in cx)
; other option on a 386+ CPU is MOVZX cx,BYTE PTR [bx]
; or XOR cx,cx followed by MOV cl,[bx]
INC bx ; bx now points to the first character
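The same instructions, traced step by step in Python to show which byte bx points at and why ch has to be cleared. This is only a model: memory stands in for the data segment, MyString is assumed to sit at offset 0, and loop_setup is a made-up name.

```python
def loop_setup(memory: bytes, my_string: int) -> tuple[int, int]:
    """Trace LEA / INC / MOV / XOR / INC and return (bx, cx) at the end."""
    bx = my_string        # LEA bx,[MyString]: bx -> maximum-size byte
    bx += 1               # INC bx:            bx -> actual-size byte
    cl = memory[bx]       # MOV cl,[bx]:       cl = number of characters
    ch = 0                # XOR ch,ch:         clear the high byte of cx
    cx = (ch << 8) | cl   # cx = cl zero-extended to 16 bits
    bx += 1               # INC bx:            bx -> first character
    return bx, cx

# Buffer after typing "hello", assumed to start at offset 0
memory = bytes([33, 5]) + b"hello" + bytes([13])
bx, cx = loop_setup(memory, 0)
print(bx, cx, chr(memory[bx]))  # 2 5 h

# Without XOR ch,ch, a leftover value such as 12h in ch would give a bogus count
stale_cx = (0x12 << 8) | 5
print(stale_cx)  # 4613
```

That last line is the whole reason for XOR ch,ch: LOOP works with all 16 bits of cx, so a stale ch would make the loop run far past the string.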
Then it does whatever it wants with the content at [bx], incrementing bx again in the loop to access the next character, until the cx counter reaches 0.
You should definitely start the debugger, step through that code instruction by instruction, point the memory window at MyString, and watch how bx is used to access particular bytes there and how those INC bx instructions fit in.
This will explain it better than anything else.
edit:
One more thing. I actually kept one secret to myself, which is an integral part of your question.
So, "how did I know?" Always remember that computers are computational machines. You put a program in (a list of instructions), you put some numbers in, let it execute the instructions, and get the resulting numbers out.
I had the code (the instructions). The next thing I looked for in your code was how the string is defined. I found that it is entered by the user and read by the int 21h function 0Ah. So I googled that function: how it works and what data it returns. Snap: suddenly it all made sense (except the max size bug, which I decided is simply a bug by your lecturer; it's easy to make a bug in ASM, even for seasoned programmers).
So always make sure you understand all the instructions, and that you understand well what the input data are (their structure and values). Then you can run everything in your head, just like the CPU does, to find out how the input data turns into output data. It's a purely deterministic computational process: you do not need to guess anything, because what happens next is exactly defined at every stage of the computation.
If you know those definitions exactly, it's actually straightforward, easier than any high-level abstraction stuff, just a lot more tedious.
When you are new to ASM, it's much easier to watch this happen in the debugger (which will also help you learn ASM much faster) than to do it in your head.
Explained:
INC bx ; skip the first byte (maximum size), bx now points to the actual size
MOV cl, [bx] ; get number of characters of the string
XOR ch, ch ; quick way to set ch to zero, so cx == cl for the loop
coding:
INC bx ; next address (the first pass reaches the first character)
MOV dl, [bx] ; get character value
XOR dl, ah ; encode/decode it with the XOR key in ah
MOV [bx], dl ; store it back at the same memory address
LOOP coding ; decrement cx and jump to coding if cx != 0
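Here is a Python model of that loop, mirroring each instruction, including LOOP's "decrement cx, then jump if not zero" behavior. It assumes bx starts at MyString (the LEA is not shown above), and the key 20h is an arbitrary, hypothetical choice for ah; xor_coding_loop is a made-up name.

```python
def xor_coding_loop(memory: bytearray, my_string: int, ah: int) -> None:
    """Model of the 'coding' loop: XOR every character in place with the key in ah."""
    bx = my_string + 1              # LEA bx,[MyString] + INC bx: actual-size byte
    cx = memory[bx]                 # MOV cl,[bx] + XOR ch,ch
    while True:                     # coding:
        bx += 1                     # INC bx: next character
        dl = memory[bx]             # MOV dl,[bx]
        dl ^= ah                    # XOR dl,ah
        memory[bx] = dl             # MOV [bx],dl: store back in place
        cx = (cx - 1) & 0xFFFF      # LOOP first decrements cx (16-bit wraparound)...
        if cx == 0:                 # ...and jumps back only while cx != 0
            break

mem = bytearray([33, 5]) + bytearray(b"hello") + bytearray([13])
xor_coding_loop(mem, 0, 0x20)       # hypothetical key 20h
print(bytes(mem[2:7]))              # b'HELLO'
```

One consequence of LOOP testing cx only after decrementing it: if the user enters an empty string (count 0), the real code wraps cx to 0FFFFh and XORs 65536 bytes. A JCXZ before the loop would guard against that; this Python sketch simply fails with an IndexError once it runs off the end of the buffer.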
The format of the string seems "custom": certainly not null-terminated, but storing the size first instead (is it Pascal? Ada uses this kind of system).
Note that in this case encoding and decoding are the same operation, since XOR masking is used: XORing a byte with the same key twice gives back the original byte.
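That property is easy to check directly, because (c XOR k) XOR k = c for any byte c and key k. A short Python check, with an arbitrary hypothetical key (the program itself keeps its key in ah) and a made-up helper name xor_mask:

```python
def xor_mask(data: bytes, key: int) -> bytes:
    """XOR every byte with the same one-byte key."""
    return bytes(b ^ key for b in data)

KEY = 0x5A  # hypothetical key
encoded = xor_mask(b"hello", KEY)
decoded = xor_mask(encoded, KEY)   # "decoding" is the very same operation
print(decoded)  # b'hello'

# The round trip holds for every possible byte and every possible key
all_bytes = bytes(range(256))
print(all(xor_mask(xor_mask(all_bytes, k), k) == all_bytes for k in range(256)))  # True
```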