Assembly code for simple coding/decoding of string confusion?

故里飘歌 2021-01-26 11:22

I am studying for my exam and I am so confused by this assembly code. It is a program in which the user first enters a string, then that string gets coded and printed, then decoded

2 answers
  • 2021-01-26 11:53

    You need to understand the structure of memory, how the string is stored.

    The teacher's code has no comments at all, so either it was your task to figure it out (and you failed), or... I will not comment any further on your teacher, for diplomatic reasons.

    The string buffer has the structure that MS-DOS uses for function 0Ah (buffered input) of int 21h:

    MyString:
        db string_maximum_size     ; maximum characters to store into buffer (carriage return included)
        db character_actually_read ; characters read by INT 21h: 0Ah function (carriage return not counted)
        db string_maximum_size DUP (0)  ; the string characters, followed by the carriage return
    

    So after entering the string "hello", the memory at address MyString will contain:
    33, 5, 104 ('h'), 101 ('e'), 108 ('l'), 108 ('l'), 111 ('o'), 13 (the carriage return, which DOS stores but does not count in the length), followed by 25 zeroes (left over from DUP (0)).
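
    To make the layout concrete, here is a small Python model of that buffer. It is only a sketch, not DOS code: the helper name buffer_after_input is made up for illustration, and the 31-byte text area is an assumption inferred from the byte counts above. The model follows the documented behaviour of function 0Ah, which stores the text followed by a carriage return (13) and reports the length without it.

```python
def buffer_after_input(first_byte: int, area_size: int, typed: str) -> bytearray:
    """Model MyString after INT 21h / AH=0Ah returns (illustrative helper only)."""
    buf = bytearray(2 + area_size)            # two header bytes + the DUP (0) area
    buf[0] = first_byte                       # byte 0: maximum size, set by the program
    text = typed.encode("ascii")              # assumed to fit into the text area
    buf[1] = len(text)                        # byte 1: characters read, CR not counted
    buf[2:2 + len(text) + 1] = text + b"\r"   # the text, then the carriage return
    return buf


buf = buffer_after_input(33, 31, "hello")
print(list(buf[:9]))   # [33, 5, 104, 101, 108, 108, 111, 13, 0]
print(len(buf))        # 33
```

    Everything the assembly code does later is just indexing into this byte sequence: offset 1 is the length, offset 2 is the first character.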

    Actually, I think your code has a bug: it sets the maximum size to the total buffer size (BUFF_LENGTH EQU STR_LENGTH + 3), while from the interrupt description I would expect the first byte to describe only the character area that follows the two header bytes. You can verify this by trying to enter a 33-character string and checking in the debugger whether the memory after the MyString buffer gets overwritten. As for the +3, only 2 of those bytes are used for the maximum size and the actual size; the third presumably covers the carriage return that DOS stores after the text.
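
    If you cannot run a debugger right now, the suspected overflow can be reproduced with a simplified Python model. This is a sketch under assumptions: the text area is taken to be 31 bytes, the names read_line and neighbours_survive are invented for this example, and DOS refusing extra keystrokes is modelled as plain truncation to max_size - 1 characters.

```python
AREA_SIZE = 31   # bytes reserved for the text (assumed, not taken from the original code)


def read_line(memory: bytearray, typed: str) -> None:
    """Simplified model of INT 21h / AH=0Ah writing into a buffer at offset 0."""
    max_size = memory[0]                          # DOS trusts this byte
    text = typed.encode("ascii")[:max_size - 1]   # at most max_size - 1 characters
    memory[1] = len(text)                         # length, carriage return not counted
    for i, value in enumerate(text + b"\r"):      # the text, then the carriage return
        memory[2 + i] = value


def neighbours_survive(first_byte: int) -> bool:
    """True if the 4 guard bytes placed right after the buffer are still intact."""
    guard = b"\xAA" * 4
    memory = bytearray([first_byte, 0]) + bytearray(AREA_SIZE) + guard
    read_line(memory, "x" * 40)                   # the user types far too much
    return bytes(memory[2 + AREA_SIZE:]) == guard


print(neighbours_survive(31))   # True: the first byte matches the text area
print(neighbours_survive(33))   # False: two bytes past the buffer are overwritten
```

    In this model, a first byte equal to the total buffer size lets the input run two bytes past the end, which is exactly what you would look for in the debugger's memory window.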

    Now in code happens this:

    LEA bx,[MyString]   ; bx = address of first byte of buffer (contains maximum size)
    INC bx              ; bx now points to actual size
    ; instead LEA bx,[MyString+1] could have been used, skipping one INC bx
    MOV cl,[bx]         ; cl = actual string size
    XOR ch,ch           ; ch = 0 (extending 8 bit value in cl to unsigned 16 bit in cx)
    ; other option on 386+ CPU is MOVZX cx,BYTE PTR [bx]
    ; or XOR cx,cx  MOV cl,[bx]
    INC bx              ; bx now points to the first character
    

    The code then does whatever it wants with the content of [bx], incrementing bx again on each pass of the loop to reach the next character, until the cx counter reaches 0.
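
    The same walk can be written down as a Python trace, which is roughly what you would see when single-stepping. This is a sketch: bx is modelled as an offset from MyString rather than a real address, the function name trace_loop is made up, and the loop tests cx before the body, a guard that a plain LOOP instruction does not have.

```python
def trace_loop(buf: bytes) -> list:
    """Return (bx, cx, character) for every pass; bx is an offset from MyString."""
    bx = 0            # LEA bx,[MyString]   -> maximum size byte
    bx += 1           # INC bx              -> actual size byte
    cx = buf[bx]      # MOV cl,[bx] / XOR ch,ch
    bx += 1           # INC bx              -> first character
    steps = []
    while cx != 0:    # tested up front here; LOOP only tests after the body
        steps.append((bx, cx, chr(buf[bx])))
        bx += 1       # next character
        cx -= 1       # what LOOP does to cx
    return steps


for step in trace_loop(bytes([33, 5]) + b"hello\r"):
    print(step)
# (2, 5, 'h')
# (3, 4, 'e')
# (4, 3, 'l')
# (5, 2, 'l')
# (6, 1, 'o')
```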


    You should definitely start up the debugger, step through the code instruction by instruction, point the memory window at MyString, and watch how bx is used to access particular bytes there and how those INC bx instructions fit in.

    This will explain it even better than anything else.


    edit:

    One more thing. I actually kept one secret to myself, and it is an integral part of your question.

    So, "How did I know?" Always remember that computers are computational machines. You put a program in (a list of instructions), you put some numbers in, you let it execute the instructions, and you get the resulting numbers out.

    I had the code (the instructions). The next thing I looked for in your code was how the string is defined. I found that it is entered by the user and read by an int 21h function. So I googled the function: how it works and what data it returns. Snap: suddenly it all made sense (except the maximum size bug, which I decided is simply a mistake by your lecturer; it is easy to introduce a bug in ASM even for seasoned programmers).

    So always make sure you understand all the instructions, and that you understand well what the input data are (their structure and values). Then you can run everything in your head, just like the CPU does, to find out how the input data turn into output data. It is a purely deterministic computational process: you do not need to guess anything, because what happens next is exactly defined at every stage of the computation.

    If you know exactly what those definitions are, it is actually straightforward, easier than any high-level abstraction stuff, just a lot more tedious.

    When you are new to ASM, it is much easier to watch this happen in a debugger than to do it in your head (and it will also help you understand ASM much faster).

  • 2021-01-26 12:07

    Explained:

    INC bx   ; increment bx, skip this byte (why ?)
    MOV cl, [bx]  ; get number of characters of the string
    XOR ch, ch    ; quick way to set ch to zero, so cx == cl for the loop
    
    coding:
        INC bx    ; next address
        MOV dl, [bx]  ; get character value
        XOR dl, ah    ; encode or decode it with the XOR key in ah
        MOV [bx], dl  ; store the result back at the same address
    LOOP coding       ; decrement cx and jump to coding if cx != 0
    

    The string format seems "custom": it is certainly not NUL-terminated but stores its size first (is it Pascal? Ada uses this kind of system).

    • The first byte seems ignored there.
    • The second byte contains the length of the following string
    • The rest of the data is the string itself

    Note that in that case encoding and decoding are the same since XOR masking is used.
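
    That symmetry is easy to check with a small Python model of the loop above. This is a sketch, not the original program: the key value 0x2A is an arbitrary assumption (the real key is whatever the program put in ah), the function name xor_code is made up, and cx is tested before the body to keep the model safe for empty strings.

```python
def xor_code(buf: bytearray, key: int) -> None:
    """Model of the coding loop: XOR every character in place with the key."""
    bx = 1                 # offset of the length byte (after the first INC bx)
    cx = buf[bx]           # MOV cl,[bx] / XOR ch,ch
    while cx != 0:         # guard added for the model; LOOP tests after the body
        bx += 1            # INC bx       -> next character
        buf[bx] ^= key     # MOV dl,[bx] / XOR dl,ah / MOV [bx],dl
        cx -= 1            # LOOP coding


KEY = 0x2A                                   # hypothetical key value
buf = bytearray([33, 5]) + b"hello\r"
original = bytes(buf)

xor_code(buf, KEY)                           # first pass: encode
coded = bytes(buf)
print(coded[2:7])                            # b'BOFFE'

xor_code(buf, KEY)                           # second pass with the same key: decode
print(bytes(buf) == original)                # True
```

    Only the characters counted by the length byte change; the two header bytes and the trailing carriage return stay as they were, and (c XOR k) XOR k = c brings every character back.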
