Finding first and last capital letter in user input

走远了吗. Submitted on 2019-12-24 03:47:06

Question


The input consists of letters from a-z or A-Z and is terminated by an asterisk *.

The output must be the first and last capital letters among the input characters. We should also echo each input character as it is taken. N.B. We take the input character by character, not as a string.

Test case 1: input: aAbCcP* output: AP

Test case 2: input: ZabCBc* output: ZB

I have written this code below, which satisfies Test Case 1, but not 2:

.MODEL
.STACK 100H
.DATA
   STR DB 'Enter letters:$'
.CODE

MAIN PROC

MOV AX, @DATA
MOV DS, AX

LEA DX, STR
MOV AH, 9
INT 21H 

cycle: 

    MOV AH, 1
    INT 21H

    CMP AL, '*'
    JZ output 
    CMP AL, 'Z' 
    JA save


head: 
    CMP BL, 1
    JZ save

    MOV BL, 1
    MOV BH, AL 

clear:
    XOR AL, AL  

save:
    MOV CH, AL

LOOP cycle 

output:
    MOV AH, 2
    MOV DL, BH
    INT 21H 

    MOV AH, 2
    MOV DL, CH
    INT 21H 


MAIN ENDP 
END MAIN 

Answer 1:


First ask yourself these questions:

  • What are capitals?
    If we don't consider accented characters, then capitals are characters with ASCII codes ranging from 65 to 90.

  • Can I trust the user to only input characters from a-z or A-Z?
    No, you can't. You don't have control over what the user does at the keyboard, and that's why your program should take a defensive approach and test for capitals with something better than a single cmp al, 'Z'.

  • What will be the result if the input didn't contain a single capital?
    You could choose to print two spaces, or a descriptive message, or, like I did, display nothing at all.

  • What will be the result if the input contains only one capital?
    You could choose to print that one capital, or, like I did, display it twice, because if you think about it, that single capital is at the same time the first occurrence of a capital and also the last occurrence of a capital.

  • What input/output functions will I use?
    For single character input you have a choice between DOS functions 01h, 06h, 07h, 08h, 0Ch, and 3Fh.
    For single character output you have a choice between DOS functions 02h, 06h, and 40h.
    If you're new to assembly, stick with the simpler ones and use functions 01h and 02h. Do consult the API reference before using any DOS function. And of course check whether emu8086 supports the function at all!

You need to decide about all of the above in order to tackle the task. What is important is that you can defend every choice you make.


Below is my version of this task. For simplicity I'm using the tiny program model. See the ORG 256 directive on top? This program model has the major benefit that all the segment registers point to the same segment as your program (CS = DS = ES = SS).

The program runs 2 loops. The first loop runs until a capital is received. (It goes without saying that it stops earlier if the asterisk arrives before any capital.) Because that capital is at the same time the first occurrence of a capital and also the last occurrence of a capital, I save it twice, both in DL and DH.

The second loop runs until an asterisk is received. Each time that a new capital comes along, it replaces what is written in DH. When this loop finally ends, both DL and DH are displayed on screen and in this order of course.

The program exits with the preferred DOS function 4Ch to terminate a program.

I've written some essential comments, refrained from adding redundant ones, and used descriptive names for the labels in the program. Do note the tidy tabular layout. It's crucial for readability.

        ORG     256

Loop1:  mov     ah, 01h     ; DOS.GetKeyboardCharacter
        int     21h         ; -> AL
        cmp     al, "*"     ; Found end of input marker ?
        je      Done
        cmp     al, "A"
        jb      Loop1
        cmp     al, "Z"
        ja      Loop1
        mov     dl, al      ; For now it's the first
        mov     dh, al      ; AND the last capital

Loop2:  mov     ah, 01h     ; DOS.GetKeyboardCharacter
        int     21h         ; -> AL
        cmp     al, "*"     ; Found end of input marker ?
        je      Show
        cmp     al, "A"
        jb      Loop2
        cmp     al, "Z"
        ja      Loop2
        mov     dh, al      ; This is the latest capital
        jmp     Loop2

Show:   mov     ah, 02h     ; DOS.DisplayCharacter
        int     21h         ; -> (AL)
        mov     dl, dh
        mov     ah, 02h     ; DOS.DisplayCharacter
        int     21h         ; -> (AL)

Done:   mov     ax, 4C00h   ; DOS.TerminateWithReturnCode
        int     21h

Example:

aZeRTy*

aZeRTy*ZT


It would be very disappointing if you took it the easy way and just copy/pasted my code. I've tried to explain it in great detail and hope that you learn a lot from it.

My solution is certainly not the only good solution for this task. You could, for example, first input all of the characters and store them in memory somewhere, after which you process these characters from memory similar to how I did it.
Please try to write a working version that does it in this alternative way. You can only get smarter! Happy programming.
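
For readers who want something to compare their own attempt against afterwards, here is one possible shape of that store-then-process variant. It is only a sketch under assumptions that are not in the answer: NASM/FASM-style syntax (where mov di, Buffer loads the address of the label), hypothetical label names (Input, Scan, Buffer), the free memory after the program used as the buffer, and no bounds check on the input length. A zero in DL stands for "no capital found yet".

```asm
        ORG     256

        mov     di, Buffer  ; DI -> next free byte of the buffer
Input:  mov     ah, 01h     ; DOS.GetKeyboardCharacter
        int     21h         ; -> AL
        mov     [di], al    ; Store the character, asterisk included
        inc     di
        cmp     al, "*"     ; Found end of input marker ?
        jne     Input

        mov     si, Buffer  ; SI -> first stored character
        xor     dx, dx      ; DL = first capital, DH = last capital, 0 = none yet
Scan:   mov     al, [si]
        inc     si
        cmp     al, "*"     ; The stored asterisk ends the scan
        je      Show
        cmp     al, "A"
        jb      Scan
        cmp     al, "Z"
        ja      Scan
        mov     dh, al      ; This is the latest capital
        test    dl, dl      ; Already have a first capital ?
        jnz     Scan
        mov     dl, al      ; No, so this one is the first
        jmp     Scan

Show:   test    dl, dl      ; Nothing to show without a capital
        jz      Done
        mov     ah, 02h     ; DOS.DisplayCharacter
        int     21h         ; Prints DL
        mov     dl, dh
        mov     ah, 02h     ; DOS.DisplayCharacter
        int     21h         ; Prints DL

Done:   mov     ax, 4C00h   ; DOS.TerminateWithReturnCode
        int     21h

Buffer:                     ; Assumption: unused memory after the program
```

Because the whole input sits in memory, a single scan loop with a "first capital already seen" test is enough here; a real program would also limit how many characters it stores.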




Answer 2:


Your code is broken because you always fall through to save: MOV CH, AL every iteration, so it can only work if the last capital is also the very last character of the whole input.

Single-step it with a debugger for a simple input like ABc* to see how it goes wrong.

Also, you use loop, which is like dec cx/jnz. That makes no sense because there's no counter-based termination condition, and could potentially corrupt CH if CL was zero. You don't even initialize CX first! The loop instruction is not the only way to loop; it's just a code-size peephole optimization you can use when it's convenient to use CX as a loop counter. Otherwise don't use it.
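
To illustrate the kind of situation where loop does fit, here is a minimal sketch (not related to the task itself, with a hypothetical label name) that prints five asterisks. The point is only that CX is initialized before the loop and that nothing inside the loop body touches CX, CL, or CH.

```asm
        ORG     100h

        mov     cx, 5       ; loop counts down in CX, so initialize it first
        mov     dl, '*'
PrintStar:
        mov     ah, 02h     ; DOS.DisplayCharacter
        int     21h         ; prints DL
        loop    PrintStar   ; CX = CX - 1, jump back while CX != 0

        mov     ax, 4C00h   ; DOS.TerminateWithReturnCode
        int     21h
```

In the question's code there is no such count: the input ends on '*', so a plain jmp (or a conditional jump on the terminator test) is the right way to close the loop, and CH stays free for data.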


This is a simplified version of Sep's implementation, taking advantage of the fact that the input is guaranteed to be alphabetic, so we really can check for upper case as easily as c <= 'Z' (after ruling out the '*' terminator). We don't have to worry about inputs like 12ABcd7_ that contain digits, spaces, or newlines, all of which have lower ASCII codes than the upper-case alphabetic range and would therefore pass a bare c <= 'Z' test. Your cmp al,'Z' / ja check was correct; it's just the code you were branching to that didn't have sane logic.

Even if you did want to strictly check c >= 'A' && c <= 'Z', that range check can be done with one branch using sub al,'A' ; cmp al,'Z'-'A' ; ja non_upper instead of a pair of cmp/jcc branches. (That modifies the original, but if you save it in SI or something you could later restore it with lea ax, [si+'A'])
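
A minimal sketch of that single-branch range check in context, assuming the same flat .COM layout as the other listings. The label names are hypothetical, and BL is used here to keep the original character instead of SI. The program simply prints every capital a second time so the effect of the check is visible.

```asm
        ORG     100h

Next:   mov     ah, 01h     ; DOS.GetKeyboardCharacter
        int     21h         ; stdin -> AL
        cmp     al, '*'
        je      Done
        mov     bl, al      ; keep the original character
        sub     al, 'A'     ; 'A'..'Z' becomes 0..25; anything below 'A' wraps to a large unsigned value
        cmp     al, 'Z'-'A'
        ja      Next        ; unsigned above 25: not a capital
        mov     dl, bl      ; capital: print it once more
        mov     ah, 02h     ; DOS.DisplayCharacter
        int     21h         ; DL -> stdout
        jmp     Next

Done:   mov     ax, 4C00h   ; DOS.TerminateWithReturnCode
        int     21h
```

The trick relies on ja being an unsigned comparison: after the subtraction, characters below 'A' and characters above 'Z' both end up above 25, so one branch rejects both.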

You can also put a conditional branch at the bottom of the loop for both loops, instead of a jmp at the bottom and an if() break inside. Sep's code already did that for the first loop.

I agree with Sep that having 2 loops is easier than checking a flag every time you find a capital (to see if it's the first capital or not).

        ORG     100h        ; DOS .com is loaded with IP=100h, with CS=DS=ES=SS
                            ; we don't actually do any absolute addressing so no real effect.

        mov     ah, 01h     ; DOS.GetKeyboardCharacter
                            ; AH=01 / int 21h doesn't modify AH so we only need this once
find_first_cap:  
        int     21h         ; stdin -> AL
        cmp     al, '*'     ; Found end of input marker ?
        je      Done        ;  if (c=='*') return;  without printing anything, we haven't found a capital yet

        cmp     al, 'Z'
        ja      find_first_cap
    ; fall through: AL <= 'Z' and we can assume it's a capital letter, not a digit or something.

        mov     dl, al      ; For now it's the first
        ;mov     dh, al      ; AND the last capital

        ;mov     ah, 01h     ; DOS.GetKeyboardCharacter   AH still = 01
        ;jmp     loop2_entry      ; we can let the first iteration set DH
Loop2:                      ; do {
        cmp     al, 'Z'       ; assume all c <= 'Z' is a capital alphabetic character
        ja      loop2_entry
        mov     dh, al        ; This is the latest capital

loop2_entry:
        int     21h         ; stdin -> AL
        cmp     al, '*'
        jne     Loop2       ; }while(c != '*');


Show:   mov     ah, 02h     ; DOS.DisplayCharacter
        int     21h         ; DL -> stdout
        mov     dl, dh
        ; mov     ah, 02h     ; DOS.DisplayCharacter
        int     21h         ; DL -> stdout

Done:   mov     ax, 4C00h   ; DOS.TerminateWithReturnCode
        int     21h

At this point it's arguably not simpler, but it is more optimized, especially for code size. That tends to happen when I write anything because that's the fun part. :P

Having a taken branch inside the loop for the non-capital case is arguably worse for performance. (In modern code for a P6-compatible CPU you'd probably use cmovbe esi, eax instead of a conditional branch, because a conditional move is exactly what you want.)

Omitting the mov ah, XX before an int 21h because it's still set doesn't make your program more human-readable, but it is safe if you're careful to check the docs for each call to make sure they don't return anything in AH.
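
For comparison, the single-loop alternative that checks a flag on every capital (the approach both answers set aside in favour of two loops) might look roughly like this. It is a sketch with hypothetical label names, using BL as the "first capital seen" flag as in the question, and the strict 'A'..'Z' test rather than the c <= 'Z' shortcut.

```asm
        ORG     100h

        xor     bl, bl      ; BL = 0: no capital seen yet
Read:   mov     ah, 01h     ; DOS.GetKeyboardCharacter
        int     21h         ; stdin -> AL
        cmp     al, '*'
        je      Show
        cmp     al, 'A'
        jb      Read
        cmp     al, 'Z'
        ja      Read
        mov     dh, al      ; every capital is the latest one so far
        test    bl, bl      ; the flag test runs on every capital
        jnz     Read
        mov     bl, 1
        mov     dl, al      ; only the first capital reaches DL
        jmp     Read

Show:   test    bl, bl
        jz      Done        ; no capital at all: print nothing
        mov     ah, 02h     ; DOS.DisplayCharacter
        int     21h         ; DL -> stdout
        mov     dl, dh
        mov     ah, 02h     ; DOS.DisplayCharacter
        int     21h         ; DL -> stdout

Done:   mov     ax, 4C00h   ; DOS.TerminateWithReturnCode
        int     21h
```

With the input ZabCBc* this leaves Z in DL and B in DH, so it prints ZB. Note that the last capital lives in DH rather than CH, and only lines that see a capital ever write to it.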



Source: https://stackoverflow.com/questions/56819605/finding-first-and-last-capital-letter-in-user-input
