X86 NASM Assembly converting lower to upper and upper to lowercase characters

失恋的感觉 2020-12-20 06:02

As I am pretty new to assembly, I have a few questions about how I should convert a letter the user enters from lowercase to uppercase, or vice versa.

5 Answers
  • 2020-12-20 06:02

    Jeff Duntemann's book Assembly Language Step-by-Step: Programming with Linux covers this topic very well on pages 275-277.

    There he shows that the instruction sub byte [ebp+ecx], 20h changes a lowercase character to uppercase. Note that this example uses a 1024-byte buffer, which is a faster and better way to do it than the previous example on pages 268-269, where the buffer holds only one byte (8 bits) at a time.
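
    The in-place conversion described above can be sketched in C. This is only an illustration of the idea, not the book's listing: the function name upcase_buffer is made up here, and the 'a' to 'z' range check is an assumption about what a complete version needs so that non-letters are left alone.

```c
#include <stddef.h>

/* Hypothetical helper: convert every ASCII lowercase letter in buf to
   uppercase in place, the C counterpart of `sub byte [ebp+ecx], 20h`
   applied across a whole buffer. */
void upcase_buffer(unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i] >= 'a' && buf[i] <= 'z') {
            buf[i] -= 0x20;   /* 'a' (0x61) - 0x20 = 'A' (0x41) */
        }
    }
}
```

    Processing a whole buffer per read keeps the number of system calls low, which is the reason the larger buffer is faster.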

  • 2020-12-20 06:03

    Okay, but your string is not in edx, it's in [ecx] (or [In_Buffer]) (and it's only one useful character). To get a single character...

    mov al, [ecx]
    

    In an HLL you write "if some condition, execute this code". You might wonder how the CPU knows whether to execute the code or not. What we really do (HLLs do this for you) is "if NOT condition, skip over this code" (jump to a label). Experiment with it, you'll figure it out.
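
    The "skip over this code" pattern can be written out in C with goto, which mirrors what a cmp followed by a conditional jump does. This is a sketch for illustration only; the function name to_upper_if_lower and the label skip are invented here, and the instructions in the comments are one possible assembly rendering, not code from the question.

```c
/* Hypothetical example: uppercase c only if it is a lowercase letter.
   Each `if (...) goto skip;` plays the role of cmp + conditional jump. */
unsigned char to_upper_if_lower(unsigned char c)
{
    if (c < 'a') goto skip;   /* like: cmp al, 'a' / jb skip */
    if (c > 'z') goto skip;   /* like: cmp al, 'z' / ja skip */
    c -= 0x20;                /* reached only when 'a' <= c <= 'z' */
skip:
    return c;
}
```

    Note that both tests are inverted: the code jumps away when the character is NOT in range, and falls through into the conversion otherwise.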

    Exit cleanly, whatever path your code takes. You don't show this, but I assume you do it.

    I just posted some info on sys_read here.

    It's for a completely different program (adding two numbers - "hex" numbers) but the part about sys_read might interest you...

  • 2020-12-20 06:17

    Here is a NASM program I hacked together that flips the case of a string. You basically need to loop over the string, check each character against the ASCII boundaries of the two letter ranges, and then add or subtract 0x20 to change the case (that is the distance between upper and lower case in ASCII). You can use the Linux ascii command to see a table of ASCII values.

    File: flipcase.asm

    section     .text
    global      _start                 ; Entry point for linker (ld)
    
      ; Linker entry point                                
    _start:                                                         
        mov     rcx,len                ; Place length of message into rcx
        mov     rbp,msg                ; Place address of our msg into rbp    
        dec     rbp                    ; Adjust count to offset
    
      ; Go through the buffer and flip the case of each ASCII letter:
    upperScan:
        cmp byte [rbp+rcx],0x41        ; Test input char against uppercase 'A'                 
        jb lowerScan                   ; Not uppercase Ascii < 0x41 ('A') - jump below
        cmp byte [rbp+rcx],0x5A        ; Test input char against uppercase 'Z' 
        ja lowerScan                   ; Not uppercase Ascii > 0x5A ('Z') - jump above  
         ; At this point, we have a uppercase character
        add byte [rbp+rcx],0x20        ; Add 0x20 to get the lowercase Ascii value
        jmp Next                       ; Done, jump to next
    
    lowerScan:
        cmp byte [rbp+rcx],0x61        ; Test input char against lowercase 'a'
        jb Next                        ; Not lowercase Ascii < 0x61 ('a') - jump below
        cmp byte [rbp+rcx],0x7A        ; Test input char against lowercase 'z'
        ja Next                        ; Not lowercase Ascii > 0x7A ('z') - jump above
         ; At this point, we have a lowercase char
        sub byte [rbp+rcx],0x20        ; Subtract 0x20 to get the uppercase Ascii value
         ; Fall through to next        
    
    Next:   
        dec rcx                        ; Decrement counter
        jnz upperScan                  ; If characters remain, loop back
    
      ; Write the buffer full of processed text to stdout:
    Write:
        mov     rax,1                  ; System call number (sys_write, x86-64)
        mov     rdi,1                  ; File descriptor 1 (stdout)
        mov     rsi,msg                ; Message to write
        mov     rdx,len                ; Length of message to write
        syscall                        ; Call kernel via the 64-bit ABI
        mov     rax,60                 ; System call number (sys_exit, x86-64)
        xor     rdi,rdi                ; Exit status 0
        syscall                        ; Call kernel
    
    section     .data
    
    msg     db  'hELLO, wwwoRLD!',0xa  ; Our dear string
    len     equ $ - msg                ; Length of our dear string
    

    Then you can assemble, link, and run it with:
    $> nasm -felf64 flipcase.asm && ld -melf_x86_64 -o flipcase flipcase.o && ./flipcase

  • 2020-12-20 06:24

    Cute trick: if they type only letters, you can XOR their input letters with 0x20 to swap their case.

    Then, if they can type more than letters, you just have to check each letter to see if it is alphabetical before XORing it. You can do that with a test to see if it lies in the ranges 'a' to 'z' or 'A' to 'Z', for example.
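
    A minimal sketch of that check-then-XOR idea in C, assuming plain ASCII input. The function name flip_case is hypothetical and only serves the illustration.

```c
/* Hypothetical helper: swap the case of an ASCII letter and leave
   every other byte unchanged. */
unsigned char flip_case(unsigned char c)
{
    int is_upper = (c >= 'A' && c <= 'Z');
    int is_lower = (c >= 'a' && c <= 'z');
    if (is_upper || is_lower) {
        c ^= 0x20;            /* toggle bit 5, the only bit that differs */
    }
    return c;
}
```

    Without the range check, punctuation would be corrupted too: '@' (0x40) XOR 0x20 gives '`' (0x60).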

    Alternatively, you can just map each letter through a 256-element table which maps the characters the way you want them (this is usually how functions like toupper are implemented, for example).
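
    The table approach could look roughly like this in C. It is a sketch under the assumption of single-byte ASCII text; the names flip_table, init_flip_table, and flip_buffer are invented for this example and do not come from any library.

```c
#include <stddef.h>

static unsigned char flip_table[256];

/* Fill the table once: identity for every byte, except that ASCII
   letters map to the opposite case. */
void init_flip_table(void)
{
    for (int i = 0; i < 256; i++) {
        flip_table[i] = (unsigned char)i;
    }
    for (int c = 'A'; c <= 'Z'; c++) {
        flip_table[c] = (unsigned char)(c + 0x20);   /* upper -> lower */
        flip_table[c + 0x20] = (unsigned char)c;     /* lower -> upper */
    }
}

/* Map each byte of buf through the table; no comparisons per character. */
void flip_buffer(unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        buf[i] = flip_table[buf[i]];
    }
}
```

    The table costs 256 bytes, but the per-character work becomes a single indexed load, and changing the mapping only means filling the table differently.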

  • 2020-12-20 06:29

    If you only support ASCII, then you can force lowercase using an OR 0x20

      or   eax, 0x20
    

    Similarly, you can transform a letter to uppercase by clearing that bit:

      and  eax, 0xDF   ; 0xDF is ~0x20 within a byte: clears bit 5
    

    And as nneonneo mentioned, the character case can be swapped using the XOR instruction:

      xor  eax, 0x20
    

    That only works if eax is between 'a' and 'z' or 'A' and 'Z', so you'd have to compare and make sure you are in the range:

      cmp  eax, 'a'
      jl   .not_lower
      cmp  eax, 'z'
      jg   .not_lower
      xor  eax, 0x20   ; lowercase letter: flip bit 5 to get uppercase
    .not_lower:
    

    I used nasm syntax. You may want to make sure the jl and jg are correct too...

    If you need to transform any international character, then that's a lot more complicated, unless you can call a libc tolower() or toupper() style function that accepts Unicode characters.


    As a fair question: why would it work? (asked by kuhaku)

    ASCII characters (also ISO-8859-1) have the basic uppercase characters defined between 0x41 and 0x5A and the lowercase characters between 0x61 and 0x7A.

    To turn the high hex digit 4 into 6 and 5 into 7, you force bit 5 (0x20) to be set.

    To go to uppercase, you do the opposite, you remove bit 5 so it becomes zero.
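
    The same bit arithmetic written out in C, as a small sketch. The helper names force_lower and force_upper are hypothetical, and both assume the input is already known to be an ASCII letter.

```c
/* Hypothetical helpers; valid only when c is already an ASCII letter. */

/* Set bit 5: 0x41..0x5A ('A'..'Z') becomes 0x61..0x7A ('a'..'z'). */
unsigned char force_lower(unsigned char c)
{
    return (unsigned char)(c | 0x20);
}

/* Clear bit 5: 0x61..0x7A ('a'..'z') becomes 0x41..0x5A ('A'..'Z'). */
unsigned char force_upper(unsigned char c)
{
    return (unsigned char)(c & 0xDF);   /* 0xDF == ~0x20 within a byte */
}
```

    Both operations are idempotent: applying force_lower to a letter that is already lowercase leaves it unchanged, which is why no case check is needed before them.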
