Question
I have a block of code that sets the bounds for checking whether a character is a letter (not a digit, not a symbol), but I don't think it handles the characters that fall between the upper-case and lower-case ranges. Can you help? Thanks!
mov al, byte ptr[esi + ecx]; move the first character to al
cmp al, 0 ; compare al with null which is the end of string
je done ; if yes, jump to done
cmp al, 0x41 ; compare al with "A" (lower bound)
jl next_char ; jump to next character if less
cmp al, 0x7A ; compare al with "z" (upper bound)
jg next_char ; jump to next character if greater
//do something if it's a letter
next_char:
//do something different
Answer 1:
You need logic that combines several conditions, like this C statement: if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'))
You can do that like this:
...
je done ; if yes, jump to done
cmp al, 0x41 ; compare al with "A"
jl next_char ; jump to next character if less
cmp al, 0x5A ; compare al with "Z"
jle found_letter ; if al is >= "A" && <= "Z" -> found a letter
cmp al, 0x61 ; compare al with "a"
jl next_char ; jump to next character if less (since it's between "Z" & "a")
cmp al, 0x7A ; compare al with "z"
jg next_char ; above "z" -> not a letter
found_letter:
// ...
next_char:
// ...
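For comparison, the same branch structure can be sketched in C. This is only an illustration: the function name is_ascii_letter is made up here and does not come from the question, and it assumes plain ASCII input.

```c
#include <stdbool.h>

/* Hypothetical helper: true only for ASCII 'A'..'Z' or 'a'..'z'.
   The four tests follow the same order as the cmp/jump pairs above. */
bool is_ascii_letter(unsigned char c)
{
    if (c < 'A')  return false;  /* below "A"                        */
    if (c <= 'Z') return true;   /* "A".."Z"                         */
    if (c < 'a')  return false;  /* the gap 0x5B..0x60 between them  */
    return c <= 'z';             /* "a".."z"; anything above is out  */
}
```

With this sketch, is_ascii_letter('[') and is_ascii_letter('`') are both false, which is exactly the case the single 0x41..0x7A range in the question lets through.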
Answer 2:
You can OR each character with 0x20; this turns upper-case letters into lower-case ones (and maps non-letter characters to other non-letter characters), so a single range check against "a".."z" is enough:
...
je done ; This is your existing code
or al, 0x20 ; <-- This line is new!
cmp al, 0x61 ; Your existing compare, but against "a": after the OR only lower-case letters remain
...
Note: If your code has to handle letters above 0x7F (such as "Ä", "Ó", "Ñ"), it becomes much more complex. One problem is that the byte value of these characters differs between Windows console programs (for example, "Ä" = 0x8E) and Windows GUI programs ("Ä" = 0xC4), and may differ again on other operating systems.
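A rough C sketch of the OR approach, assuming 7-bit ASCII letters only. The name is_letter_folded is hypothetical. One detail worth spelling out: after the OR, the range check has to be against 'a'..'z', because '@' (0x40) becomes '`' (0x60), which sits just below 'a' and must still be rejected.

```c
#include <stdbool.h>

/* Hypothetical helper: fold to lower case with OR 0x20, then do one range check. */
bool is_letter_folded(unsigned char c)
{
    unsigned char lower = (unsigned char)(c | 0x20);  /* 'A'..'Z' -> 'a'..'z' */
    return lower >= 'a' && lower <= 'z';              /* bounds are 0x61..0x7A */
}
```

Note that the OR destroys the original case, so keep a copy of the character if you still need it afterwards.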
Answer 3:
Correct, there's a gap of a few non-alphabetic characters between 'Z' and 'a'.
The most efficient way is to set the lower-case bit with an OR, then use the range-check trick of sub + unsigned compare. This of course only works for ASCII, not extended character sets where there are other ranges of alphabetic characters. Note that or al, 0x20 can never create a lower-case character if the original wasn't an upper-case character, because the ranges are "aligned" the same relative to a mod 32 boundary of ASCII codes.
Arrange your loop structure with the conditional branch at the bottom. Either enter the loop with a jmp to that load and test, or peel that part of the first iteration. (Why are loops always compiled into "do...while" style (tail jump)?)
Use movzx loads to avoid a false dependency on merging a low byte into EAX when writing AL.
; ESI = pointer to the string
xor ecx, ecx ; index = 0
movzx eax, byte ptr[esi] ; test first character
test eax, eax
jz .done ; skip the loop on empty string
; alternative: jmp .next_char to enter the loop
.loop: ; do{
inc ecx
mov edx, eax ; save a copy of the original if needed
;;;; THESE 4 INSTRUCTIONS ARE THE ALPHA / NON-ALPHA TEST
or al, 0x20 ; force lowercase
sub al, 'a' ; AL = 0..25 if alphabetic
cmp al, 'z'-'a'
ja .non_alphabetic ; unsigned compare rejects too high or too low (wrapping)
;; do something if it's a letter
jmp .next_char
.non_alphabetic:
;; do something different, then fall through
.next_char:
movzx eax, byte ptr[esi + ecx]
test eax, eax
jnz .loop ; }while((AL = str[i]) != 0);
.done:
If the input is before 'a', sub al, 'a' will be signed negative, or as unsigned will wrap to a high value, so cmp al, 'z'-'a' / ja will reject it.
If the input is after 'z', sub al, 'a' will leave a value higher than 25 ('z'-'a'), so the unsigned compare will reject it also.
Compilers use this unsigned compare trick when compiling a C expression like c <= 'z' && c >= 'a', so you can be sure it works the same as that expression for every possible input.
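To make the equivalence concrete, here is a small C sketch of the OR + SUB + unsigned-compare sequence next to a plain two-range test. Both function names are invented for illustration. The cast to unsigned char stands in for the 8-bit wraparound that sub al, 'a' gives you for free.

```c
#include <stdbool.h>

/* Sketch of: or al, 0x20 / sub al, 'a' / cmp al, 'z'-'a' / ja */
bool is_alpha_trick(unsigned char c)
{
    /* Inputs below 'a' wrap around to a large unsigned value. */
    unsigned char idx = (unsigned char)((c | 0x20) - 'a');
    return idx <= 'z' - 'a';   /* 0..25 means the input was a letter */
}

/* Straightforward two-range reference to compare against. */
bool is_alpha_ranges(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}
```

Looping c over 0..255 and comparing the two functions is a cheap way to convince yourself that they agree for every byte value, including those above 0x7F.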
Other style notes: normally you'd just increment ESI, instead of having both a pointer and an index. Also, you may not need mov edx, eax if you can use the AL value (index into the alphabet). Making a copy and using this "destructive" test is usually better than 2 separate branches.
NASM syntax allows character constants like C, so you can write 0x41 as 'A', or 0x7A as 'z', e.g. cmp al, 'a'. Then you don't even need to comment the line.
Writing it that way (with the next_char label at the top of the loop) saves a jmp at the bottom. Fewer instructions in the loop = better. The only point of writing asm these days is performance, so it makes sense to learn good techniques like this from the start, if it's not too confusing. No assembly answer would be complete without a link to http://agner.org/optimize/.
Output of ascii(1), or see http://www.asciitable.com/:
Dec Hex     Dec Hex     Dec Hex     Dec Hex     Dec Hex     Dec Hex     Dec Hex     Dec Hex
  0 00 NUL   16 10 DLE   32 20       48 30 0     64 40 @     80 50 P     96 60 `    112 70 p
  1 01 SOH   17 11 DC1   33 21 !     49 31 1     65 41 A     81 51 Q     97 61 a    113 71 q
  2 02 STX   18 12 DC2   34 22 "     50 32 2     66 42 B     82 52 R     98 62 b    114 72 r
  3 03 ETX   19 13 DC3   35 23 #     51 33 3     67 43 C     83 53 S     99 63 c    115 73 s
  4 04 EOT   20 14 DC4   36 24 $     52 34 4     68 44 D     84 54 T    100 64 d    116 74 t
  5 05 ENQ   21 15 NAK   37 25 %     53 35 5     69 45 E     85 55 U    101 65 e    117 75 u
  6 06 ACK   22 16 SYN   38 26 &     54 36 6     70 46 F     86 56 V    102 66 f    118 76 v
  7 07 BEL   23 17 ETB   39 27 '     55 37 7     71 47 G     87 57 W    103 67 g    119 77 w
  8 08 BS    24 18 CAN   40 28 (     56 38 8     72 48 H     88 58 X    104 68 h    120 78 x
  9 09 HT    25 19 EM    41 29 )     57 39 9     73 49 I     89 59 Y    105 69 i    121 79 y
 10 0A LF    26 1A SUB   42 2A *     58 3A :     74 4A J     90 5A Z    106 6A j    122 7A z
 11 0B VT    27 1B ESC   43 2B +     59 3B ;     75 4B K     91 5B [    107 6B k    123 7B {
 12 0C FF    28 1C FS    44 2C ,     60 3C <     76 4C L     92 5C \    108 6C l    124 7C |
 13 0D CR    29 1D GS    45 2D -     61 3D =     77 4D M     93 5D ]    109 6D m    125 7D }
 14 0E SO    30 1E RS    46 2E .     62 3E >     78 4E N     94 5E ^    110 6E n    126 7E ~
 15 0F SI    31 1F US    47 2F /     63 3F ?     79 4F O     95 5F _    111 6F o    127 7F DEL
Answer 4:
This ARM function takes a string and uses ASCII table values to decide whether each character is lower case. The CMP/BLS and CMP/BHI pairs do the range check: a byte at or below the character before 'a', or above 'z', is skipped. The code that follows capitalizes the character if it is lower case.
__asm void my_capitalize(char *str)
{
cap_loop
LDRB r1, [r0] ; Load byte into r1 from memory pointed to by r0 (str pointer)
CMP r1, #'a'-1 ; compare it with the character before 'a'
BLS cap_skip ; If byte is lower or same, then skip this byte
CMP r1, #'z' ; Compare it with the 'z' character
BHI cap_skip ; If it is higher, then skip this byte
SUBS r1,#32 ; Else subtract out difference to capitalize it
STRB r1, [r0] ; Store the capitalized byte back in memory
cap_skip
ADDS r0, r0, #1 ; Increment str pointer
CMP r1, #0 ; Was the byte 0?
BNE cap_loop ; If not, repeat the loop
BX lr ; Else return from subroutine
}
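For readers who don't know ARM assembly, the loop above corresponds roughly to the following C. This is a sketch for comparison only; the name capitalize_ascii is invented here, and it assumes a NUL-terminated ASCII string in writable memory.

```c
/* Hypothetical C counterpart of my_capitalize:
   upper-case every 'a'..'z' byte in place, leave everything else alone. */
void capitalize_ascii(char *str)
{
    for (; *str != '\0'; ++str) {
        if (*str >= 'a' && *str <= 'z') {
            *str -= 'a' - 'A';   /* same as the SUBS r1,#32 step */
        }
    }
}
```

Calling it on a writable buffer holding "Hello, World!" leaves "HELLO, WORLD!" in that buffer; passing a string literal directly would be undefined behavior, since the function modifies its argument.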
Source: https://stackoverflow.com/questions/31824441/how-can-i-check-if-a-character-is-a-letter-in-assembly