I didn't really read your massive wall of uncommented code.
To reverse a buffer in-place, get pointers to the first and last characters, then:
Load the bytes into registers, then store the opposite registers back to the pointers.
Increment the start pointer si, decrement the end pointer di.
Loop as long as start < end: cmp si, di / jb
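The steps above can be modelled in C before writing the assembly. This is only a sketch: the function name reverse_in_place and the variable names are made up for illustration, with start and end standing in for si and di.

```c
#include <stddef.h>

/* Reverse buf[0..len-1] in place with two pointers moving toward each other. */
void reverse_in_place(char *buf, size_t len)
{
    if (len < 2)
        return;                 /* nothing to swap; also avoids forming buf - 1 */

    char *start = buf;          /* plays the role of si */
    char *end = buf + len - 1;  /* plays the role of di */

    while (start < end) {       /* cmp si, di / jb */
        char a = *start;        /* load both bytes into "registers" */
        char b = *end;
        *start++ = b;           /* store them back swapped, then step inward */
        *end-- = a;
    }
}
```

With an odd length the two pointers meet on the middle byte and the loop exits without touching it, which is fine for a plain reverse.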
Downcasing can be done on a single character, so you can do it on both bytes separately while you have them in registers for the swap. Just check that the byte is between 'A' and 'Z', then add 0x20. (Unfortunately you can't just or al, 20H unless you know that your character is already a lower or uppercase letter and not some other ASCII character: '@' (0x40), for example, would turn into '`' (0x60).)
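Here is a C sketch of the combined swap and downcase, again with hypothetical names (downcase_ascii, reverse_and_downcase). One detail it adds that the description above does not spell out: with an odd length the swap loop never visits the middle byte, so that byte needs its own range check after the loop.

```c
#include <stddef.h>

/* Add 0x20 only to 'A'..'Z'; every other byte is returned unchanged. */
static char downcase_ascii(char c)
{
    if (c >= 'A' && c <= 'Z')
        c = (char)(c + 0x20);
    return c;
}

/* Reverse buf[0..len-1] in place, lowercasing each byte while it is loaded. */
void reverse_and_downcase(char *buf, size_t len)
{
    if (len == 0)
        return;

    char *start = buf;
    char *end = buf + len - 1;

    while (start < end) {
        char a = downcase_ascii(*start);  /* both bytes are in "registers" here */
        char b = downcase_ascii(*end);
        *start++ = b;
        *end-- = a;
    }

    if (start == end)                     /* odd length: middle byte was skipped */
        *start = downcase_ascii(*start);
}
```

In assembly the range check is a pair of compare-and-branch instructions around the add, done once for each of the two registers.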
Reversing to a new buffer is even easier. Just go forwards in one array and backwards in the other, for count bytes.
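A minimal C sketch of the copying version, assuming the two buffers do not overlap. The name reverse_copy and its parameter order are made up for illustration.

```c
#include <stddef.h>

/* Copy count bytes from src into dst in reverse order.
   Assumes dst has room for count bytes and does not overlap src. */
void reverse_copy(char *dst, const char *src, size_t count)
{
    const char *in = src;       /* walks forward through the source */
    char *out = dst + count;    /* one past the last output byte, walks backward */

    for (size_t i = 0; i < count; i++)
        *--out = *in++;
}
```

There is no start < end test here, just a counter, so the loop body is one load, one store, and two pointer updates.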
If your target baseline CPU feature set included 486 instructions, you could have loaded 4B at a time and used bswap to reverse bytes 4 at a time (bswap is not available on a 386). Or with SSSE3, pshufb to reverse 16B at a time.
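The 4-bytes-at-a-time idea can be sketched in C as follows. This is an illustration under assumptions, not a tuned routine: swap_bytes32 is a portable stand-in for the bswap instruction, reverse_by_dwords is a hypothetical name, and the leftover bytes in the middle fall back to the byte-by-byte swap.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Portable stand-in for the bswap instruction: reverse the 4 bytes of x. */
static uint32_t swap_bytes32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

/* Reverse buf[0..len-1] in place, 4 bytes from each end per iteration. */
void reverse_by_dwords(unsigned char *buf, size_t len)
{
    size_t lo = 0;      /* index of the first unprocessed byte */
    size_t hi = len;    /* one past the last unprocessed byte */

    while (hi - lo >= 8) {
        uint32_t front, back;
        memcpy(&front, buf + lo, 4);        /* unaligned-safe 4-byte loads */
        memcpy(&back, buf + hi - 4, 4);
        front = swap_bytes32(front);
        back = swap_bytes32(back);
        memcpy(buf + lo, &back, 4);         /* store each group at the other end */
        memcpy(buf + hi - 4, &front, 4);
        lo += 4;
        hi -= 4;
    }

    while (hi - lo >= 2) {                  /* up to 7 bytes left in the middle */
        hi--;
        unsigned char tmp = buf[lo];
        buf[lo] = buf[hi];
        buf[hi] = tmp;
        lo++;
    }
}
```

The pshufb version has the same shape with 16-byte loads and a shuffle mask that lists the byte indices in descending order.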