I have a simple question for a Comp Sci class I'm taking where my task is to convert a function into MIPS assembly language. I believe I have a correct answer but I want to ver
(Editor's note: the add of -1 after the loop corrects for the off-by-one while still allowing an efficient do{}while loop structure. This answer proposes a more literal translation from C into an if() break inside an unconditional loop.)
I think the while loop isn't right in the case of *s == 0.
It should be something like this:
...
lbu $t0, 0($a0)
loop:
beq $t0, $zero, s_end # *
...
b loop
s_end:
...
* You could use the pseudo-instruction beqz $t0, s_end instead of the beq instruction.
Yes, looks correct to me, and fairly efficient. Implementing a while loop with asm structured like a do{}while() is the standard and best way to loop in asm. See also: Why are loops always compiled into "do...while" style (tail jump)?
A more direct transliteration of the C would check *s before incrementing len, e.g. by peeling the first iteration and turning it into a load/branch that can skip the whole loop for an empty string. (That also means reordering the loop body, which would probably put the load close to the branch. That is worse for performance because of load latency.)
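To make the two loop shapes concrete, here is a rough C model of both. The question's original C function isn't shown here, so strlen_while is only an assumed shape of it, and both function names are made up for illustration.

```c
#include <stddef.h>

/* Assumed shape of the C function being translated (hypothetical):
   test *s before counting anything. */
size_t strlen_while(const char *s)
{
    size_t len = 0;
    while (*s != '\0') {   /* top-of-loop test: an empty string skips the body */
        len++;
        s++;
    }
    return len;
}

/* do{}while shape: the body also runs for the terminator, so len
   overshoots by one and is corrected after the loop. */
size_t strlen_dowhile(const char *s)
{
    size_t len = 0;
    char c;
    do {
        c = *s++;          /* load first, like an lbu at the top of the asm loop */
        len++;
    } while (c != '\0');
    return len - 1;        /* the "add of -1" after the loop */
}
```

Both should return the same value for any NUL-terminated string; the second just moves the only branch to the bottom of the loop.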
You could optimize away the len-- overshoot correction after the loop: start with len=-1 instead of 0. Use li $v0, -1, which can still be implemented with a single instruction:
addiu $v0, $zero, -1
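In C terms, that idea looks roughly like the sketch below. The name strlen_minus1 is hypothetical, and size_t is used so the initial -1 wraps to 0 on the first increment with well-defined unsigned arithmetic.

```c
#include <stddef.h>

/* do{}while loop with the off-by-one folded into the initial value. */
size_t strlen_minus1(const char *s)
{
    size_t len = (size_t)-1;   /* like li $v0, -1; the first len++ wraps to 0 */
    char c;
    do {
        c = *s++;              /* load the byte, then advance the pointer */
        len++;                 /* also counts the terminator, hence the -1 start */
    } while (c != '\0');
    return len;                /* no correction needed after the loop */
}
```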
A further step of optimization is to only do the pointer increment inside the loop, and find the length at the end with len = end - start.
We can correct for the off-by-one (to not count the terminator) by offsetting the incoming pointer while we're copying it to another reg.
# char *s input in $a0, size_t length returned in $v0
strlen:
addiu $v0, $a0, 1 # char *start_1 = start + 1
loop: # do{
lbu $t0, ($a0) # char tmp0 = load *s
addiu $a0, $a0, 1 # s++
bne $t0, $zero, loop # }while(tmp0 != '\0')
s_end:
subu $v0, $a0, $v0 # size_t len = s - start_1
jr $ra
I used addiu / subu because I don't want it to fault on signed overflow of a pointer. Your version should probably use addiu as well so it works for strings up to 4GB, not just 2GB.
Untested, but we can think through the correctness:
For an empty string (s points at a 0): when we reach the final subtract, we have v0=s+1 (from before the loop) and a0=s+1 (from the first/only iteration, which falls through because it loads $t0 = 0). Subtracting these gives len=0 = strlen("").
For a length-1 string: v0=s+1 as before, but the loop body runs twice, so a0=s+2 and len = (s+2) - (s+1) = 1.

For MIPS with a branch-delay slot, the addiu and subu can be reordered after bne and jr respectively, filling those branch-delay slots. (But then bne is right after the load, so classic MIPS would have to stall, or you would even have to fill the load-delay slot with a nop on a MIPS I without interlocks for loads.)
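The cases above can also be checked with a line-by-line C model of the pointer-difference loop. This is only a sketch; strlen_ptrdiff is a made-up name, and the comments map each statement to the asm instruction it stands in for.

```c
#include <stddef.h>

/* C model of the asm strlen that increments only the pointer. */
size_t strlen_ptrdiff(const char *s)
{
    const char *start_1 = s + 1;   /* addiu $v0, $a0, 1 */
    char c;
    do {
        c = *s;                    /* lbu   $t0, ($a0) */
        s++;                       /* addiu $a0, $a0, 1 */
    } while (c != '\0');           /* bne   $t0, $zero, loop */
    return (size_t)(s - start_1);  /* subu  $v0, $a0, $v0 */
}
```

When the loop exits, s points one past the terminator, so subtracting start + 1 leaves exactly the number of non-terminator bytes.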
Of course, if you actually care about real-world strlen performance for small to medium strings (not just tiny), like more than 8 or 16 bytes, use a bithack that checks whole words at once for maybe having a 0 byte.
Why does glibc's strlen need to be so complicated to run quickly?
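As a rough illustration of the word-at-a-time idea, here is a C sketch using the well-known "has a zero byte" bithack on 32-bit words. It is not glibc's implementation. To stay within defined C behavior it takes a buffer capacity (cap, an assumption added for this sketch) and never reads past it, so it behaves more like strnlen; a real strlen instead aligns the pointer and relies on aligned word loads not crossing a page boundary. Both function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Nonzero if and only if at least one byte of v is 0. */
static int has_zero_byte(uint32_t v)
{
    return ((v - 0x01010101u) & ~v & 0x80808080u) != 0;
}

/* Length of the string in buf, scanning 4 bytes at a time.
   cap is the number of readable bytes in buf (assumption of this sketch). */
size_t strnlen_words(const char *buf, size_t cap)
{
    size_t i = 0;
    while (cap - i >= 4) {             /* a whole word is still readable */
        uint32_t w;
        memcpy(&w, buf + i, sizeof w); /* alignment-safe word load */
        if (has_zero_byte(w))
            break;                     /* terminator is somewhere in this word */
        i += 4;
    }
    while (i < cap && buf[i] != '\0')  /* finish byte by byte */
        i++;
    return i;
}
```

For a string literal, sizeof gives a suitable capacity, e.g. strnlen_words("hello", sizeof "hello").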
Yeah, you have a correct asm version, and I like the fact that you do as much work as possible before testing the value of $t0, which gives the load from memory as much time as possible to complete.