I have a simple question for a Comp Sci class I'm taking where my task is to convert a function into MIPS assembly language. I believe I have a correct answer but I want to ver
Yes, looks correct to me, and fairly efficient. Implementing a while loop with asm structured like a do{}while() is the standard and best way to loop in asm. See also: Why are loops always compiled into "do...while" style (tail jump)?
A more direct transliteration of the C would check *s before incrementing len, e.g. by peeling the first iteration and turning it into a load/branch that can skip the whole loop for an empty string. (That would also mean reordering the loop body, which would probably put the load right before the branch; that is worse for performance because of load latency.)
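As a rough C-level sketch of that peeled form (the function name strlen_peeled is made up for illustration, and the assert is only a sanity check, not part of the original code): the first test is hoisted out so an empty string skips the loop, and the loop itself keeps its condition at the bottom.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical name. A while loop rewritten as "if + do{}while":
   the peeled first check lets an empty string skip the loop entirely. */
size_t strlen_peeled(const char *s)
{
    assert(s != NULL);
    size_t len = 0;
    if (*s != '\0') {           /* peeled first-iteration load + branch */
        do {
            len++;              /* count the byte we already checked */
            s++;
        } while (*s != '\0');   /* load then branch at the bottom of the loop */
    }
    return len;
}
```

Note that the load in the loop condition sits directly before the branch, which is the load-latency cost mentioned above.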
You could optimize away the len-- overshoot-correction after the loop: start with len=-1 instead of 0. Use li $v0, -1 which can still be implemented with a single instruction:
    addiu $v0, $zero, -1
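In C terms, the idea might look like the sketch below. It assumes a loop shape where len is incremented before the loaded byte is tested (which is what makes an overshoot correction necessary in the first place); the name strlen_minus1 is hypothetical. Starting an unsigned counter at -1 is well-defined in C: it wraps to 0 on the first increment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical name. The counter starts at -1 so the increment that
   happens for the terminating 0 byte needs no len-- afterwards. */
size_t strlen_minus1(const char *s)
{
    assert(s != NULL);
    size_t len = (size_t)-1;    /* wraps around to 0 on the first len++ */
    char c;
    do {
        len++;                  /* counts every loaded byte, including the terminator */
        c = *s++;
    } while (c != '\0');
    return len;                 /* already excludes the terminator */
}
```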
A further step of optimization is to only do the pointer increment inside the loop, and find the length at the end with len = end - start.
We can correct for the off-by-one (to not count the terminator) by offsetting the incoming pointer while we're copying it to another reg.
# char *s input in $a0, size_t length returned in $v0
strlen:
    addiu $v0, $a0, 1          # char *start_1 = s + 1
loop:                          # do{
    lbu   $t0, ($a0)           #   char tmp0 = load *s
    addiu $a0, $a0, 1          #   s++
    bne   $t0, $zero, loop     # }while(tmp0 != '\0')
s_end:
    subu  $v0, $a0, $v0        # size_t len = s - start_1
    jr    $ra
I used addiu / subu because I don't want it to fault on signed overflow of a pointer. Your version should probably use addiu as well so it works for strings up to 4GB, not just 2GB.
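For checking the logic on a host machine, the asm loop above can be modelled in C roughly as follows. This is only a sketch: the name strlen_ptrdiff is made up, and the variable names mirror the comments in the asm.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical name. C model of the pointer-difference loop:
   only the pointer is incremented in the loop, and the length is
   recovered with one subtraction at the end. */
size_t strlen_ptrdiff(const char *s)
{
    assert(s != NULL);
    const char *start_1 = s + 1;    /* addiu $v0, $a0, 1 */
    char tmp0;
    do {
        tmp0 = *s;                  /* lbu   $t0, ($a0) */
        s++;                        /* addiu $a0, $a0, 1 */
    } while (tmp0 != '\0');         /* bne   $t0, $zero, loop */
    /* s now points one past the terminator, so s - (start + 1) is the length */
    return (size_t)(s - start_1);   /* subu  $v0, $a0, $v0 */
}
```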
Untested, but we can think through the correctness:
For an empty string (s points at a 0): when we reach the final subtract, we have v0=s+1 (from before the loop) and a0=s+1 (from the first/only iteration, which falls through because it loads $t0 = 0). Subtracting these gives len=0 = strlen("").
For a length-1 string: v0=s+1 again, but the loop body runs twice, so a0=s+2 and len = (s+2) - (s+1) = 1.

For MIPS with a branch-delay slot, the addiu and subu can be reordered after bne and jr respectively, filling those branch-delay slots. (But then bne is right after the load, so classic MIPS would have to stall, or you would even have to fill the load-delay slot with a nop on a MIPS I without interlocks for loads.)
Of course, if you actually care about real-world strlen performance for small to medium strings (not just tiny), like more than 8 or 16 bytes, use a bithack that checks whole words at once for maybe having a 0 byte. See also: Why does glibc's strlen need to be so complicated to run quickly?
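To give a flavour of the word-at-a-time idea, here is a minimal portable sketch in C, not what glibc actually does. It assumes a hypothetical interface, strlen_words(buf, cap), where the caller guarantees that cap bytes of buf are readable; that keeps the 4-byte loads well-defined in ISO C. Real implementations typically rely on aligned word loads instead of a capacity argument. The zero-byte test itself is the classic (v - 0x01010101) & ~v & 0x80808080 trick.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Nonzero if and only if at least one of the four bytes of v is 0. */
static uint32_t has_zero_byte(uint32_t v)
{
    return (v - 0x01010101u) & ~v & 0x80808080u;
}

/* Hypothetical interface: buf[0..cap-1] must be readable.
   Returns the index of the first 0 byte, or cap if there is none. */
size_t strlen_words(const char *buf, size_t cap)
{
    assert(buf != NULL);
    size_t i = 0;
    while (cap - i >= 4) {              /* i never exceeds cap */
        uint32_t w;
        memcpy(&w, buf + i, sizeof w);  /* unaligned-safe 4-byte load */
        if (has_zero_byte(w))
            break;                      /* terminator is somewhere in this word */
        i += 4;
    }
    while (i < cap && buf[i] != '\0')   /* finish byte by byte */
        i++;
    return i;
}
```

The byte loop at the end both locates the terminator inside the flagged word and handles the last few bytes when cap is not a multiple of 4.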