I\'m writing some code for a very limited system where the mod operator is very slow. In my code a modulo needs to be used about 180 times per second and I figured that removing
Actually division by constants is a well known optimization for compilers and in fact, gcc is already doing it.
This simple code snippet:
int mod(int val) {
return val % 10;
}
Generates the following code on my rather old gcc with -O3:
_mod:
push ebp
mov edx, 1717986919
mov ebp, esp
mov ecx, DWORD PTR [ebp+8]
pop ebp
mov eax, ecx
imul edx
mov eax, ecx
sar eax, 31
sar edx, 2
sub edx, eax
lea eax, [edx+edx*4]
mov edx, ecx
add eax, eax
sub edx, eax
mov eax, edx
ret
If you disregard the function epilogue/prologue, basically two muls (indeed on x86 we're lucky and can use lea for one) and some shifts and adds/subs. I know that I already explained the theory behind this optimization somewhere, so I'll see if I can find that post before explaining it yet again.
Now on modern CPUs that's certainly faster than accessing memory (even if you hit the cache), but whether it's faster for your obviously a bit more ancient CPU is a question that can only be answered with benchmarking (and also make sure your compiler is doing that optimization, otherwise you can always just "steal" the gcc version here ;) ). Especially considering that it depends on an efficient mulhs (ie higher bits of a multiply instruction) to be efficient. Note that this code is not size independent - to be exact the magic number changes (and maybe also parts of the add/shifts), but that can be adapted.