Is there any alternative to using % (modulus) in C/C++?

问题

I read somewhere once that the modulus operator is inefficient on small embedded devices like 8 bit micro-controllers that do not have integer division instruction. Perhaps someone can confirm this but I thought the difference is 5-10 time slower than with an integer division operation.

Is there another way to do this other than keeping a counter variable and manually overflowing to 0 at the mod point?

const int FIZZ = 6;
for(int x = 0; x < MAXCOUNT; x++)
{
    if(!(x % FIZZ)) print("Fizz\n"); // slow on some systems
}

vs:

The way I am currently doing it:

const int FIZZ = 6;
int fizzcount = 1;
for(int x = 1; x < MAXCOUNT; x++)
{
    if(fizzcount >= FIZZ) 
    {
        print("Fizz\n");
        fizzcount = 0;
    }
}

回答1:

Ah, the joys of bitwise arithmetic. A side effect of many division routines is the modulus - so in few cases should division actually be faster than modulus. I'm interested to see the source you got this information from. Processors with multipliers have interesting division routines using the multiplier, but you can get from division result to modulus with just another two steps (multiply and subtract) so it's still comparable. If the processor has a built in division routine you'll likely see it also provides the remainder.

Still, there is a small branch of number theory devoted to Modular Arithmetic which requires study if you really want to understand how to optimize a modulus operation. Modular arithmatic, for instance, is very handy for generating magic squares.

So, in that vein, here's a very low level look at the math of modulus for an example of x, which should show you how simple it can be compared to division:

Maybe a better way to think about the problem is in terms of number bases and modulo arithmetic. For example, your goal is to compute DOW mod 7 where DOW is the 16-bit representation of the day of the week. You can write this as:

 DOW = DOW_HI*256 + DOW_LO

 DOW%7 = (DOW_HI*256 + DOW_LO) % 7
       = ((DOW_HI*256)%7  + (DOW_LO % 7)) %7
       = ((DOW_HI%7 * 256%7)  + (DOW_LO%7)) %7
       = ((DOW_HI%7 * 4)  + (DOW_LO%7)) %7

Expressed in this manner, you can separately compute the modulo 7 result for the high and low bytes. Multiply the result for the high by 4 and add it to the low and then finally compute result modulo 7.

Computing the mod 7 result of an 8-bit number can be performed in a similar fashion. You can write an 8-bit number in octal like so:

  X = a*64 + b*8 + c

Where a, b, and c are 3-bit numbers.

  X%7 = ((a%7)*(64%7) + (b%7)*(8%7) + c%7) % 7
      = (a%7 + b%7 + c%7) % 7
      = (a + b + c) % 7

since 64%7 = 8%7 = 1

Of course, a, b, and c are

  c = X & 7
  b = (X>>3) & 7
  a = (X>>6) & 7  // (actually, a is only 2-bits).

The largest possible value for a+b+c is 7+7+3 = 17. So, you'll need one more octal step. The complete (untested) C version could be written like:

unsigned char Mod7Byte(unsigned char X)
{
    X = (X&7) + ((X>>3)&7) + (X>>6);
    X = (X&7) + (X>>3);

    return X==7 ? 0 : X;
}

I spent a few moments writing a PIC version. The actual implementation is slightly different than described above

Mod7Byte:
       movwf        temp1        ;
       andlw        7        ;W=c
       movwf        temp2        ;temp2=c
       rlncf   temp1,F        ;
       swapf        temp1,W ;W= a*8+b
       andlw   0x1F
       addwf        temp2,W ;W= a*8+b+c
       movwf        temp2   ;temp2 is now a 6-bit number
       andlw   0x38    ;get the high 3 bits == a'
       xorwf        temp2,F ;temp2 now has the 3 low bits == b'
       rlncf   WREG,F  ;shift the high bits right 4
       swapf   WREG,F  ;
       addwf        temp2,W ;W = a' + b'

 ; at this point, W is between 0 and 10


       addlw        -7
       bc      Mod7Byte_L2
Mod7Byte_L1:
       addlw        7
Mod7Byte_L2:
       return

Here's a liitle routine to test the algorithm

       clrf    x
       clrf    count

TestLoop:
       movf        x,W
       RCALL   Mod7Byte
       cpfseq count
        bra    fail

       incf        count,W
       xorlw   7
       skpz
        xorlw        7
       movwf   count

       incfsz        x,F
       bra        TestLoop
passed:

Finally, for the 16-bit result (which I have not tested), you could write:

uint16 Mod7Word(uint16 X)
{
 return Mod7Byte(Mod7Byte(X & 0xff) + Mod7Byte(X>>8)*4);
}

Scott

回答2:

If you are calculating a number mod some power of two, you can use the bit-wise and operator. Just subtract one from the second number. For example:

x % 8 == x & 7
x % 256 == x & 255

A few caveats:

This only works if the second number is a power of two.
It's only equivalent if the modulus is always positive. The C and C++ standards don't specify the sign of the modulus when the first number is negative (until C++11, which does guarantee it will be negative, which is what most compilers were already doing). A bit-wise and gets rid of the sign bit, so it will always be positive (i.e. it's a true modulus, not a remainder). It sounds like that's what you want anyway though.
Your compiler probably already does this when it can, so in most cases it's not worth doing it manually.

回答3:

There is an overhead most of the time in using modulo that are not powers of 2. This is regardless of the processor as (AFAIK) even processors with modulus operators are a few cycles slower for divide as opposed to mask operations.

For most cases this is not an optimisation that is worth considering, and certainly not worth calculating your own shortcut operation (especially if it still involves divide or multiply).

However, one rule of thumb is to select array sizes etc. to be powers of 2.

so if calculating day of week, may as well use %7 regardless if setting up a circular buffer of around 100 entries... why not make it 128. You can then write % 128 and most (all) compilers will make this & 0x7F

回答4:

Unless you really need high performance on multiple embedded platforms, don't change how you code for performance reasons until you profile!

Code that's written awkwardly to optimize for performance is hard to debug and hard to maintain. Write a test case, and profile it on your target. Once you know the actual cost of modulus, then decide if the alternate solution is worth coding.

回答5:

@Matthew is right. Try this:

int main() {
  int i;
  for(i = 0; i<=1024; i++) {
    if (!(i & 0xFF)) printf("& i = %d\n", i);
    if (!(i % 0x100)) printf("mod i = %d\n", i);
  }
}

回答6:

x%y == (x-(x/y)*y)

Hope this helps.

回答7:

In the embedded world, the "modulus" operations you need to do are often the ones that break down nicely into bit operations that you can do with '&' and '|' and sometimes '>>'.

回答8:

Do you have access to any programmable hardware on the embedded device? Like counters and such? If so, you might be able to write a hardware based mod unit, instead of using the simulated %. (I did that once in VHDL. Not sure if I still have the code though.)

Mind you, you did say that division was 5-10 times faster. Have you considered doing a division, multiplication, and subtraction to simulated the mod? (Edit: Misunderstood the original post. I did think it was odd that division was faster than mod, they are the same operation.)

In your specific case, though, you are checking for a mod of 6. 6 = 2*3. So you could MAYBE get some small gains if you first checked if the least significant bit was a 0. Something like:

if((!(x & 1)) && (x % 3))
{
    print("Fizz\n");
}

If you do that, though, I'd recommend confirming that you get any gains, yay for profilers. And doing some commenting. I'd feel bad for the next guy who has to look at the code otherwise.

回答9:

You should really check the embedded device you need. All the assembly language I have seen (x86, 68000) implement the modulus using a division.

Actually, the division assembly operation returns the result of the division and the remaining in two different registers.

回答10:

Not that this is necessarily better, but you could have an inner loop which always goes up to FIZZ, and an outer loop which repeats it all some certain number of times. You've then perhaps got to special case the final few steps if MAXCOUNT is not evenly divisible by FIZZ.

That said, I'd suggest doing some research and performance profiling on your intended platforms to get a clear idea of the performance constraints you're under. There may be much more productive places to spend your optimisation effort.

回答11:

@Jeff V: I see a problem with it! (Beyond that your original code was looking for a mod 6 and now you are essentially looking for a mod 8). You keep doing an extra +1! Hopefully your compiler optimizes that away, but why not just test start at 2 and go to MAXCOUNT inclusive? Finally, you are returning true every time that (x+1) is NOT divisible by 8. Is that what you want? (I assume it is, but just want to confirm.)

回答12:

For modulo 6 you can change the Python code to C/C++:

def mod6(number):
    while number > 7:
        number = (number >> 3 << 1) + (number & 0x7)
    if number > 5:
        number -= 6
    return number

回答13:

The print statement will take orders of magnitude longer than even the slowest implementation of the modulus operator. So basically the comment "slow on some systems" should be "slow on all systems".

Also, the two code snippets provided don't do the same thing. In the second one, the line

if(fizzcount >= FIZZ)

is always false so "FIZZ\n" is never printed.

来源：https://stackoverflow.com/questions/48053/is-there-any-alternative-to-using-modulus-in-c-c

标签

c++

modulo

embedded