Is there any easy way to do modulus of 2^32 - 1 operation?

Question

I just heard that x mod (2^32-1) and x / (2^32-1) are supposed to be easy, but how?

I want to use them to calculate the formula:

x_n = (x_{n-1} + x_{n-1} / b) mod b.

For b = 2^32, it's easy: x % (2^32) == x & (2^32-1) and x / (2^32) == x >> 32 (the ^ here is exponentiation, not XOR). How can this be done when b = 2^32 - 1?

The page https://en.wikipedia.org/wiki/Multiply-with-carry says "arithmetic for modulus 2^32 − 1 requires only a simple adjustment from that for 2^32". So what is the "simple adjustment"?

Answer 1:

(This answer only handles the mod case.)

I'll assume that the datatype of x is wider than 32 bits (this answer will actually work with any positive integer) and that x is positive (the negative case is just -(-x mod 2^32-1)), since if it is at most 32 bits, the question can be answered by

x mod (2^32-1) = 0 if x == 2^32-1, x otherwise
x / (2^32 - 1) = 1 if x == 2^32-1, 0 otherwise

We can write x in base 2^32, with digits x0, x1, ..., xn. So

  x = x0 + 2^32 * x1 + (2^32)^2 * x2 + ... + (2^32)^n * xn

This makes the answer clearer when we do the modulus, since 2^32 == 1 mod 2^32-1. That is

  x == x0 + 1 * x1 + 1^2 * x2 + ... + 1^n * xn (mod 2^32-1)
    == x0 + x1 + ... + xn (mod 2^32-1)

x mod 2^32-1 is the same as the sum of the base 2^32 digits! (We can't drop the mod 2^32-1 yet.) We have two cases now: either the sum is between 0 and 2^32-1, or it is greater. In the former, we are done; in the latter, we can just recur until we get between 0 and 2^32-1. Getting the digits in base 2^32 is fast, since we can use bitwise operations. In Python (this doesn't handle negative numbers):

def mod_2to32sub1(x):
    s = 0 # the sum

    while x > 0: # get the digits
        s += x & (2**32-1)
        x >>= 32

    if s > 2**32-1:
        return mod_2to32sub1(s)
    elif s == 2**32-1:
        return 0
    else:
        return s

(This is extremely easy to generalise to x mod 2^n-1; in fact you just replace every occurrence of 32 with n in this answer.)
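As a rough sketch of that generalisation, the same digit-summing idea might look like this in Python. The name mod_2n_minus_1 and the parameter n are illustrative only (they are not from the answer above), and the sketch assumes x is a non-negative integer and n >= 1:

```python
def mod_2n_minus_1(x, n):
    """Return x mod (2**n - 1) for a non-negative integer x and n >= 1."""
    mask = (1 << n) - 1          # 2**n - 1: both the digit mask and the modulus
    while x > mask:              # keep folding until the value fits in n bits
        s = 0
        while x > 0:             # sum the base-2**n digits of x
            s += x & mask
            x >>= n
        x = s
    return 0 if x == mask else x  # 2**n - 1 itself is congruent to 0


# Spot checks against Python's built-in % operator.
for n in (3, 8, 32):
    m = (1 << n) - 1
    for x in (0, 1, m - 1, m, m + 1, 2 * m, 12345678901234567890, 1 << 200):
        assert mod_2n_minus_1(x, n) == x % m
```

The loop replaces the recursion of the original function, so very large inputs do not depend on the recursion limit.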

(EDIT: added the elif clause to avoid an infinite loop on mod_2to32sub1(2**32-1). EDIT2: replaced ^ with **... oops.)

Answer 2:

So you compute with the "rule" 2³² = 1. In general, 2^(32+x) = 2^x. You can simplify 2^a by taking the exponent modulo 32. Example: 2⁶⁶ = 2².

You can express any number in binary, and then lower the exponents. Example: the number 2⁴⁰ + 2³⁸ + 2²⁰ + 2 + 1 can be simplified to 2⁸ + 2⁶ + 2²⁰ + 2 + 1.

In general, you can group the exponents every 32 powers of 2, and "downgrade" all exponents modulo 32.
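To make the rule concrete, here is a small Python check of the example above. The helper reduce_exponents is a hypothetical name introduced only for illustration; it moves each set bit to its exponent modulo 32 and adds the results. In general that sum may still need a final reduction if it reaches 2³² − 1 or more.

```python
M = 2**32 - 1

# The example from the text: exponents 40 and 38 drop to 8 and 6.
original = 2**40 + 2**38 + 2**20 + 2 + 1
reduced = 2**8 + 2**6 + 2**20 + 2 + 1
assert original % M == reduced % M


def reduce_exponents(x):
    """Move every set bit 2**e of x to 2**(e % 32) and sum the results."""
    total = 0
    e = 0
    while x:
        if x & 1:
            total += 1 << (e % 32)
        x >>= 1
        e += 1
    return total


assert reduce_exponents(original) == reduced
```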

For 64 bit words, the number can be expressed as

2³² A + B

where 0 <= A,B <= 2³²-1. Getting A and B is easy with bitwise operations.

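So you can simplify that to A + B, which is much smaller: at most 2³³ − 2. Then, check if this number is at least 2³²−1, and subtract 2³² − 1 in that case. (In the single edge case A = B = 2³²−1, the result of that subtraction is again 2³²−1 and must be replaced by 0.)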

This avoids expensive direct division.
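A minimal Python sketch of the 64-bit case, assuming 0 <= x < 2⁶⁴. The function name mod_m_u64 is made up for this illustration. The second check covers the one input where A and B are both 2³²−1, so that a single subtraction leaves exactly 2³²−1:

```python
M = 0xFFFFFFFF  # 2**32 - 1


def mod_m_u64(x):
    """x mod (2**32 - 1) for 0 <= x < 2**64, without a division."""
    a = x >> 32      # high 32 bits (A in the text)
    b = x & M        # low 32 bits (B in the text)
    s = a + b        # congruent to x, at most 2**33 - 2
    if s >= M:
        s -= M       # now 0 <= s <= 2**32 - 1
    if s == M:       # happens only when A == B == 2**32 - 1
        s = 0
    return s
```

For example, mod_m_u64(2**64 - 1) returns 0, because 2⁶⁴ − 1 = (2³² − 1)(2³² + 1).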

Answer 3:

The modulus has already been explained; nevertheless, let's recapitulate.

To find the remainder of k modulo 2^n-1, write

k = a + 2^n*b,  0 <= a < 2^n

Then

k = a + ((2^n-1) + 1) * b
  = (a + b) + (2^n-1)*b
  ≡ (a + b) (mod 2^n-1)

If a + b >= 2^n, repeat until the remainder is less than 2^n, and if that leads you to a + b = 2^n-1, replace that with 0. Each "shift right by n and add to the last n bits" moves the first set bit right by n or n-1 places (unless k < 2^(2*n-1), when the first set bit after the shift-and-add may be the 2^n bit). So if the width of the type is large compared to n, this will need many shifts: consider a 128-bit type and n = 3, where for large k you will need over 40 shifts. To reduce the number of shifts required, you can exploit the fact that

2^(m*n) - 1 = (2^n - 1) * (2^((m-1)*n) + 2^((m-2)*n) + ... + 2^(2*n) + 2^n + 1),

of which we will only use that 2^n - 1 divides 2^(m*n) - 1 for all m > 0. Then you shift by multiples of n that are roughly half the maximal bit-length the value can have at that step. For the above example of a 128-bit type and the remainder modulo 7 (2^3 - 1), the closest multiples of 3 to 128/2 are 63 and 66. First shift by 63 bits

r_1 = (k & (2^63 - 1)) + (k >> 63) // r_1 < 2^63 + 2^(128-63) < 2^66

to get a number with at most 66 bits, then shift by 66/2 = 33 bits

r_2 = (r_1 & (2^33 - 1)) + (r_1 >> 33) // r_2 < 2^33 + 2^(66-33) = 2^34

to reach at most 34 bits. Next shift by 18 bits, then 9, 6, 3, and 3 once more

r_3 = (r_2 & (2^18 - 1)) + (r_2 >> 18) // r_3 < 2^18 + 2^(34-18) < 2^19
r_4 = (r_3 & (2^9 - 1)) + (r_3 >> 9)   // r_4 < 2^9 + 2^(19-9) < 2^11
r_5 = (r_4 & (2^6 - 1)) + (r_4 >> 6)   // r_5 < 2^6 + 2^(11-6) < 2^7
r_6 = (r_5 & (2^3 - 1)) + (r_5 >> 3)   // r_6 < 2^3 + 2^(7-3) < 2^5
r_7 = (r_6 & (2^3 - 1)) + (r_6 >> 3)   // r_7 < 2^3 + 2^(5-3) < 2^4

Now a single subtraction if r_7 >= 2^3 - 1 suffices. To calculate k % (2^n -1) in a b-bit type, O(log₂ (b/n)) shifts are needed.
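The shift schedule worked out above can be sketched in Python as follows. The function name mod7_u128 is hypothetical, and the sketch assumes 0 <= k < 2¹²⁸ (Python integers are unbounded, so the range is checked explicitly):

```python
def mod7_u128(k):
    """k mod 7 for 0 <= k < 2**128, using the shift schedule from the text."""
    assert 0 <= k < 1 << 128
    # Every shift is a multiple of 3, so 2**shift is congruent to 1 mod 7
    # and each fold preserves the remainder.
    for shift in (63, 33, 18, 9, 6, 3, 3):
        k = (k & ((1 << shift) - 1)) + (k >> shift)
    # By the bounds in the text k is now below 2**4, so one subtraction suffices.
    return k - 7 if k >= 7 else k
```

That is seven shift-and-add steps instead of the 40-odd that folding by 3 bits at a time would need.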

The quotient is obtained similarly, again we write

k = a + 2^n*b,  0 <= a < 2^n
  = a + ((2^n-1) + 1)*b
  = (2^n-1)*b + (a+b),

so k/(2^n-1) = b + (a+b)/(2^n-1), and we continue while a+b > 2^n-1. Here we unfortunately cannot reduce the work by shifting and masking about half the width, so the method is only efficient when n is not much smaller than the width of the type.

Code for the fast cases where n is not too small:

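/* Returns k mod (2^n - 1). Requires 0 < n < 64: the shift is undefined
   for n >= the width of the type, and n == 0 would loop forever. */
unsigned long long modulus_2n1(unsigned n, unsigned long long k) {
    unsigned long long mask = (1ULL << n) - 1ULL;
    while(k > mask) {
        /* fold the high part onto the low n bits */
        k = (k & mask) + (k >> n);
    }
    return k == mask ? 0 : k;
}

/* Returns k / (2^n - 1), with the same requirement on n. */
unsigned long long quotient_2n1(unsigned n, unsigned long long k) {
    unsigned long long mask = (1ULL << n) - 1ULL, quotient = 0;
    while(k > mask) {
        /* k = (2^n - 1)*b + (a + b), so b goes into the quotient */
        quotient += k >> n;
        k = (k & mask) + (k >> n);
    }
    return k == mask ? quotient + 1 : quotient;
}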

For the special case where n is half the width of the type, the loop runs at most twice, so if branches are expensive, it may be better to unroll the loop and unconditionally execute the loop body twice.
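A sketch of that unrolled variant, written in Python for brevity and assuming a 64-bit value with n = 32 (that is, 0 <= k < 2⁶⁴). The names mod_unrolled and quot_unrolled are illustrative and not part of the code above:

```python
MASK32 = 0xFFFFFFFF  # 2**32 - 1


def mod_unrolled(k):
    """k mod (2**32 - 1) for 0 <= k < 2**64, loop body run exactly twice."""
    k = (k & MASK32) + (k >> 32)   # now k <= 2**33 - 2
    k = (k & MASK32) + (k >> 32)   # now k <= 2**32 - 1
    return 0 if k == MASK32 else k


def quot_unrolled(k):
    """k // (2**32 - 1) for 0 <= k < 2**64, same unrolling."""
    q = k >> 32
    k = (k & MASK32) + (k >> 32)
    q += k >> 32
    k = (k & MASK32) + (k >> 32)
    return q + 1 if k == MASK32 else q
```

Running the body on a value that is already at most 2³² − 1 leaves it unchanged, which is why executing it unconditionally is safe.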

Answer 4:

It is not. What you must have heard is that x mod 2^n and x/2^n are easier. x/2^n can be performed as x>>n, and x mod 2^n as x&((1<<n)-1).

Source: https://stackoverflow.com/questions/9857080/is-there-any-easy-way-to-do-modulus-of-232-1-operation

Tags

algorithm

modulus