Why do people say there is modulo bias when using a random number generator?

一整个雨季 2020-11-21 05:48

I have seen this question asked a lot but never seen a true concrete answer to it. So I am going to post one here which will hopefully help people understand why exactly the

10 Answers
  •  温柔的废话
    2020-11-21 06:30

    @user1413793 is correct about the problem. I'm not going to discuss that further, except to make one point: yes, for small values of n and large values of RAND_MAX, the modulo bias can be very small. But using a bias-inducing pattern means that you must consider the bias every time you calculate a random number and choose different patterns for different cases. And if you make the wrong choice, the bugs it introduces are subtle and almost impossible to unit test. Compared to just using the proper tool (such as arc4random_uniform), that's extra work, not less work. Doing more work and getting a worse solution is terrible engineering, especially when doing it right every time is easy on most platforms.

    Unfortunately, the implementations of the solution are all incorrect or less efficient than they should be. (Each solution has various comments explaining the problems, but none of the solutions have been fixed to address them.) This is likely to confuse the casual answer-seeker, so I'm providing a known-good implementation here.

    Again, the best solution is just to use arc4random_uniform on platforms that provide it, or a similar ranged solution for your platform (such as Random.nextInt on Java). It will do the right thing at no code cost to you. This is almost always the correct call to make.

    If you don't have arc4random_uniform, then you can use the power of open source to see exactly how it is implemented on top of a wider-range RNG (arc4random in this case, but a similar approach could also work on top of other RNGs).

    Here is the OpenBSD implementation:

    /*
     * Calculate a uniformly distributed random number less than upper_bound
     * avoiding "modulo bias".
     *
     * Uniformity is achieved by generating new random numbers until the one
     * returned is outside the range [0, 2**32 % upper_bound).  This
     * guarantees the selected random number will be inside
     * [2**32 % upper_bound, 2**32) which maps back to [0, upper_bound)
     * after reduction modulo upper_bound.
     */
    u_int32_t
    arc4random_uniform(u_int32_t upper_bound)
    {
        u_int32_t r, min;
    
        if (upper_bound < 2)
            return 0;
    
        /* 2**32 % x == (2**32 - x) % x */
        min = -upper_bound % upper_bound;
    
        /*
         * This could theoretically loop forever but each retry has
         * p > 0.5 (worst case, usually far better) of selecting a
         * number inside the range we need, so it should rarely need
         * to re-roll.
         */
        for (;;) {
            r = arc4random();
            if (r >= min)
                break;
        }
    
        return r % upper_bound;
    }
    

    It is worth noting the latest commit comment on this code for those who need to implement similar things:

    Change arc4random_uniform() to calculate 2**32 % upper_bound as -upper_bound % upper_bound. Simplifies the code and makes it the same on both ILP32 and LP64 architectures, and also slightly faster on LP64 architectures by using a 32-bit remainder instead of a 64-bit remainder.

    Pointed out by Jorden Verwer on tech@ ok deraadt; no objections from djm or otto

    The Java implementation is also easily findable (see previous link):

    public int nextInt(int n) {
        if (n <= 0)
            throw new IllegalArgumentException("n must be positive");

        if ((n & -n) == n)  // i.e., n is a power of 2
            return (int)((n * (long)next(31)) >> 31);

        int bits, val;
        do {
            bits = next(31);
            val = bits % n;
        } while (bits - val + (n-1) < 0);
        return val;
    }
    
