Why do people say there is modulo bias when using a random number generator?

一整个雨季 2020-11-21 05:48

I have seen this question asked a lot but never seen a true concrete answer to it. So I am going to post one here which will hopefully help people understand why exactly there is "modulo bias" when using a random number generator.

10 Answers
  •  醉话见心
    2020-11-21 06:34

    Modulo reduction is a commonly seen way to make a random integer generator avoid the worst case of running forever.

    However, there is no way to "fix" this worst case without introducing bias. It's not just modulo reduction (rand() % n, discussed in the accepted answer) that introduces bias this way, but also Daniel Lemire's "multiply-and-shift" reduction, as well as stopping rejection of an outcome after a set number of iterations.
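
    To see the bias concretely, here is a minimal sketch (my own toy example, not from this answer's references): a generator that outputs 0 through 7 uniformly, reduced mod 3. Enumerating all eight equally likely outputs shows that the outcomes 0 and 1 each occur three ways while 2 occurs only two ways:

    // Count how the 8 equally likely outputs of a 3-bit generator
    // map onto [0, 3) under modulo reduction
    var counts = [0, 0, 0]
    for (var v = 0; v < 8; v++) { counts[v % 3]++ }
    console.log(counts) // [3, 3, 2]: 0 and 1 get probability 3/8, but 2 only 2/8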

    Here is the reason why. Throughout, we will assume a "true" random generator that can produce unbiased and independent random bits.*

    In 1976, D. E. Knuth and A. C. Yao showed that any algorithm that produces random integers with a given probability, using only random bits, can be represented as a binary tree, where random bits indicate which way to traverse the tree and each leaf (endpoint) corresponds to an outcome. In this case, we're dealing with algorithms that generate random integers in [0, n), where each integer is chosen with probability 1/n. But if 1/n has a non-terminating binary expansion (which will be the case if n is not a power of 2), this binary tree will necessarily either—

    • have an "infinite" depth, or
    • include "rejection" leaves at the end of the tree,

    and in either case, the algorithm won't run in constant time and will run forever in the worst case. (On the other hand, when n is a power of 2, the optimal binary tree will have a finite depth and no rejection nodes.)
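
    To make the tree concrete, consider n = 3 (a sketch of my own, not code from the Knuth–Yao paper): two random bits select among four equally likely leaves, three of which are outcomes and one of which is a rejection leaf that restarts the traversal, so the depth is unbounded in the worst case.

    function randomMod3() {
      while (true) {
        // Two random bits pick one of four equally likely leaves
        var bits = (Math.random() < 0.5 ? 0 : 2) + (Math.random() < 0.5 ? 0 : 1)
        // Leaves 00, 01, 10 are the unbiased outcomes 0, 1, 2
        if (bits < 3) { return bits }
        // Leaf 11 is a rejection leaf: traverse the tree again
      }
    }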

    The binary tree concept also shows that any way to "fix" this worst-case time complexity will lead to bias in general. For instance, modulo reductions are equivalent to a binary tree in which rejection leaves are replaced with labeled outcomes — but since there are more possible outcomes than rejection leaves, only some of the outcomes can take the place of the rejection leaves, introducing bias. The same kind of binary tree — and the same kind of bias — results if you stop rejecting after a set number of iterations. (However, this bias may be negligible depending on the application. There are also security aspects to random integer generation, which are too complicated to discuss in this answer.)
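
    For instance, capping the loop in the n = 3 sketch above at a set number of iterations might look like this (again my own illustration, not code from this answer's references): each iteration rejects with probability 1/4, so the fixed fallback outcome fires with probability 4^-maxTries, making outcome 0 slightly more likely than 1/3.

    function randomMod3Capped(maxTries) {
      for (var i = 0; i < maxTries; i++) {
        var bits = (Math.random() < 0.5 ? 0 : 2) + (Math.random() < 0.5 ? 0 : 1)
        if (bits < 3) { return bits }
      }
      // Every iteration rejected (probability 4^-maxTries): returning a
      // fixed outcome here is exactly where the bias enters
      return 0
    }

    With maxTries = 16 the fallback fires with probability 2^-32, about 2.3e-10, which illustrates why the bias can be negligible in practice.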

    To illustrate, the following JavaScript code implements a random integer algorithm called the Fast Dice Roller by J. Lumbroso (2013). Note that it includes a rejection event and a loop, both of which are necessary to make the algorithm unbiased in the general case.

    function randomInt(minInclusive, maxExclusive) {
      var maxInclusive = (maxExclusive - minInclusive) - 1
      var x = 1 // size of the range of values traversed so far
      var y = 0 // candidate outcome, built up one random bit at a time
      while (true) {
        x = x * 2
        // Math.random() here stands in for a source of unbiased random bits
        var randomBit = (Math.random() < 0.5 ? 0 : 1)
        y = y * 2 + randomBit
        if (x > maxInclusive) {
          if (y <= maxInclusive) { return y + minInclusive }
          // Rejection: shift the range down, reusing the leftover randomness
          x = x - maxInclusive - 1
          y = y - maxInclusive - 1
        }
      }
    }
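
    For instance, one way to exercise the function (this usage example is mine):

    // Roll a fair six-sided die five times
    for (var i = 0; i < 5; i++) {
      console.log(randomInt(1, 7)) // prints an unbiased integer in [1, 6]
    }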
    

    Note

    * This answer won't involve the rand() function in C because it has many issues. Perhaps the most serious here is the fact that the C standard does not specify a particular distribution for the numbers returned by rand().
