Why do people say there is modulo bias when using a random number generator?

一整个雨季 2020-11-21 05:48

I have seen this question asked a lot but never seen a true concrete answer to it. So I am going to post one here which will hopefully help people understand why exactly the

10 answers
  •  北荒 (original poster)
    2020-11-21 06:36

    As the accepted answer indicates, "modulo bias" has its roots in the low value of RAND_MAX. He uses an extremely small range to show it: if rand() had only 10 possible outputs (0 through 9, so RAND_MAX would be 9) and you tried to generate a number between 0 and 2 using %, the following outcomes would result:

    rand() % 3   // if rand() returned only 0 through 9, gives
    output of rand()   |   rand()%3
    0                  |   0
    1                  |   1
    2                  |   2
    3                  |   0
    4                  |   1
    5                  |   2
    6                  |   0
    7                  |   1
    8                  |   2
    9                  |   0
    

    So 0 comes up for 4 of the 10 outputs (a 4/10 chance), while 1 and 2 each come up for only 3 (a 3/10 chance each).

    So it's biased. The lower numbers have a better chance of coming out.
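
    To check the table by brute force, here is a minimal sketch that runs every possible output through % and tallies the results. The function name residue_counts and its parameters are made up for illustration; they are not part of any library.

```cpp
#include <cassert>
#include <vector>

// Tally how often each residue appears when every value in
// [0, num_outputs) is reduced with % modulus.
// residue_counts, num_outputs and modulus are illustrative names only.
std::vector<int> residue_counts(int num_outputs, int modulus) {
    assert(num_outputs > 0 && modulus > 0);
    std::vector<int> counts(modulus, 0);
    for (int value = 0; value < num_outputs; ++value) {
        ++counts[value % modulus];  // same reduction as rand() % modulus
    }
    return counts;
}
```

    Calling residue_counts(10, 3) gives the counts 4, 3, 3 for the residues 0, 1, 2, which matches the table.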

    But the bias is only that obvious when RAND_MAX is small or, more specifically, when the number you are modding by is large compared to RAND_MAX.
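
    How "large compared to RAND_MAX" plays out can be worked out without enumerating anything. This is a rough sketch, assuming the generator returns each of its num_outputs values (0 through RAND_MAX) with equal probability; ModuloBias and modulo_bias are names I made up for the example.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative result type: every residue is hit base_count times, and the
// lowest favored_residues residues are hit one extra time.
struct ModuloBias {
    std::uint64_t base_count;
    std::uint64_t favored_residues;
};

// num_outputs is RAND_MAX + 1, i.e. how many distinct values the generator
// can return; modulus is the number you are modding by.
ModuloBias modulo_bias(std::uint64_t num_outputs, std::uint64_t modulus) {
    assert(modulus > 0 && num_outputs >= modulus);
    return {num_outputs / modulus, num_outputs % modulus};
}
```

    With 32768 outputs (RAND_MAX of 32767, the smallest value the C standard permits), modulo_bias(32768, 3) gives a base count of 10922 with 2 favored residues, so the extra hit barely matters. modulo_bias(32768, 10000) gives a base count of 3 with 2768 favored residues, so the results 0 through 2767 come up 4 times for every 3 times the rest do.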

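    Rather than looping (redrawing values until one falls in an unbiased range, which costs extra calls to the generator), I would use a PRNG with a much larger output range. The Mersenne Twister algorithm has a maximum output of 4,294,967,295. As such, MersenneTwister::genrand_int32() % 10 will, for all intents and purposes, be equally distributed: each of the results 0 through 5 is produced by 429,496,730 of the 4,294,967,296 possible outputs and each of 6 through 9 by 429,496,729, so their probabilities differ by only 1 in 4,294,967,296. The modulo bias does not vanish entirely, but it all but disappears.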
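
    For a concrete version of the wide-range approach, here is a sketch using std::mt19937 from the C++ standard library <random> header. I am assuming it as a stand-in for the MersenneTwister::genrand_int32() call mentioned above; draw_mod and outputs_mapping_to are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// std::mt19937 is the standard 32-bit Mersenne Twister, assumed here as a
// stand-in for the MersenneTwister::genrand_int32() mentioned in the text.
static_assert(std::mt19937::max() == 4294967295u, "expected a 32-bit output range");

// Draw a value in [0, modulus) with plain %, accepting the tiny leftover bias.
std::uint32_t draw_mod(std::mt19937& gen, std::uint32_t modulus) {
    assert(modulus > 0);
    return static_cast<std::uint32_t>(gen() % modulus);
}

// How many of the 2^32 possible outputs reduce to `residue` under % modulus.
std::uint64_t outputs_mapping_to(std::uint32_t residue, std::uint32_t modulus) {
    assert(modulus > 0 && residue < modulus);
    const std::uint64_t num_outputs = 1ULL << 32;  // values 0 .. 4,294,967,295
    const std::uint64_t base = num_outputs / modulus;
    return base + (residue < num_outputs % modulus ? 1 : 0);
}
```

    Seed a generator with something like std::mt19937 gen(std::random_device{}()) and call draw_mod(gen, 10). For that modulus, outputs_mapping_to(0, 10) is 429496730 and outputs_mapping_to(9, 10) is 429496729, so the leftover bias is a single output out of roughly 4.29 billion.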
