I have seen this question asked a lot but never seen a true concrete answer to it. So I am going to post one here which will hopefully help people understand why exactly the
As the accepted answer indicates, "modulo bias" has its roots in the low value of RAND_MAX
. He uses an extremely small value of RAND_MAX
(10) to show that if RAND_MAX were 10, then you tried to generate a number between 0 and 2 using %, the following outcomes would result:
rand() % 3 // if RAND_MAX were only 10, gives
output of rand() | rand()%3
0 | 0
1 | 1
2 | 2
3 | 0
4 | 1
5 | 2
6 | 0
7 | 1
8 | 2
9 | 0
So there are 4 outputs of 0's (4/10 chance) and only 3 outputs of 1 and 2 (3/10 chances each).
So it's biased. The lower numbers have a better chance of coming out.
But that only shows up so obviously when RAND_MAX
is small. Or more specifically, when the number your are modding by is large compared to RAND_MAX
.
A much better solution than looping (which is insanely inefficient and shouldn't even be suggested) is to use a PRNG with a much larger output range. The Mersenne Twister algorithm has a maximum output of 4,294,967,295. As such doing MersenneTwister::genrand_int32() % 10
for all intents and purposes, will be equally distributed and the modulo bias effect will all but disappear.