I have seen this question asked a lot but never seen a truly concrete answer to it. So I am going to post one here, which will hopefully help people understand exactly why the bias exists.
With a `RAND_MAX` value of 3 (in reality it should be much higher than that, but the bias would still exist), it makes sense from these calculations that there is a bias:
1 % 2 = 1
2 % 2 = 0
3 % 2 = 1
random_between(1, 3) % 2 = more likely a 1
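In this case, `% 2` is what you shouldn't do when you want a random number between 0 and 1. You could get a random number between 0 and 2 by doing `% 3` though, because in this case the number of equally likely outputs (3) is a multiple of 3. (For a real `rand()`, which returns 0 through `RAND_MAX`, the count that matters is `RAND_MAX + 1`.)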
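The counting above can be checked mechanically. Here is a minimal Python sketch (Python is my choice for illustration, and `residue_counts` is a hypothetical helper, not something from the original answer) that tallies how often each remainder shows up when every equally likely generator output is reduced with `%`. The value 32767 is only an assumed, illustrative `RAND_MAX`.

```python
from collections import Counter

def residue_counts(outputs, modulus):
    """Count how many equally likely generator outputs land on each remainder."""
    return Counter(value % modulus for value in outputs)

# The toy generator from the text: random_between(1, 3) yields 1, 2 or 3.
toy_outputs = range(1, 4)

print(residue_counts(toy_outputs, 2))  # remainder 1 appears twice, remainder 0 once
print(residue_counts(toy_outputs, 3))  # each remainder appears exactly once

# A C-style rand() returns 0..RAND_MAX, i.e. RAND_MAX + 1 values.
# 32767 is an assumed RAND_MAX for illustration only.
c_style = residue_counts(range(0, 32767 + 1), 10000)
print(c_style[0], c_style[9999])  # 4 3
```

With 32768 possible outputs and `% 10000`, the remainders 0 through 2767 are each hit four times and the rest only three times, so the bias is smaller than in the toy case but still there.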
Another method
There are much simpler ways, but to add to the other answers, here is my solution to get a random number between 0 and `n - 1`, so `n` different possibilities, without bias.
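The idea: take only as many random bits as are needed to encode `n`, build a number from those bits, and if that number is `>= n`, restart (no modulo). Really random data is not easy to obtain, so why use more bits than needed?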
Below is an example in Smalltalk, using a cache of bits from a pseudo-random number generator. I'm no security expert so use at your own risk.
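    next: n
        "For n > 0, answer a random integer in [0, n) built from cached random bits, without modulo."
        | bitSize r from to |
        n < 0 ifTrue: [^0 - (self next: 0 - n)].
        n = 0 ifTrue: [^nil].
        n = 1 ifTrue: [^0].
        cache isNil ifTrue: [cache := OrderedCollection new].
        "Refill the bit cache when it holds fewer bits than randmax needs."
        cache size < (self randmax highBit) ifTrue: [
            Security.DSSRandom default next asByteArray do: [ :byte |
                (1 to: 8) do: [ :i | cache add: (byte bitAt: i)]
            ]
        ].
        r := 0.
        "Number of bits needed to encode n."
        bitSize := n highBit.
        "Take the last bitSize bits of the cache and assemble them into r."
        to := cache size.
        from := to - bitSize + 1.
        (from to: to) do: [ :i |
            r := r bitAt: i - from + 1 put: (cache at: i)
        ].
        cache removeFrom: from to: to.
        "Reject out-of-range values and draw again instead of using modulo."
        r >= n ifTrue: [^self next: n].
        ^r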
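For readers who don't use Smalltalk, the same "draw just enough bits, restart if too large" idea can be sketched in Python. This is not a translation of the method above: it has no bit cache, loops instead of recursing, and raises on non-positive `n` rather than negating or answering nil. `unbiased_below` and its `randbits` parameter are hypothetical names; `secrets.randbits` and `int.bit_length` are standard library calls, and `bit_length` plays the role of `highBit`.

```python
import secrets

def unbiased_below(n, randbits=secrets.randbits):
    """Return an integer in [0, n) with every value equally likely.

    Assumes randbits(k) returns an int made of k uniformly random bits.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if n == 1:
        return 0  # only one possible value, no random bits needed
    bit_size = n.bit_length()  # bits needed to encode n
    while True:
        r = randbits(bit_size)
        if r < n:
            return r  # in range: accept
        # r >= n: discard and draw again (no modulo)

# Example: ten unbiased die rolls in 0..5.
rolls = [unbiased_below(6) for _ in range(10)]
print(rolls)
```

Because `n` is at least half of `2 ** bit_size`, each draw is accepted with probability of at least 1/2, so on average at most two draws are needed.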