Why do people say there is modulo bias when using a random number generator?

一整个雨季 2020-11-21 05:48

I have seen this question asked a lot but never seen a true concrete answer to it. So I am going to post one here which will hopefully help people understand why exactly the

10 Answers
  •  青春惊慌失措
    2020-11-21 06:51

    Take a toy generator, random_between(1, 3), that returns 1, 2, or 3 with equal probability (a real RAND_MAX is much higher than 3, but the bias would still exist). These calculations show where the bias comes from:

        1 % 2 = 1
        2 % 2 = 0
        3 % 2 = 1
        random_between(1, 3) % 2 = more likely a 1

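    In this case, % 2 is what you shouldn't use when you want a random number between 0 and 1: three equally likely inputs cannot be split evenly into two outcomes, so 1 comes up twice as often as 0. You could get an unbiased random number between 0 and 2 with % 3, though, because here the number of possible values (3) is a multiple of 3.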
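The counting argument above can be checked by enumerating every value the generator can return and tallying the residues. This is a minimal sketch in Python: `residue_counts` is a hypothetical helper name, and the second case assumes a generator returning 0 through 32767 (the smallest RAND_MAX the C standard allows) purely for illustration.

```python
from collections import Counter

def residue_counts(outputs, modulus):
    """Tally how often each residue occurs when every output is reduced mod `modulus`."""
    return Counter(value % modulus for value in outputs)

# Toy generator from the answer: 1, 2, 3 are equally likely.
toy_outputs = [1, 2, 3]
mod2 = residue_counts(toy_outputs, 2)   # residue 1 occurs twice, residue 0 once
mod3 = residue_counts(toy_outputs, 3)   # each residue occurs exactly once

# Assumed larger generator: 0..32767 (32768 values), reduced mod 10.
# 32768 is not a multiple of 10, so some residues get one extra value.
big = residue_counts(range(32768), 10)

print(dict(mod2))
print(dict(mod3))
print(big[0], big[9])
```

The bias shrinks as the generator's range grows relative to the modulus, but it only disappears when the number of possible outputs is an exact multiple of the modulus.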

    Another method

    There are much simpler approaches, but to add to the other answers, here is my solution for getting a random number between 0 and n - 1 (so n different possibilities) without bias:

    • the number of bits (not bytes) needed to encode the number of possibilities is the number of bits of random data you'll need
    • encode the number from random bits
    • if this number is >= n, restart (no modulo).

    Truly random data is not easy to obtain, so why use more bits than needed?
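The three steps listed above can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not a translation of the Smalltalk method: `unbiased_below` is a hypothetical name, the bit source defaults to the standard library's `secrets.randbits`, and the bit count is taken as `(n - 1).bit_length()`, the minimum needed to encode 0 through n - 1.

```python
import secrets

def unbiased_below(n, randbits=secrets.randbits):
    """Return an integer in [0, n) by drawing bits and rejecting out-of-range values.

    `randbits(k)` must return a uniformly random integer with k bits;
    it is a parameter only so that the bit source can be swapped out.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if n == 1:
        return 0  # only one possibility, no random bits needed
    bits = (n - 1).bit_length()  # fewest bits that can encode 0..n-1
    while True:
        candidate = randbits(bits)
        if candidate < n:
            return candidate
        # candidate >= n: discard it and draw again (no modulo)

print(unbiased_below(6))  # some value from 0 to 5
```

Because the candidate range is the smallest power of two that is at least n, more than half of all draws are accepted, so the expected number of attempts stays below two.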

    Below is an example in Smalltalk, using a cache of bits from a pseudo-random number generator. I'm no security expert, so use at your own risk.

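    next: n
        "Answer a random integer between 0 and n - 1, drawn from cached random bits."
    
        | bitSize r from to |
        "Negative n: answer the negated result for its absolute value."
        n < 0 ifTrue: [^0 - (self next: 0 - n)].
        n = 0 ifTrue: [^nil].
        n = 1 ifTrue: [^0].
        cache isNil ifTrue: [cache := OrderedCollection new].
        "Refill the bit cache when it runs low, adding each byte bit by bit."
        cache size < (self randmax highBit) ifTrue: [
            Security.DSSRandom default next asByteArray do: [ :byte |
                (1 to: 8) do: [ :i |    cache add: (byte bitAt: i)]
            ]
        ].
        r := 0.
        "Number of bits to draw for this request."
        bitSize := n highBit.
        "Take the last bitSize bits of the cache and assemble them into r."
        to := cache size.
        from := to - bitSize + 1.
        (from to: to) do: [ :i |
            r := r bitAt: i - from + 1 put: (cache at: i)
        ].
        "Drop the bits that were just used."
        cache removeFrom: from to: to.
        "Out of range: try again with fresh bits instead of applying a modulo."
        r >= n ifTrue: [^self next: n].
        ^r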
    
