How does bias manifest in bounded random number generation

Question

I am trying to digest the following post on unbiased, efficient bounded random number generation: https://www.pcg-random.org/posts/bounded-rands.html

Here is an excerpt describing the classic modulo approach.

uint32_t bounded_rand(rng_t& rng, uint32_t range) {
    return rng() % range;
}

But in addition to being slow, it is also biased. To understand why rand() % 52 produces biased numbers, if we assume that rand() produces numbers in the range [0..2^32), observe that 52 does not perfectly divide 2^32, it divides it 82,595,524 times with remainder 48. Meaning that if we use rand() % 52, there will be 82,595,525 ways to select the first 48 cards from our 52-card deck and only 82,595,524 ways to select the final four cards. In other words, there is a 0.00000121% bias against these last four cards...

The post goes on to show another technique that uses floating-point arithmetic to essentially generate a random fraction of the desired range and truncate it to an integer.

static uint32_t bounded_rand(rng_t& rng, uint32_t range) {
    double zeroone = 0x1.0p-32 * rng();
    return range * zeroone;
}

This approach is just as biased as the classic modulo approach, but the bias manifests itself differently. For example, if we were choosing numbers in the range [0..52), the numbers 0, 13, 26 and 39 would appear once less often than the others.

The last paragraph is what has me confused. I am not well versed in floating-point arithmetic, so I am struggling to make the connection between the bias in the modulo method and the bias in the floating-point method. All I see is that in both techniques, four numbers are biased against.

Answer 1:

Let's start small. Say we have a method rng() that generates any random integer in [0, 128). If we map all of its 128 outcomes as follows (where X is one of these outcomes):

 floor((X / 128.0) * 52)

Then we get the following table:

 0, 0, 0, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 5, 5, 6, 6, 6, 7, 7, 8, 8, 8, 9, 9, 10, 10, 10, 11, 11, 12, 12, 13, 13, 13, 14, 14, 15, 15, 15, 16, 16, 17, 17, 17, 18, 18, 19, 19, 19, 20, 20, 21, 21, 21, 22, 22, 23, 23, 23, 24, 24, 25, 25, 26, 26, 26, 27, 27, 28, 28, 28, 29, 29, 30, 30, 30, 31, 31, 32, 32, 32, 33, 33, 34, 34, 34, 35, 35, 36, 36, 36, 37, 37, 38, 38, 39, 39, 39, 40, 40, 41, 41, 41, 42, 42, 43, 43, 43, 44, 44, 45, 45, 45, 46, 46, 47, 47, 47, 48, 48, 49, 49, 49, 50, 50, 51, 51

Note that some numbers occur twice in this table and others three times (24 values occur three times and 28 occur twice, since 128 = 2 × 52 + 24). This happens because we're mapping a large range onto a smaller one, 128 is not divisible by 52, and every scaled value is rounded down. In this example, 52 divided by 128 is about 0.406, so before rounding each entry is the previous entry plus about 0.406; rounding all the entries down then makes some numbers occur more often than others. On the other hand, if we used 64 instead of 52, each of the 64 values would occur exactly twice in the 128-item table.

See also "A Fast Alternative to the Modulo Reduction" by Daniel Lemire.
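For reference, the reduction discussed in that article is, as far as I understand it, the integer counterpart of the mapping above: multiply the 32-bit input by the range and keep the high 32 bits of the 64-bit product. A minimal sketch follows; the name scale_to_range is chosen here for illustration only.

```cpp
#include <cstdint>

// Illustrative multiply-and-shift reduction: maps a 32-bit x to [0, range)
// by computing floor(x * range / 2^32) in 64-bit integer arithmetic.
// Like the floating-point version, it is biased whenever range does not
// divide 2^32 evenly.
std::uint32_t scale_to_range(std::uint32_t x, std::uint32_t range) {
    const std::uint64_t product = std::uint64_t{x} * range;
    return static_cast<std::uint32_t>(product >> 32);
}
```

With x taken from a 32-bit generator, scale_to_range(x, 52) picks a value in [0, 52) without any division or floating-point step.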

Here is how the table above was formed in detail. If we mapped these outcomes as follows instead:

X / 128.0

Then the start of the table will look like:

0.000, 0.008, 0.016, 0.023, 0.031, 0.039, 0.047, 0.055, 0.062, 0.070, 0.078, 0.086, 0.094, 0.102, 0.109, 0.117, 0.125, 0.133, ...

If we multiply this table by 52, it will now look like:

0.000, 0.406, 0.812, 1.219, 1.625, 2.031, 2.438, 2.844, 3.250, 3.656, 4.062, 4.469, 4.875, 5.281, 5.688, 6.094, 6.500, 6.906, 7.312, ...

And finally we round down to get:

0, 0, 0, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 5, 5, 6, 6, 6, 7, ...
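The same reasoning scales up to a 32-bit generator without building a table of 2^32 entries, because the number of inputs that land on each value can be computed directly. The sketch below does that for both the modulo mapping and the scaled mapping, assuming exact arithmetic (which holds for a range as small as 52, since the double products involved are exactly representable). The names ways_modulo, ways_scaled and short_values are made up for this illustration.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Inputs x in [0, 2^32) for which x % range == value.
std::uint64_t ways_modulo(std::uint32_t range, std::uint32_t value) {
    const std::uint64_t total = std::uint64_t{1} << 32;
    // Every value gets total / range inputs; the first total % range
    // values get one extra.
    return total / range + (value < total % range ? 1 : 0);
}

// Inputs x in [0, 2^32) for which floor(x * range / 2^32) == value.
std::uint64_t ways_scaled(std::uint32_t range, std::uint32_t value) {
    const std::uint64_t total = std::uint64_t{1} << 32;
    // The smallest x that maps to v or higher is ceil(v * 2^32 / range).
    auto first_input = [&](std::uint64_t v) {
        return (v * total + range - 1) / range;
    };
    return first_input(std::uint64_t{value} + 1) - first_input(value);
}

using WaysFn = std::uint64_t (*)(std::uint32_t, std::uint32_t);

// Values in [0, range) that receive fewer inputs than the best-served value.
std::vector<std::uint32_t> short_values(std::uint32_t range, WaysFn ways) {
    std::uint64_t most = 0;
    for (std::uint32_t v = 0; v < range; ++v) {
        most = std::max(most, ways(range, v));
    }
    std::vector<std::uint32_t> result;
    for (std::uint32_t v = 0; v < range; ++v) {
        if (ways(range, v) < most) result.push_back(v);
    }
    return result;
}
```

For a range of 52, both mappings give 82,595,525 inputs to 48 of the values and 82,595,524 to the other four. With modulo the four short values are bunched at the end (48 to 51); in this exact tally of the scaled mapping they are spread 13 apart (12, 25, 38, 51). The size of the bias is the same in both cases, and only the position of the short values differs.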

Source: https://stackoverflow.com/questions/61107920/how-does-bias-manifest-in-bounded-random-number-generation

Tags

random