Can `rand()` in C++ be used to generate unbiased bools?

不问归期 submitted on 2019-12-05 10:59:36

One common defect in random number generators is a slight bias towards smaller results (basically a slight bias towards 0 in the high-order bits). This often happens when the RNG's internal state is mapped to the output range using a simple mod, which is biased against high values unless RAND_MAX is a divisor of the number of possible internal states. Here's a typical biased mapping implementation:

static unsigned int state;

int rand() {
   state = nextState(); /* this actually moves the state from one random value to the next, e.g., using an LCG */
   return state % RAND_MAX;  /* biased */
}

The bias occurs because some of the lower output values have one more state value mapping to them under mod. E.g., if the state can have values 0-9 (10 values), and RAND_MAX is 3 (so the outputs are 0-2), then the % 3 operation produces the following outputs, depending on the state:

Output  State
0       0 3 6 9 
1       1 4 7
2       2 5 8

The result 0 is over-represented because it has a 4/10 chance of being selected, vs 3/10 for the other values.
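The table above can be reproduced by brute force. The following is a minimal sketch; `count_mappings` is a hypothetical helper name (not part of any library) that tallies how many state values land on each output under a plain mod.

```cpp
#include <vector>

// Count how many of the state values 0 .. state_count-1 map to each
// output value under `state % modulus`.
// (Hypothetical helper, for illustration only.)
std::vector<unsigned> count_mappings(unsigned state_count, unsigned modulus) {
    std::vector<unsigned> counts(modulus, 0);
    for (unsigned state = 0; state < state_count; ++state) {
        ++counts[state % modulus];
    }
    return counts;
}
```

Calling `count_mappings(10, 3)` yields the counts 4, 3, 3, matching the table: output 0 is reachable from one more state than the others.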

As an example with more likely values, if the internal RNG state is a 16-bit integer (65536 possible values), and RAND_MAX is 35767 (as you mentioned it is on your platform), then the values [0, 29768] will each be output for 2 distinct state values, but the remaining ~6,000 values will only be output for 1 state value - a significant bias. This kind of bias would tend to cause your counter value to be higher than expected (since smaller-than-uniform returns from rand() favor the p_scaled >= 1 condition).

It would help if you could post the exact implementation of rand() on your platform. If it turns out to be bias in the high bits, you may be able to eliminate it by passing the values you get from rand() through a good hash function, but a better approach is probably just to use a high-quality source of random numbers, e.g., the Mersenne Twister. A better generator will also have a larger output range (effectively, a higher RAND_MAX), which means your algorithm will suffer fewer retries/less recursion.
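As a rough sketch of the Mersenne Twister route, the standard `<random>` header provides `std::mt19937` and `std::bernoulli_distribution`. The names `random_bool_mt` and `observed_rate` below are hypothetical helpers, and the fixed seed is an assumption made only so that runs are repeatable.

```cpp
#include <random>

// Hypothetical replacement for a rand()-based bool generator:
// returns true with probability p (0 <= p <= 1).
bool random_bool_mt(double p, std::mt19937& gen) {
    std::bernoulli_distribution dist(p);
    return dist(gen);
}

// Fraction of true results over `trials` draws, as a quick sanity check.
// A fixed seed is used for repeatability; real code would more likely
// seed from std::random_device.
double observed_rate(double p, unsigned trials, unsigned seed) {
    std::mt19937 gen(seed);
    unsigned hits = 0;
    for (unsigned i = 0; i < trials; ++i) {
        if (random_bool_mt(p, gen)) {
            ++hits;
        }
    }
    return static_cast<double>(hits) / trials;
}
```

With a large number of trials, `observed_rate(p, ...)` should land close to `p`; the generator's 32-bit output range also avoids the coarse granularity of a small RAND_MAX.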

Even if the Visual Studio runtime implementation suffers from this defect, it is worth noting that it was probably at least partly an intentional design choice. Using a RAND_MAX like 35767 that is relatively prime to the state size (typically a power of 2) ensures better randomness of the lower bits, since the % operation effectively mixes the high- and low-order bits. Having biased or non-random low-order bits is often a bigger problem in practice than a slight bias in the high-order bits, because callers of rand() so often reduce the range using %, which effectively uses only the low-order bits when the modulus is a power of 2 (also very common).
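To illustrate why weak low-order bits matter, here is a small sketch of an LCG whose modulus is a power of 2. The multiplier and increment are the widely published "Numerical Recipes" constants, used here purely as an assumption for illustration; this is not the Visual Studio implementation. Because both constants are odd, the lowest bit of the raw state simply flips on every step.

```cpp
#include <cstdint>

// A textbook LCG with modulus 2^32 (the natural wrap-around of uint32_t).
// The constants are illustrative only and are not taken from any
// particular rand() implementation.
std::uint32_t lcg_next(std::uint32_t state) {
    return state * 1664525u + 1013904223u;
}

// Returns true if the lowest bit of the raw state strictly alternates
// over `steps` consecutive outputs, i.e. it carries no randomness.
bool low_bit_alternates(std::uint32_t seed, unsigned steps) {
    std::uint32_t state = seed;
    unsigned previous = state & 1u;
    for (unsigned i = 0; i < steps; ++i) {
        state = lcg_next(state);
        unsigned current = state & 1u;
        if (current == previous) {
            return false;
        }
        previous = current;
    }
    return true;
}
```

A caller who computes `raw_state % 2` on such a generator would get a perfectly alternating sequence, which is the kind of problem that mixing high and low bits helps to hide.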

I tried your code on Linux and the results were actually pretty decent. However, it seems like you are on Windows, where RAND_MAX is probably around 32768. I say this because gcc complained on Linux that RAND_MAX+1 results in integer overflow, so I had to add a cast.

So the problem is most likely that either RAND_MAX is too small or the implementation of rand() on your system is not very good.

If the source of the problem is the implementation of rand(), your only option would be to change to another function from a better library. However, if the problem is the first one, you might be able to solve it as follows.

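/* change `rand()` to return two concatenated rands */
#include <stdlib.h>

typedef long long rand_type; /* this type depends on your actual system, you might get away with `int` */
/* largest value bigger_rand() can return: RAND_MAX * (RAND_MAX + 1) + RAND_MAX */
#define BIGGER_RAND_MAX (((rand_type)RAND_MAX + 2) * RAND_MAX)
rand_type bigger_rand(void)
{
    /* widen before adding 1: RAND_MAX + 1 overflows int when RAND_MAX == INT_MAX */
    return (rand_type)rand() * ((rand_type)RAND_MAX + 1) + rand();
}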

And then try your program with this rand that has a higher range. If the problem persists, most likely it's your rand() function that is far from random.
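One way the wider generator might be plugged into a probability test is sketched below. This is only an assumption about how it could be used, not your original function; `random_bool_wide` and `bigger_rand_max` are hypothetical names, and the arithmetic is done in `rand_type` so that RAND_MAX + 1 cannot overflow `int`.

```cpp
#include <cstdlib>

typedef long long rand_type;

// Two rand() results concatenated into one wider value.
rand_type bigger_rand() {
    return static_cast<rand_type>(std::rand()) *
               (static_cast<rand_type>(RAND_MAX) + 1) +
           std::rand();
}

// Largest value bigger_rand() can return.
const rand_type bigger_rand_max =
    (static_cast<rand_type>(RAND_MAX) + 2) * RAND_MAX;

// Hypothetical helper: true with probability roughly p, using the wider
// range so that small p values are resolved more finely than with rand().
bool random_bool_wide(double p) {
    return bigger_rand() < p * (static_cast<double>(bigger_rand_max) + 1.0);
}
```

This only increases the resolution of the comparison; it does not improve the statistical quality of the underlying rand().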


Side note: your random_bool should return bool, not double! Since you are checking a double against zero, that could also be the source of the problem: you may get false positives because the double might not be exactly zero.

I think this function's result is related to the value of RAND_MAX. In this case p = 1e-6; if RAND_MAX equals 9999, then it will never return true.
