问题
I am rewriting a monte carlo simulation in C from Objective C to use in a dll from VBA/Excel. The "engine" in the calculation is the creation of a random number between 0 and 10001 that is compared to a variable in the 5000-7000 neighbourhood. This is used 4-800 times per iteration and I use 100000 iterations. So that is about 50.000.000 generations of random numbers per run.
While in Objective C the tests showed no bias, I have huge problems with the C code. Objective C is a superset of C, so 95% of the code was copy paste and hard to screw up. I have gone through the rest many times all day yesterday and today and I have found no problems.
I am left with the difference between arc4random_uniform() and rand() with the use of srand(), especially because a bias towards the lower numbers of 0 to 10000. The test I have conducted is consistent with such a bias of .5 to 2 % towards numbers below circa 5000. The any other explanation is if my code avoided repeats which I guess it doesn´t do.
the code is really simple ("spiller1evne" and "spiller2evne" being a number between 5500 and 6500):
srand((unsigned)time(NULL));
for (j=0;j<antala;++j){
[..]
for (i=1;i<450;i++){
chance = (rand() % 10001);
[..]
if (grey==1) {
if (chance < spiller1evnea) vinder = 1;
else vinder = 2;
}
else{
if (chance < spiller2evnea) vinder = 2;
else vinder = 1;
}
Now I don´t need true randomness, pseudorandomness is quite fine. I only need it to be approximatly even distributed on a cummulative basis (like it doesn´t matter much if 5555 is twice as likely to come out as 5556. It does matter if 5500-5599 is 5% more likely as 5600-5699 and if there is a clear 0.5-2% bias towards 0-4000 than 6000-9999.
First, does it sound plausible that rand() is my problem and Is there an easy implementation that meets my low needs?
EDIT: if my suspicion is plausible, could I use any on this:
http://www.azillionmonkeys.com/qed/random.html
Would I be able to just copy paste this in as a replacement (I am writing in C and using Visual Studio, really novice)?:
#include <stdlib.h>
#define RS_SCALE (1.0 / (1.0 + RAND_MAX))
double drand (void) {
double d;
do {
d = (((rand () * RS_SCALE) + rand ()) * RS_SCALE + rand ()) * RS_SCALE;
} while (d >= 1); /* Round off */
return d;
}
#define irand(x) ((unsigned int) ((x) * drand ()))
EDIT2: Well clearly the above code works without the same bias so I would this be a recommendation for anyone who have the same "middle-of-the-road"-need as I described above. It does come with a penalty as it calls rand() three times. So I am still looking for a faster solution.
回答1:
The rand()
function generates an int
in the range [0, RAND_MAX
]. If you convert this to a different range via the modulus operator (%
), as your original code does, then that introduces non-uniformity unless the size of your target range happens to evenly divide RAND_MAX + 1
. That sounds like exactly what you see.
You have multiple options, but if you want to stick with something based on rand()
then I suggest this variation on your original approach:
/*
* Returns a pseudo-random int selected from the uniform distribution
* over the half-open interval [0, limit), provided that limit does not
* exceed RAND_MAX.
*/
int range_rand(int limit) {
int rand_bound = (RAND_MAX / limit) * limit;
int r;
while ((r = rand()) >= rand_bound) { /* empty */ }
return r % limit;
}
Although in principle the number of rand()
calls each call to that function will generate is unbounded, in practice the average number of calls is only slightly greater than 1 for relatively small limit
values, and the average is less than 2 for every limit
value. It removes the non-uniformity I described earlier by choosing the initial random number from a subset of [0, RAND_MAX
] whose size is evenly divided by the limit
.
来源:https://stackoverflow.com/questions/29369768/avoid-basic-rand-bias-in-monte-carlo-simulation