Generating a uniform distribution of INTEGERS in C

情书的邮戳 2020-12-01 09:48

I've written a C function that I think selects integers from a uniform distribution with range [rangeLow, rangeHigh], inclusive. This isn't ...

4 answers
  • 2020-12-01 10:22

    Here is a version that corrects the distribution errors (noted by Lior), uses the high bits returned by rand(), and relies only on integer math (if that's desirable):

    #include <stdlib.h>

    /* Assumes rangeLow <= rangeHigh and (rangeHigh - rangeLow + 1) <= RAND_MAX. */
    int uniform_distribution(int rangeLow, int rangeHigh)
    {
        int range = rangeHigh - rangeLow + 1; // +1 makes it [rangeLow, rangeHigh], inclusive
        int copies = RAND_MAX / range;        // whole copies of [0...range-1] that fit below RAND_MAX
        int limit = range * copies;           // values at or above limit would skew the result
        int myRand;
        // Use rejection sampling to avoid distribution errors
        do {
            myRand = rand();
        } while (myRand >= limit);
        return myRand / copies + rangeLow;    // dividing (not taking a remainder) uses the high bits
    }
    

    Note: make sure rand() has already been seeded with srand().

    This should work well provided that range is much smaller than RAND_MAX. Otherwise, you'll be back to the problem that rand() isn't a good random number generator in terms of its low bits.

  • 2020-12-01 10:38

    On some implementations, rand() does not provide good randomness in its low-order bits, so the modulus operator does not give very random results. If you find that to be the case, you could try this instead:

    int uniform_distribution(int rangeLow, int rangeHigh) {
        double myRand = rand()/(1.0 + RAND_MAX); 
        int range = rangeHigh - rangeLow + 1;
        int myRand_scaled = (myRand * range) + rangeLow;
        return myRand_scaled;
    }
    

    Using rand() this way will produce a bias as noted by Lior. But, the technique is fine if you can find a uniform number generator to calculate myRand. One possible candidate would be drand48(). This will greatly reduce the amount of bias to something that would be very difficult to detect.
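    As a minimal sketch of the drand48() suggestion above: the function name uniform_distribution_drand48 is made up for illustration, and drand48()/srand48() are POSIX functions rather than ISO C, so this assumes a POSIX system. It also assumes rangeLow <= rangeHigh and that the range fits in an int.

    ```c
    #define _XOPEN_SOURCE 600   /* ask for the POSIX declarations of drand48/srand48 */
    #include <stdlib.h>

    /* Hypothetical variant of the function above, with drand48() as the source.
       Assumes rangeLow <= rangeHigh and no overflow in the range computation. */
    int uniform_distribution_drand48(int rangeLow, int rangeHigh)
    {
        int range = rangeHigh - rangeLow + 1;
        double u = drand48();               /* uniformly distributed in [0.0, 1.0) */
        return rangeLow + (int)(u * range); /* truncation maps u onto 0 .. range-1 */
    }
    ```

    Seed the generator once with srand48() before the first call, in the same way srand() is used with rand().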

    However, if you need something cryptographically secure, you should use an algorithm outlined in Lior's answer, assuming your rand() is itself cryptographically secure (the default one is probably not, so you would need to find one). Below is a simplified implementation of what Lior described. Instead of counting bits, we assume the range falls within RAND_MAX, and compute a suitable multiple. Worst case, the algorithm ends up calling the random number generator twice on average per request for a number in the range.

    int uniform_distribution_secure(int rangeLow, int rangeHigh) {
        int range = rangeHigh - rangeLow + 1;
        int secureMax = RAND_MAX - RAND_MAX % range;
        int x;
        do x = secure_rand(); while (x >= secureMax);
        return rangeLow + x % range;
    }
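
    To see where the "twice on average" worst case comes from, here is a small sketch that computes the expected number of generator calls for the loop above. The helper name expected_draws is hypothetical, and randMax is passed in as a parameter so the arithmetic does not depend on the platform's RAND_MAX. It assumes 1 <= range <= randMax.

    ```c
    /* Expected number of calls to the generator made by the rejection loop,
       assuming the generator is uniform on [0, randMax]. */
    double expected_draws(int range, int randMax)
    {
        int secureMax = randMax - randMax % range;  /* same cutoff as in the loop */
        double accept = (double)secureMax / ((double)randMax + 1.0);
        return 1.0 / accept;                        /* mean of a geometric distribution */
    }
    ```

    With randMax = 32767, a range of 16384 accepts exactly half of all draws, which gives 2 calls on average, while a range of 3 rejects only 2 of the 32768 possible values.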
    
  • 2020-12-01 10:39

    I think it is well known that rand() is not very good. It just depends on how good your "random" data needs to be.

    • http://www.azillionmonkeys.com/qed/random.html
    • http://www.linuxquestions.org/questions/programming-9/generating-random-numbers-in-c-378358/
    • http://forums.indiegamer.com/showthread.php?9460-Using-C-rand%28%29-isn-t-as-bad-as-previously-thought

    I suppose you could write a test then calculate the chi-squared value to see how good your uniform generator is:

    http://en.wikipedia.org/wiki/Pearson%27s_chi-squared_test
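
    A rough sketch of such a test: fill an array of bin counts from your generator, then compute Pearson's statistic against a uniform expectation. The function name chi_squared_uniform is an assumption for illustration. It expects at least one bin and at least one sample.

    ```c
    #include <stddef.h>

    /* Pearson's chi-squared statistic for observed bin counts, where every bin
       is expected to receive the same share of the samples.
       Assumes bins > 0 and a nonzero total count. */
    double chi_squared_uniform(const unsigned long *counts, size_t bins)
    {
        unsigned long total = 0;
        for (size_t k = 0; k < bins; k++)
            total += counts[k];

        double expected = (double)total / (double)bins;
        double chi2 = 0.0;
        for (size_t k = 0; k < bins; k++) {
            double diff = (double)counts[k] - expected;
            chi2 += diff * diff / expected;
        }
        return chi2;
    }
    ```

    The result is then compared against the chi-squared distribution with bins - 1 degrees of freedom. A value far above the critical value for your chosen significance level suggests the output is not uniform.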

    Depending on your use (don't use this for your online poker shuffler), you might consider an LFSR:

    http://en.wikipedia.org/wiki/Linear_feedback_shift_register

    It may be faster if you just want some pseudo-random output. Also, LFSRs can supposedly be uniform, although I haven't studied the math enough to back up that claim.
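
    For illustration, here is a minimal sketch of one step of a 16-bit Fibonacci LFSR, using the taps (16, 14, 13, 11) from the example in the Wikipedia article linked above. The function name lfsr16_next is made up. The state must never be zero, because zero maps to itself.

    ```c
    #include <stdint.h>

    /* Advance a 16-bit Fibonacci LFSR by one step.
       Feedback polynomial: x^16 + x^14 + x^13 + x^11 + 1. State must be nonzero. */
    uint16_t lfsr16_next(uint16_t state)
    {
        uint16_t bit = ((state >> 0) ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1u;
        return (uint16_t)((state >> 1) | (bit << 15));
    }
    ```

    With these taps the register cycles through all 65535 nonzero states before repeating, so over a full period every nonzero 16-bit value appears exactly once. Consecutive states are strongly correlated, though, so this is not a substitute for a well-tested generator.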

  • 2020-12-01 10:40

    Let's assume that rand() generates a uniformly-distributed value I in the range [0..RAND_MAX], and you want to generate a uniformly-distributed value O in the range [L,H].

    Suppose I is in the range [0..32767] and O is in the range [0..2].

    According to your suggested method, O=I%3. Note that in the given range, there are 10923 numbers for which I%3=0, 10923 numbers for which I%3=1, but only 10922 numbers for which I%3=2. Hence your method will not map a value from I into O uniformly.

    As another example, suppose O is in the range [0..32766].

    According to your suggested method, O=I%32767. Now you'll get O=0 for both I=0 and I=32767. Hence 0 is twice as likely as any other value - your method is again nonuniform.
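
    Both examples can be checked by brute force. The sketch below counts how many inputs land on each output of the modulo mapping. The helper name count_modulo_outputs is hypothetical, and the caller must provide an array with room for range entries.

    ```c
    /* For every i in [0, maxInput], count how often i % range yields each output.
       counts must have room for `range` entries. */
    void count_modulo_outputs(int maxInput, int range, int *counts)
    {
        for (int k = 0; k < range; k++)
            counts[k] = 0;
        for (int i = 0; i <= maxInput; i++)
            counts[i % range]++;
    }
    ```

    Calling it with maxInput = 32767 and range = 3 gives the counts 10923, 10923 and 10922. With range = 32767, output 0 is counted twice and every other output once.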


    The suggested way to generate a uniform mapping is as follows:

    1. Calculate the number of bits that are needed to store a random value in the range [L,H]:

      unsigned int nRange = (unsigned int)H - (unsigned int)L + 1;
      unsigned int nRangeBits = (unsigned int)ceil(log((double)nRange) / log(2.0));

    2. Generate nRangeBits random bits.

      This can be implemented by right-shifting the result of rand().

    3. Ensure that the generated number is not greater than H-L. If it is - repeat step 2.

    4. Now you can map the generated number into O just by adding L.
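
    The steps above can be sketched as follows. This is one possible reading, not a definitive implementation: the name uniform_by_bits is made up, the bit count is computed with an integer loop instead of ceil(log(...)), and the code assumes L <= H, H - L <= RAND_MAX, and that RAND_MAX has the form 2^k - 1 so that the top bits of rand() are themselves uniform.

    ```c
    #include <stdlib.h>

    /* Hypothetical helper: uniform integer in [low, high] using bit truncation
       plus rejection. Assumes low <= high, (high - low) <= RAND_MAX, and
       RAND_MAX == 2^k - 1 for some k. */
    int uniform_by_bits(int low, int high)
    {
        unsigned int span = (unsigned int)high - (unsigned int)low; /* H - L */

        /* Step 1: number of bits needed to hold any value in [0, span]. */
        unsigned int bits = 0;
        while ((span >> bits) != 0)
            bits++;

        /* Number of random bits that one call to rand() delivers. */
        unsigned int randBits = 0;
        while (((unsigned int)RAND_MAX >> randBits) != 0)
            randBits++;

        /* Steps 2 and 3: keep the top `bits` bits, retry while out of range. */
        unsigned int r;
        do {
            r = (unsigned int)rand() >> (randBits - bits);
        } while (r > span);

        /* Step 4: shift the result into [low, high]. */
        return low + (int)r;
    }
    ```

    Because the candidate is drawn from the smallest power-of-two range that covers [0, H-L], more than half of the candidates are accepted, so the loop needs fewer than two calls to rand() on average.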
