Why do people say there is modulo bias when using a random number generator?

一整个雨季 2020-11-21 05:48

I have seen this question asked a lot but never seen a true concrete answer to it. So I am going to post one here which will hopefully help people understand why exactly the

10 answers
  •  南方客 (original poster)
    2020-11-21 06:53

    There are two common complaints about the use of modulo.

    • The first is valid for all generators, and it is easiest to see in a limiting case. Suppose your generator has a RAND_MAX of 2 (which is not compliant with the C standard), so it returns 0, 1, or 2 with equal probability, and you want only 0 or 1 as a value. Using modulo will generate 0 twice as often (when the generator returns 0 or 2) as it will generate 1 (when the generator returns 1). Note that this holds as long as you don't drop values: whatever mapping you use from the generator's values to the wanted ones, one result will occur twice as often as the other.

    • The second is that in some kinds of generator the low-order bits are less random than the high-order ones, at least for some choices of parameters, and sadly those parameters have other attractive characteristics (such as allowing RAND_MAX to be one less than a power of 2). The problem has been well known for a long time, and library implementations probably avoid it (for instance, the sample rand() implementation in the C standard uses this kind of generator but drops the 16 least significant bits). Still, some people like to complain about it, and you may have bad luck.
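
The first point can be checked by brute force. The sketch below is only an illustration: moduloCounts and its maxValue parameter (a stand-in for RAND_MAX) are hypothetical names, and it assumes every generator output from 0 to maxValue is equally likely.

```cpp
#include <cassert>
#include <vector>

// Count how many of the generator's possible outputs (0..maxValue, assumed
// equally likely) end up on each result when reduced with `% n`.
// maxValue is a hypothetical stand-in for RAND_MAX.
std::vector<int> moduloCounts(int maxValue, int n) {
    assert(n > 0 && maxValue >= 0);
    std::vector<int> counts(n, 0);
    for (int v = 0; v <= maxValue; ++v) {
        ++counts[v % n];
    }
    return counts;
}
```

With maxValue = 2 and n = 2 the counts are 2 and 1, which is the limiting case described above. With maxValue = 32767 (the smallest RAND_MAX the C standard permits) and n = 10, results 0 through 7 each get 3277 outputs while 8 and 9 get 3276, so the bias is small but still present.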

    Using something like

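    #include <assert.h>
    #include <stdlib.h>

    /* Returns a value in [0, n], each equally likely if rand() is uniform. */
    int alea(int n){
     assert (0 < n && n <= RAND_MAX);
     /* Size of each of the n+1 equal parts of [0, RAND_MAX], i.e.
        (RAND_MAX+1)/(n+1), written so that it cannot overflow. */
     int partSize =
          n == RAND_MAX ? 1 : 1 + (RAND_MAX-n)/(n+1);
     /* Largest draw that still falls inside a complete part. */
     int maxUseful = partSize * n + (partSize-1);
     int draw;
     do {
       draw = rand();   /* reject draws beyond the last complete part */
     } while (draw > maxUseful);
     return draw/partSize;   /* use the high-order part of the draw */
    }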
    

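    to generate a random number between 0 and n (inclusive) will avoid both problems: draws beyond the last complete part are rejected rather than folded back, and the result is taken from the high-order part of the draw. It also avoids overflow when RAND_MAX == INT_MAX.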
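
To see why the accepted draws are split evenly, the same arithmetic can be run over every possible draw. This is a sketch under stated assumptions: rejectionCounts is a hypothetical helper, and maxValue stands in for RAND_MAX so that small cases can be enumerated.

```cpp
#include <cassert>
#include <vector>

// Same arithmetic as alea(), with the generator's maximum passed in as
// maxValue (a hypothetical stand-in for RAND_MAX). Returns how many accepted
// draws map to each result 0..n; rejected draws are simply not counted.
std::vector<int> rejectionCounts(int maxValue, int n) {
    assert(0 < n && n <= maxValue);
    int partSize = n == maxValue ? 1 : 1 + (maxValue - n) / (n + 1);
    int maxUseful = partSize * n + (partSize - 1);
    std::vector<int> counts(n + 1, 0);
    for (int draw = 0; draw <= maxUseful; ++draw) {
        ++counts[draw / partSize];
    }
    return counts;
}
```

For maxValue = 2 and n = 1, the draw 2 is rejected and the results 0 and 1 each get exactly one draw. For maxValue = 32767 and n = 9, each of the ten results gets 3276 draws and the top 8 values are rejected.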

    BTW, C++11 introduced standard ways to do the reduction, and generators other than rand().
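
A minimal sketch of the C++11 route, using std::mt19937 and std::uniform_int_distribution from the standard <random> header. The wrapper name uniformUpTo is hypothetical, and the choice of std::mt19937 as the engine is an assumption made for illustration, not something the answer prescribes.

```cpp
#include <cassert>
#include <random>

// Returns a value uniformly distributed in [0, n] (both ends included).
// The distribution object performs the range reduction, so no `%` is needed.
// uniformUpTo is a hypothetical helper name used only for this example.
int uniformUpTo(std::mt19937& engine, int n) {
    assert(n >= 0);
    std::uniform_int_distribution<int> dist(0, n);
    return dist(engine);
}
```

A caller would typically create one engine, for example std::mt19937 engine(std::random_device{}()), and reuse it across calls rather than constructing a new one each time.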
