Given a function which produces a random integer in the range 1 to 5, write a function which produces a random integer in the range 1 to 7.
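The premise behind Adam Rosenfield's correct answer is: combine n calls to rand5 into a number y that is uniform on [1, 5^n], throw y away if it is larger than the biggest multiple of 7 that fits (7 * floor(5^n / 7)), and otherwise map it evenly onto 1..7 (e.g. y % 7 + 1).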
When n = 2, you have 4 throw-away possibilities: y ∈ {22, 23, 24, 25}. With n = 6, you have only 1 throw-away: y = 15625, since:
5^6 = 15625
7 * 2232 = 15624
You call rand5 more times per attempt, but the chance of getting a throw-away value (and having to loop again) is much lower. If there is a way to get no possible throw-away value for y, I haven't found it yet.
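A minimal Python sketch of this scheme, assuming a stand-in rand5() built on random.randint that returns 1..5. The helper throwaway_count is a hypothetical name that simply computes 5^n % 7, the number of rejected outcomes for a given n.

```python
import random


def rand5():
    """Stand-in for the given generator: uniform integer in 1..5 (assumption)."""
    return random.randint(1, 5)


def throwaway_count(n):
    """Number of rejected outcomes when n calls to rand5 form y in [1, 5**n]."""
    return 5 ** n % 7


def rand7(n=6):
    """Rejection sampling: y uniform on [1, 5**n], keep y <= 7 * floor(5**n / 7)."""
    limit = 7 * (5 ** n // 7)  # largest multiple of 7 not exceeding 5**n
    while True:
        y = 0
        for _ in range(n):
            y = 5 * y + (rand5() - 1)  # append one base-5 digit in 0..4
        y += 1  # shift from [0, 5**n - 1] to [1, 5**n]
        if y <= limit:
            return y % 7 + 1  # each residue occurs limit / 7 times in [1, limit]
```

For n = 2 this rejects 4 of 25 outcomes and for n = 6 only 1 of 15625. Because 7 is prime and does not divide 5, 5^n % 7 is never 0, so with this scheme some throw-away value always remains for every n.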
I know this has already been answered, but this seems to work OK, though I can't tell you whether it has a bias. My 'testing' suggests it is, at least, reasonable.
Perhaps Adam Rosenfield would be kind enough to comment?
My (naive?) idea is this:
Accumulate rand5's until there are enough random bits to make a rand7. This takes at most 2 rand5's. To get the rand7 number I use the accumulated value mod 7.
To keep the accumulator from overflowing, and since only its value mod 7 matters, I reduce the accumulator mod 7:
(5a + rand5) % 7 = (7k + (5a % 7) + rand5) % 7 = ((5a % 7) + rand5) % 7, where k = floor(5a / 7)
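A brute-force sanity check of this identity over a finite range of a (an illustration, not a proof), together with the form the code below actually relies on, namely reducing a itself mod 7. identity_holds is a hypothetical helper name, and the digit range 0..4 follows the convention stated just below.

```python
def identity_holds(max_a=1000):
    """True if both identities hold for every a in [0, max_a) and digit r in 0..4."""
    return all(
        (5 * a + r) % 7 == ((5 * a) % 7 + r) % 7  # the identity as written above
        and (5 * a + r) % 7 == (5 * (a % 7) + r) % 7  # reducing a itself mod 7
        for a in range(max_a)
        for r in range(5)
    )


print(identity_holds())  # True
```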
The rand7() function follows:
(I let the range of rand5 be 0-4, and likewise the range of rand7 be 0-6.)
int rand7(){
    static int a=0;
    static int e=0;
    int r;
    a = a * 5 + rand5();
    e = e + 5;        // added 5/7ths of a rand7 number
    if ( e<7 ){
        a = a * 5 + rand5();
        e = e + 5;    // another 5/7ths
    }
    r = a % 7;
    e = e - 7;        // removed a rand7 number
    a = a % 7;
    return r;
}
Edit: Added results for 100 million trials.
'Real' rand functions mod 5 or 7:
rand5 : avg=1.999802 0:20003944 1:19999889 2:20003690 3:19996938 4:19995539
rand7 : avg=3.000111 0:14282851 1:14282879 2:14284554 3:14288546 4:14292388 5:14288736 6:14280046
My rand7:
randt : avg=3.000080 0:14288793 1:14280135 2:14287848 3:14285277 4:14286341 5:14278663 6:14292943
The average looks OK and the number distributions look OK too.
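For anyone who wants to rerun this counting test at a smaller scale, here is a rough Python port of the C rand7() above, assuming a stand-in rand5() that returns 0-4. AccumulatorRand7 and histogram are hypothetical names; like the test above, histogram only counts how often each individual value appears.

```python
import random
from collections import Counter


def rand5():
    """Stand-in for the given generator, using the answer's 0..4 range."""
    return random.randint(0, 4)


class AccumulatorRand7:
    """Python port of the static-variable C rand7() above (outputs 0..6)."""

    def __init__(self):
        self.a = 0  # accumulator, kept reduced mod 7 between calls
        self.e = 0  # the C code's bookkeeping counter, in sevenths of a rand7

    def __call__(self):
        self.a = self.a * 5 + rand5()
        self.e += 5
        if self.e < 7:
            self.a = self.a * 5 + rand5()
            self.e += 5
        r = self.a % 7
        self.e -= 7
        self.a %= 7
        return r


def histogram(gen, n):
    """Count how often each value 0..6 appears in n calls to gen()."""
    counts = Counter(gen() for _ in range(n))
    return [counts[i] for i in range(7)]


if __name__ == "__main__":
    random.seed(0)
    print(histogram(AccumulatorRand7(), 70_000))
```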
There are elegant algorithms cited above, but here's one way to approach it, although it might be roundabout. I am assuming the generators return values starting from 0.
R2 = random number generator giving values less than 2 (sample space = {0, 1})
R8 = random number generator giving values less than 8 (sample space = {0, 1, 2, 3, 4, 5, 6, 7})
In order to generate R8 from R2, run R2 three times and read the combined results of the 3 runs as a 3-digit binary number. Here is the range of values when R2 is run three times:
0 0 0 --> 0
0 0 1 --> 1
0 1 0 --> 2
0 1 1 --> 3
1 0 0 --> 4
1 0 1 --> 5
1 1 0 --> 6
1 1 1 --> 7
Now, to generate R7 from R8, we simply run R8 again whenever it returns 7:
int R7() {
    int x;
    do {
        x = R8();        /* R8() returns 0..7 */
    } while (x > 6);     /* reject 7 and draw again */
    return x;
}
The roundabout solution is to generate R2 from R5 (just like we generated R7 from R8), then R8 from R2 and then R7 from R8.
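A minimal Python sketch of the full chain R5 -> R2 -> R8 -> R7. The answer doesn't say how to get R2 from R5; this sketch assumes one option (reject 4, then take the parity of 0-3), and the lowercase function names and the random.randint stand-in for R5 are hypothetical.

```python
import random


def r5():
    """Stand-in for the given generator: uniform integer in 0..4 (assumption)."""
    return random.randint(0, 4)


def r2():
    """Uniform bit from r5: reject 4, then 0..3 split evenly into two parities."""
    while True:
        x = r5()
        if x < 4:
            return x % 2


def r8():
    """Three independent bits read as a 3-digit binary number in 0..7."""
    return (r2() << 2) | (r2() << 1) | r2()


def r7():
    """Rejection: draw r8 again whenever it returns 7."""
    while True:
        x = r8()
        if x <= 6:
            return x
```

Rejecting at each stage keeps every step exactly uniform, at the cost of extra calls to r5.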
I'd like to add another answer, in addition to my first answer. This answer attempts to minimize the number of calls to rand5() per call to rand7(), to maximize the usage of randomness. That is, if you consider randomness to be a precious resource, we want to use as much of it as possible, without throwing away any random bits. This answer also has some similarities with the logic presented in Ivan's answer.
The entropy of a random variable is a well-defined quantity. For a random variable which takes on N states with equal probabilities (a uniform distribution), the entropy is log2(N). Thus, rand5() has approximately 2.32193 bits of entropy, and rand7() has about 2.80735 bits of entropy. If we hope to maximize our use of randomness, we need to use all 2.32193 bits of entropy from each call to rand5(), and apply them to generating the 2.80735 bits of entropy needed for each call to rand7(). The fundamental limit, then, is that we can do no better than log(7)/log(5) = 1.20906 calls to rand5() per call to rand7().
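The figures above can be reproduced in a few lines of Python (a sketch; the variable names are illustrative):

```python
import math

# Entropy in bits of a uniform choice among n outcomes is log2(n).
bits_rand5 = math.log2(5)  # about 2.32193
bits_rand7 = math.log2(7)  # about 2.80735

# Lower bound on rand5() calls per rand7() call: bits needed / bits supplied.
# The ratio equals log(7)/log(5) in any logarithm base.
min_calls = bits_rand7 / bits_rand5

print(f"{bits_rand5:.5f} {bits_rand7:.5f} {min_calls:.5f}")  # 2.32193 2.80735 1.20906
```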
Side notes: all logarithms in this answer will be base 2 unless specified otherwise. rand5() will be assumed to return numbers in the range [0, 4], and rand7() will be assumed to return numbers in the range [0, 6]. Adjusting the ranges to [1, 5] and [1, 7] respectively is trivial.
So how do we do it? We generate an infinitely precise random real number between 0 and 1 (pretend for the moment that we could actually compute and store such an infinitely precise number; we'll fix this later). We can generate such a number by generating its digits in base 5: we pick the random number 0.a_1 a_2 a_3 ..., where each digit a_i is chosen by a call to rand5(). For example, if our RNG chose a_i = 1 for all i, then, ignoring the fact that that isn't very random, that would correspond to the real number 1/5 + 1/5^2 + 1/5^3 + ... = 1/4 (the sum of a geometric series).
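A small exact-arithmetic check of that example, using Python's fractions.Fraction; base5_value is a hypothetical helper that evaluates a finite prefix 0.a_1 a_2 ... a_k in base 5. With every digit equal to 1, the gap to 1/4 is exactly (1/4)(1/5)^k, so the partial sums converge to 1/4.

```python
from fractions import Fraction


def base5_value(digits):
    """Exact value of 0.d1 d2 d3 ... (base 5) for a finite list of digits."""
    value = Fraction(0)
    for i, d in enumerate(digits, start=1):
        value += Fraction(d, 5 ** i)
    return value


# All-ones digits: 1/4 - (1/5 + ... + 1/5^k) should equal (1/4) * (1/5)^k.
for k in (1, 2, 5, 10):
    gap = Fraction(1, 4) - base5_value([1] * k)
    print(k, gap, gap == Fraction(1, 4) / 5 ** k)
```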
OK, so we've picked a random real number between 0 and 1. I now claim that such a random number is uniformly distributed. Intuitively, this is easy to understand, since each digit was picked uniformly, and the number is infinitely precise. However, a formal proof of this is somewhat more involved, since now we're dealing with a continuous distribution instead of a discrete distribution, so we need to prove that the probability that our number lies in an interval [a, b] equals the length of that interval, b - a. The proof is left as an exercise for the reader =).
Now that we have a random real number selected uniformly from the range [0, 1], we need to convert it to a series of uniformly random numbers in the range [0, 6] to generate the output of rand7(). How do we do this? Just the reverse of what we just did: we convert it to an infinitely precise expansion in base 7, and then each base-7 digit will correspond to one output of rand7().
Taking the example from earlier, if our rand5() produces an infinite stream of 1's, then our random real number will be 1/4. Converting 1/4 to base 7, we get the infinite expansion 0.15151515..., so we will produce as output 1, 5, 1, 5, 1, 5, etc.
OK, so we have the main idea, but two problems remain. First, we can't actually compute or store an infinitely precise real number, so how do we deal with only a finite portion of it? Second, how do we actually convert it to base 7?
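One way we can convert a number between 0 and 1 to base 7 is as follows:
1. Multiply it by 7.
2. The integer part of the result is the next base-7 digit.
3. Subtract off the integer part, leaving only the fractional part.
4. Go to step 1.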
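A sketch of this conversion by repeated multiplication by 7 (the integer part of each product is the next base-7 digit, and only the fractional part is carried forward), using exact rationals from Python's fractions module. base7_digits is a hypothetical helper name; it reproduces the 1/4 example from earlier.

```python
import math
from fractions import Fraction


def base7_digits(x, count):
    """First `count` base-7 digits of a rational x in [0, 1)."""
    digits = []
    for _ in range(count):
        x *= 7
        digit = math.floor(x)  # integer part is the next base-7 digit
        digits.append(digit)
        x -= digit  # keep only the fractional part
    return digits


print(base7_digits(Fraction(1, 4), 6))  # [1, 5, 1, 5, 1, 5]
```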
To deal with the problem of infinite precision, we compute a partial result, and we also store an upper bound on what the result could be. That is, suppose we've called rand5() twice and it returned 1 both times. The number we've generated so far is 0.11 (base 5). Whatever the rest of the infinite series of calls to rand5() produces, the random real number we're generating will always be less than 0.12 (base 5): it is always true that 0.11 ≤ 0.11xyz... < 0.12.
So, keeping track of the current number so far and the maximum value it could ever take, we convert both numbers to base 7. If they agree on the first k digits, then we can safely output those k digits: regardless of what the infinite stream of base-5 digits turns out to be, it will never affect those k digits of the base-7 representation!
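A sketch of this bound-tracking idea with exact rationals: given the base-5 digits drawn so far, it computes the lower bound and the exclusive upper bound (lower bound plus 5^-k) and returns the base-7 digits on which they agree. The helper names are hypothetical, and unlike an incremental implementation this recomputes everything from scratch on each call.

```python
import math
from fractions import Fraction


def base7_digits(x, count):
    """First `count` base-7 digits of a rational x >= 0 (repeated multiply-by-7)."""
    digits = []
    for _ in range(count):
        x *= 7
        digit = math.floor(x)
        digits.append(digit)
        x -= digit
    return digits


def safe_base7_digits(base5_digits, max_digits=20):
    """Base-7 digits shared by every real in [low, high), where low is
    0.d1 ... dk (base 5) and high = low + 5**-k."""
    k = len(base5_digits)
    low = sum((Fraction(d, 5 ** (i + 1)) for i, d in enumerate(base5_digits)), Fraction(0))
    high = low + Fraction(1, 5 ** k)
    safe = []
    for lo_d, hi_d in zip(base7_digits(low, max_digits), base7_digits(high, max_digits)):
        if lo_d != hi_d:
            break  # this digit still depends on future rand5() calls
        safe.append(lo_d)
    return safe


print(safe_base7_digits([1, 1]))  # [1]
```

For the 0.11 (base 5) example, the bounds 6/25 and 7/25 are 0.14... and 0.16... in base 7, so only the first output digit, 1, is settled.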
And that's the algorithm: to generate the next output of rand7(), we generate only as many digits of rand5() as we need to know with certainty the value of the next digit in the conversion of the random real number to base 7. Here is a Python implementation, with a test harness:
import math
import random

rand5_calls = 0

def rand5():
    global rand5_calls
    rand5_calls += 1
    return random.randint(0, 4)

def rand7_gen():
    # state / pow5 and (state + pow7) / pow5 are 7 times the lower bound and the
    # exclusive upper bound of the part of the random real not yet output.
    state = 0
    pow5 = 1
    pow7 = 7
    while True:
        if state // pow5 == (state + pow7) // pow5:
            # Both bounds agree on the next base-7 digit, so it is safe to output.
            result = state // pow5
            state = (state - result * pow5) * 7
            pow7 *= 7
            yield result
        else:
            # Not determined yet: append one more base-5 digit from rand5().
            state = 5 * state + pow7 * rand5()
            pow5 *= 5

if __name__ == '__main__':
    r7 = rand7_gen()
    N = 10000
    x = [next(r7) for i in range(N)]
    distr = [x.count(i) for i in range(7)]
    expmean = N / 7.0
    expstddev = math.sqrt(N * (1.0/7.0) * (6.0/7.0))

    print('%d TRIALS' % N)
    print('Expected mean: %.1f' % expmean)
    print('Expected standard deviation: %.1f' % expstddev)
    print()
    print('DISTRIBUTION:')
    for i in range(7):
        print('%d: %d (%+.3f stddevs)' % (i, distr[i], (distr[i] - expmean) / expstddev))
    print()
    print('Calls to rand5: %d (average of %f per call to rand7)' % (rand5_calls, float(rand5_calls) / N))
Note that rand7_gen() returns a generator, since it has internal state involving the conversion of the number to base 7. The test harness calls next(r7) 10000 times to produce 10000 random numbers, and then it measures their distribution. Only integer math is used, so the results are exactly correct.
Also note that the numbers here get very big, very fast. Powers of 5 and 7 grow quickly. Hence, performance will start to degrade noticeably after generating lots of random numbers, due to bignum arithmetic. But remember here, my goal was to maximize the usage of random bits, not to maximize performance (although that is a secondary goal).
In one run of this, I made 12091 calls to rand5() for 10000 calls to rand7(), achieving the minimum of log(7)/log(5) calls on average to 4 significant figures, and the resulting output was uniform.
In order to port this code to a language that doesn't have arbitrarily large integers built in, you'll have to cap the values of pow5 and pow7 to the maximum value of your native integral type; if they get too big, reset everything and start over. This will increase the average number of calls to rand5() per call to rand7() very slightly, but hopefully it shouldn't increase too much, even for 32- or 64-bit integers.
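As long as there aren't seven possibilities left to choose from, draw another random number, which multiplies the number of possibilities by five. To keep the result exactly uniform, only the largest multiple of seven of those possibilities is reduced mod 7; if the value falls above it, it is rejected, but the leftover randomness is kept for the next round. In Perl:
$num = 0;            # uniform on [0, $possibilities)
$possibilities = 1;
sub rand7
{
    while (1)
    {
        # Draw base-5 digits until there are at least 7 equally likely values.
        while( $possibilities < 7 )
        {
            $num = $num * 5 + int(rand(5));
            $possibilities *= 5;
        }
        # Largest multiple of 7 not exceeding $possibilities.
        my $limit = $possibilities - $possibilities % 7;
        if ( $num < $limit )
        {
            my $result = $num % 7;
            $num = int( $num / 7 );         # uniform on [0, $limit / 7)
            $possibilities = $limit / 7;
            return $result;
        }
        # Rejected: $num - $limit is uniform on [0, $possibilities % 7).
        $num -= $limit;
        $possibilities -= $limit;
    }
}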
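The same carry-over idea sketched in Python, with a rejection step so that only values below the largest multiple of 7 are reduced mod 7 (without it, for example, 25 equally likely values would map unevenly onto 7 outputs: four residues 4 times each, three residues 3 times each). Rand7FromRand5 is a hypothetical name, and rand5 is assumed to return a uniform integer in 0-4.

```python
import random


class Rand7FromRand5:
    """Keep a value uniform on [0, possibilities); draw base-5 digits until there
    are at least 7 possibilities, then accept only values below the largest
    multiple of 7. Leftover randomness is carried into the next call."""

    def __init__(self, rand5=lambda: random.randint(0, 4)):
        self.rand5 = rand5  # assumed to return a uniform integer in 0..4
        self.num = 0
        self.possibilities = 1

    def __call__(self):
        while True:
            while self.possibilities < 7:
                self.num = self.num * 5 + self.rand5()
                self.possibilities *= 5
            limit = self.possibilities - self.possibilities % 7
            if self.num < limit:
                result = self.num % 7
                self.num //= 7  # uniform on [0, limit // 7), independent of result
                self.possibilities = limit // 7
                return result
            # Rejected: num - limit is uniform on [0, possibilities % 7).
            self.num -= limit
            self.possibilities -= limit
```

Feeding it every pair of starting digits shows the first output is exactly uniform: of the 25 pairs, the 21 accepted ones give each result 3 times.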
Algorithm:
7 can be represented in a sequence of 3 bits
Use rand(5) to randomly fill each bit with 0 or 1.
For example, call rand(5) and:
if the result is 1 or 2, fill the bit with 0
if the result is 4 or 5, fill the bit with 1
if the result is 3, ignore it and do it again (rejection)
This way we can fill 3 bits randomly with 0/1 and get a number from 0 to 7; if it is 0, reject it and start over, which leaves a number from 1-7.
EDIT: This seems like the simplest and most efficient answer, so here's some code for it:
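public static int random_7() {
    int returnValue = 0;
    // Build a 3-bit number (0..7); retry while it is 0 so the result is 1..7.
    while (returnValue == 0) {
        for (int i = 1; i <= 3; i++) {
            returnValue = (returnValue << 1) + random_5_output_2();
        }
    }
    return returnValue;
}

// Turns random_5() (assumed to return 1..5) into a fair bit:
// 1 or 2 -> 0, 4 or 5 -> 1, 3 -> try again.
private static int random_5_output_2() {
    while (true) {
        int flip = random_5();
        if (flip < 3) {
            return 0;
        }
        else if (flip > 3) {
            return 1;
        }
    }
}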
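A Python transliteration of the Java code above, assuming random_5() returns a uniform integer in 1-5 (here a stand-in built on random.randint). On average it needs 30/7 ≈ 4.29 calls to random_5() per result: 5/4 calls per accepted bit, 3 bits per attempt, and 8/7 attempts.

```python
import random


def random_5():
    """Stand-in for the given generator: uniform integer in 1..5 (assumption)."""
    return random.randint(1, 5)


def random_5_output_2():
    """Fair bit from random_5: 1 or 2 -> 0, 4 or 5 -> 1, 3 -> reject and retry."""
    while True:
        flip = random_5()
        if flip < 3:
            return 0
        if flip > 3:
            return 1


def random_7():
    """Build a 3-bit number (0..7) and retry while it is 0, leaving 1..7."""
    value = 0
    while value == 0:
        for _ in range(3):
            value = (value << 1) + random_5_output_2()
    return value
```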