Given a function which produces a random integer in the range 1 to 5, write a function which produces a random integer in the range 1 to 7.
Consider the additional constraint of giving the most efficient answer, i.e. one that, given an input stream I of m uniformly distributed integers from 1-5, outputs a stream O of uniformly distributed integers from 1-7 that is as long as possible relative to m, say of length L(m).
The simplest way to analyse this is to treat the streams I and O as 5-ary and 7-ary numbers respectively. This is achieved by the main answer's idea of taking the stream a1, a2, a3, ... -> a1 + 5*a2 + 5^2*a3 + ..., and similarly for stream O.
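As a minimal sketch of that encoding (the function names are hypothetical, and subtracting 1 from each value is an assumption made here so that the digits run from 0 and the map is a bijection onto 0..base^m - 1):

```python
def stream_to_int(stream, base=5):
    """Map a1, a2, a3, ... (values in 1..base) to
    (a1-1) + base*(a2-1) + base^2*(a3-1) + ..."""
    value = 0
    for position, a in enumerate(stream):
        value += (a - 1) * base ** position
    return value


def int_to_stream(value, length, base=7):
    """Inverse direction: write value as `length` base-`base` digits,
    least significant first, shifted back into 1..base."""
    digits = []
    for _ in range(length):
        value, digit = divmod(value, base)
        digits.append(digit + 1)
    return digits
```

For example, the input stream 5, 5, 5 maps to 124 = 5^3 - 1, and the integer 48 = 7^2 - 1 maps to the output stream 7, 7.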
Then take a section of the input stream of length m and choose n such that 5^m - 7^n = c, where c > 0 is as small as possible. There is a uniform map from input streams of length m to integers from 1 to 5^m, and another uniform map from integers from 1 to 7^n to output streams of length n. We may have to discard a few cases from the input stream, namely those where the mapped integer exceeds 7^n.
So this gives a value for L(m) of around m (log5/log7), which is approximately 0.827m.
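To make the choice of n concrete, here is a small sketch (the helper name is hypothetical) that finds the largest n with 7^n <= 5^m and the leftover c = 5^m - 7^n using exact integer arithmetic:

```python
import math


def best_output_length(m):
    """Return (n, c): the largest n with 7**n <= 5**m, and c = 5**m - 7**n."""
    total = 5 ** m
    n = 0
    while 7 ** (n + 1) <= total:
        n += 1
    return n, total - 7 ** n


# The ratio n/m approaches log5/log7 as m grows.
RATIO = math.log(5) / math.log(7)
```

For instance, m = 5 gives n = 4 with c = 3125 - 2401 = 724, and m = 100 gives n = 82.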
The difficulty with the above analysis is the equation 5^m - 7^n = c, which is not easy to solve exactly, together with the case where the uniform value from 1 to 5^m exceeds 7^n and we lose efficiency.
The question is how close to the best possible value of m (log5/log7) we can get. For example, when this number comes close to an integer, can we find a way to achieve exactly that integral number of output values?
If 5^m - 7^n = c, then from the input stream we effectively generate a uniform random number from 0 to (5^m)-1 and don't use any values from 7^n upwards. However, these values can be rescued and used again: they effectively form a uniform random number over a range of size 5^m - 7^n. So we can try to convert these into 7-ary numbers as well, creating more output values.
Let T7(X) be the average length of the output sequence of random(1-7) integers derived from a uniform input of size X, and assume that 5^m = 7^n0 + 7^n1 + 7^n2 + ... + 7^nr + s, with s < 7.
Then T7(5^m) = n0*7^n0/5^m + ((5^m - 7^n0)/5^m) * T7(5^m - 7^n0)
since we get a sequence of length n0 with probability 7^n0/5^m, and a residual of size 5^m - 7^n0 with probability (5^m - 7^n0)/5^m.
If we just keep substituting we obtain:
T7(5^m) = n0*7^n0/5^m + n1*7^n1/5^m + ... + nr*7^nr/5^m = (n0*7^n0 + n1*7^n1 + ... + nr*7^nr)/5^m
Hence
L(m) = T7(5^m) = (n0*7^n0 + n1*7^n1 + ... + nr*7^nr)/(7^n0 + 7^n1 + 7^n2 + ... + 7^nr + s)
Another way of putting this is:
If 5^m has 7-ary representation a0 + a1*7 + a2*7^2 + a3*7^3 + ... + ar*7^r,
then L(m) = (a1*7 + 2*a2*7^2 + 3*a3*7^3 + ... + r*ar*7^r)/(a0 + a1*7 + a2*7^2 + a3*7^3 + ... + ar*7^r)
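The digit formula above can be evaluated exactly for small m. The following is only a sketch with hypothetical helper names, using Python's Fraction to avoid rounding:

```python
from fractions import Fraction


def base7_digits(x):
    """Digits a0, a1, ..., ar of x in base 7, least significant first."""
    digits = []
    while x > 0:
        x, d = divmod(x, 7)
        digits.append(d)
    return digits


def expected_output_length(m):
    """L(m) = (sum of i * a_i * 7^i) / 5^m, with a_i the base-7 digits of 5^m."""
    total = 5 ** m
    numerator = sum(i * a * 7 ** i for i, a in enumerate(base7_digits(total)))
    return Fraction(numerator, total)
```

For m = 2 we have 25 = 4 + 3*7, so L(2) = 21/25; for m = 3 we have 125 = 6 + 3*7 + 2*49, so L(3) = 217/125.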
The best possible case is my original one above, where 5^m = 7^n + s with s < 7.
Then T7(5^m) = n*(7^n)/(7^n + s) = n + o(1) = m (log5/log7) + o(1), as before.
The worst case is when we can only find k and s such that 5^m = k*7 + s.
Then T7(5^m) = 1*(k*7)/(k*7 + s) = 1 + o(1)
Other cases are somewhere in between. It would be interesting to see how well we can do for very large m, i.e. how small we can make the error term e(m) in:
T7(5^m) = m (log5/log7) + e(m)
It seems impossible to achieve e(m) = o(1) in general, but hopefully we can prove e(m) = o(m).
The whole thing then rests on the distribution of the 7-ary digits of 5^m for various values of m.
I'm sure there is a lot of theory out there that covers this; I may have a look and report back at some point.
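int rand7() {
    // Weighted sum of six rand5() calls; the weights 1..6 are the powers of 5 reduced mod 7.
    int value = rand5()
              + rand5() * 2
              + rand5() * 3
              + rand5() * 4
              + rand5() * 5
              + rand5() * 6;
    return value % 7; // result is in the range 0 to 6
}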
Unlike the chosen solution, the algorithm will run in constant time. It does however make 2 more calls to rand5 than the average run time of the chosen solution.
Note that this generator is not perfect (the number 0 has 0.0064% more chance than any other number), but for most practical purposes the guarantee of constant time probably outweighs this inaccuracy.
Explanation
This solution is derived from the fact that the number 15,624 is divisible by 7 and thus if we can randomly and uniformly generate numbers from 0 to 15,624 and then take mod 7 we can get a near-uniform rand7 generator. Numbers from 0 to 15,624 can be uniformly generated by rolling rand5 6 times and using them to form the digits of a base 5 number as follows:
rand5 * 5^5 + rand5 * 5^4 + rand5 * 5^3 + rand5 * 5^2 + rand5 * 5 + rand5
Properties of mod 7 however allow us to simplify the equation a bit:
5^5 = 3 mod 7
5^4 = 2 mod 7
5^3 = 6 mod 7
5^2 = 4 mod 7
5^1 = 5 mod 7
So
rand5 * 5^5 + rand5 * 5^4 + rand5 * 5^3 + rand5 * 5^2 + rand5 * 5 + rand5
becomes
rand5 * 3 + rand5 * 2 + rand5 * 6 + rand5 * 4 + rand5 * 5 + rand5
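The simplification and the size of the bias can both be checked by brute force over all 5^6 = 15,625 digit combinations. This is a sketch only; it assumes each rand5 digit is taken in the range 0 to 4, and the names are hypothetical:

```python
from collections import Counter
from itertools import product

# 5^5, 5^4, 5^3, 5^2, 5^1, 5^0 reduced mod 7
WEIGHTS = (3, 2, 6, 4, 5, 1)


def residue_counts():
    """Count how often each residue mod 7 occurs over all six-digit base-5 numbers."""
    counts = Counter()
    for digits in product(range(5), repeat=6):
        full = sum(d * 5 ** (5 - k) for k, d in enumerate(digits))
        reduced = sum(d * w for d, w in zip(digits, WEIGHTS))
        # The reduced weights must give the same residue as the full base-5 number.
        assert full % 7 == reduced % 7
        counts[full % 7] += 1
    return counts
```

Residue 0 occurs 2,233 times and every other residue 2,232 times, which is the extra 1/15,625 = 0.0064% mentioned above.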
Theory
The number 15,624 was not chosen randomly, but can be discovered using Fermat's little theorem, which states that if p is a prime number and a is not divisible by p, then
a^(p-1) = 1 mod p
So this gives us,
(5^6)-1 = 0 mod 7
(5^6)-1 is equal to
4 * 5^5 + 4 * 5^4 + 4 * 5^3 + 4 * 5^2 + 4 * 5 + 4
This is a number in base 5 form, and thus we can see that this method can be used to go from a random number generator over a values to one over p values whenever p is prime and does not divide a. A small bias towards 0 is always introduced when using the exponent p-1.
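The arithmetic behind this can be checked directly. The helpers below are hypothetical names introduced only for this sketch:

```python
def base_digits(x, base):
    """Digits of x in the given base, least significant first."""
    digits = []
    while x > 0:
        x, d = divmod(x, base)
        digits.append(d)
    return digits


def fermat_range(frm, to):
    """Number of equally likely values produced by to-1 digits in base frm."""
    return frm ** (to - 1)


# 5^6 - 1 = 15624 is a multiple of 7 and is written 444444 in base 5.
assert (fermat_range(5, 7) - 1) % 7 == 0
assert base_digits(fermat_range(5, 7) - 1, 5) == [4] * 6
```

Because the range has one value more than a multiple of 7, exactly one residue (0) receives a single extra preimage.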
To generalize this approach and to be more accurate we can have a function like this:
def getRandomconverted(frm, to):
    # getRandomUniform(frm) is assumed to return a uniform integer in 0..frm-1
    s = 0
    for i in range(to):
        s += getRandomUniform(frm) * frm**i
    mx = 0
    for i in range(to):
        mx += (frm - 1) * frm**i  # largest value s can take
    mx = (mx // to) * to  # maximum value till which we can take mod
    if s < mx:
        return s % to
    else:
        return getRandomconverted(frm, to)
There is no (exactly correct) solution which will run in a constant amount of time, since 1/7 is an infinite decimal in base 5. One simple solution would be to use rejection sampling, e.g.:
int rand7()
{
    int i;
    do
    {
        i = 5 * (rand5() - 1) + rand5(); // i is now uniformly random between 1 and 25
    } while (i > 21);
    // i is now uniformly random between 1 and 21
    return i % 7 + 1; // result is now uniformly random between 1 and 7
}
This has an expected runtime of 25/21 = 1.19 iterations of the loop. There is no fixed upper bound on the number of iterations, but the probability of needing more than k of them is (4/25)^k, which shrinks very quickly.
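The same rejection-sampling idea can be sketched in Python, together with an exact check that each result is reached from the same number of (first, second) rolls. Here rand5 is a stand-in built on the standard random module, and exact_counts is a hypothetical helper:

```python
import random


def rand5():
    """Stand-in source: uniform integer in 1..5."""
    return random.randint(1, 5)


def rand7():
    """Rejection sampling: keep the 21 smallest of the 25 equally likely pairs."""
    while True:
        i = 5 * (rand5() - 1) + rand5()  # uniform in 1..25
        if i <= 21:
            return i % 7 + 1  # uniform in 1..7


def exact_counts():
    """For each result 1..7, count the accepted pairs of rolls that produce it."""
    counts = {}
    for a in range(1, 6):
        for b in range(1, 6):
            i = 5 * (a - 1) + b
            if i <= 21:
                result = i % 7 + 1
                counts[result] = counts.get(result, 0) + 1
    return counts
```

Each of the seven results is produced by exactly 3 of the 21 accepted pairs, which is why the output is exactly uniform.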
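int ans = 0;
int r;
while (ans == 0)
{
    for (int i = 0; i < 3; i++)
    {
        while ((r = rand5()) == 3) {} // reject 3 so that {1,2} vs {4,5} is a fair coin
        ans += (r < 3) << i;          // set bit i with probability 1/2
    }
}
// ans is now uniformly random between 1 and 7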
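The loop above builds a 3-bit number from fair coin flips and retries when the result is 0. A Python sketch of the same idea, assuming a stand-in rand5 built on the random module (random_bit and rand7_bits are hypothetical names):

```python
import random


def rand5():
    """Stand-in source: uniform integer in 1..5."""
    return random.randint(1, 5)


def random_bit():
    """Fair coin from rand5: discard 3, then 1-2 gives 1 and 4-5 gives 0."""
    while True:
        r = rand5()
        if r != 3:
            return 1 if r < 3 else 0


def rand7_bits():
    """Assemble three fair bits into 0..7 and reject 0, leaving 1..7 uniform."""
    while True:
        ans = sum(random_bit() << i for i in range(3))
        if ans != 0:
            return ans
```

Each of the eight 3-bit patterns is equally likely, so discarding 0 leaves the seven remaining values equally likely.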
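Just scale the output of your first function:
0) you have a number in the range 1-5
1) subtract 1 to put it in the range 0-4
2) multiply by (7-1)/(5-1) to put it in the range 0-6
3) add 1 to shift the range: now your result is between 1 and 7
Note that this only rescales: five possible inputs still give only five distinct outputs, so not every integer from 1 to 7 can occur.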
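Working the steps above through for every possible input shows what the scaling actually produces (the function name is hypothetical):

```python
def scale_1to5_to_1to7(x):
    """Steps 1-3: shift to 0-4, stretch by (7-1)/(5-1), shift back up by 1."""
    return (x - 1) * (7 - 1) / (5 - 1) + 1


outputs = [scale_1to5_to_1to7(x) for x in range(1, 6)]
print(outputs)  # [1.0, 2.5, 4.0, 5.5, 7.0]
```

The endpoints land on 1 and 7, but there are only five distinct results, so any rounding of them leaves at least two of the integers 1-7 unreachable.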
In PHP:
function rand1to7() {
    do {
        $output_value = 0;
        for ($i = 0; $i < 28; $i++) {
            $output_value += rand1to5();
        }
    } while ($output_value == 140); // reject the maximum sum so the range is 28-139
    $output_value -= 12;
    return floor($output_value / 16);
}
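This loops to produce a random number between 16 and 127, divides by sixteen to create a float between 1 and 7.9375, then rounds down to get an int between 1 and 7. If the sum were uniformly distributed, there would be a 16/112 chance of getting any one of the 7 outcomes. However, the sum of 28 draws is not uniform (it clusters around its mean of 84), so the seven outcomes are not equally likely.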
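The exact outcome probabilities of this approach can be computed by counting the ways each sum can occur. The sketch below assumes the loop is meant to reject only the sum 140, as the description suggests, and uses hypothetical helper names:

```python
from fractions import Fraction


def sum_distribution(rolls=28, sides=5):
    """Number of ways to reach each total when adding `rolls` values from 1..sides."""
    dist = {0: 1}
    for _ in range(rolls):
        new = {}
        for total, ways in dist.items():
            for face in range(1, sides + 1):
                new[total + face] = new.get(total + face, 0) + ways
        dist = new
    return dist


def outcome_probabilities():
    """Exact probability of each result of floor((sum - 12) / 16), given sum != 140."""
    dist = sum_distribution()
    dist.pop(140)  # the one rejected total (all 28 draws equal to 5)
    accepted = sum(dist.values())
    probs = {}
    for total, ways in dist.items():
        outcome = (total - 12) // 16
        probs[outcome] = probs.get(outcome, 0) + Fraction(ways, accepted)
    return probs
```

The results do cover 1 to 7, but the middle outcome 4 (sums 76 to 91, around the mean of 84) is far more likely than the extreme outcomes 1 and 7.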