Given a function which produces a random integer in the range 1 to 5, write a function which produces a random integer in the range 1 to 7.
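I think I have four answers. Unlike the answer by @Adam Rosenfield, none of them loops: each makes a fixed number of calls to rand5(). Methods 1 and 3 aim to be exact, while methods 2 and 4 are almost uniform and need far fewer calls than method 1.
No method with a fixed number of calls can be exactly uniform, though. k calls to rand5() have 5^k equally likely outcomes, and 5^k is never divisible by 7, so the best any of these can do is come very close. Let's proceed in order to understand why.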
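The strength of Adam's answer is that it gives a perfectly uniform distribution, and there is a very high probability (21/25) that only two calls to rand5() are needed. However, the number of calls has no upper bound: in the worst case the loop keeps rejecting and never finishes, even though that happens with probability zero.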
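For comparison, here is a minimal R sketch of two-call rejection sampling as I understand Adam's approach. The name rand7_reject and the exact mapping are my own illustration, not his code. Two calls give a uniform value on 1..25; the 21 lowest values map evenly onto 1..7, and the other 4 trigger a retry.

```r
# Assumption: rand5() returns an integer uniformly from 1..5.
rand5 <- function() sample(1:5, 1)

# Hypothetical name: a sketch of two-call rejection sampling.
rand7_reject <- function() {
  repeat {
    v <- 5 * (rand5() - 1) + rand5()       # uniform on 1..25
    if (v <= 21) return((v - 1) %% 7 + 1)  # 21 = 3 * 7 values, 3 per output
    # v in 22..25: discard both calls and try again
  }
}

table(replicate(7000, rand7_reject()))  # roughly 1000 of each value
```

The repeat loop has no bound on the number of rounds, which is exactly the worst case described above.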
The first solution below also never loops, but requires a total of 42 calls to rand5(). It comes extremely close to uniform (one value gets one extra outcome out of 5^42), though not exactly.
Here is an R implementation:
rand5 <- function() sample(1:5,1)
rand7 <- function() (sum(sapply(0:6, function(i) i + rand5() + rand5()*2 + rand5()*3 + rand5()*4 + rand5()*5 + rand5()*6)) %% 7) + 1
For people not familiar with R, here is a simplified version:
rand7 = function(){
  r = 0
  for(i in 0:6){
    r = r + i + rand5() + rand5()*2 + rand5()*3 + rand5()*4 + rand5()*5 + rand5()*6
  }
  return(r %% 7 + 1)
}
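Each call to rand5() is uniform and the calls are independent, so the question is how the 42 values combine. The offsets i add 0 + 1 + ... + 6 = 21 to the total, and 21 %% 7 = 0, so they do not affect the result: the output is the sum of seven independent copies of the method 2 result below, taken mod 7. Counting outcomes, the 7 iterations have (5^6)^7 = 5^42 possible combinations, and 5^42 %% 7 = 1, so they cannot be split into 7 equal groups; exactly one value (1) gets one extra combination. See method two for more discussion on this.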
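Here are the counts for a single iteration when the offset i is itself treated as uniform on 0:6. They are perfectly even, but the loop adds every offset once rather than choosing one at random, so this table is not the distribution of rand7():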
table(apply(expand.grid(c(outer(1:5,0:6,"+")),(1:5)*2,(1:5)*3,(1:5)*4,(1:5)*5,(1:5)*6),1,sum) %% 7 + 1)
1 2 3 4 5 6 7
15625 15625 15625 15625 15625 15625 15625
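A quick way to see that the offsets cancel is to feed the same 42 draws through method 1 with and without the offsets i and compare the results mod 7. A small seeded sketch; the matrix layout (one row per loop iteration) is just my choice for illustration:

```r
set.seed(1)
# 7 iterations x 6 draws, each draw uniform on 1..5
draws <- matrix(sample(1:5, 42, replace = TRUE), nrow = 7)
w <- 1:6                                   # the weights 1..6 used in every iteration

with_offsets    <- sum(0:6 + draws %*% w)  # what method 1 computes
without_offsets <- sum(draws %*% w)        # same draws, offsets dropped

with_offsets - without_offsets             # 21, i.e. 0 + 1 + ... + 6
with_offsets %% 7 == without_offsets %% 7  # TRUE
```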
I think it's straightforward to show that Adam's method will run much, much faster. The probability that Adam's solution needs more than 42 calls to rand5() is very small ((4/25)^21 ~ 10^(-17)).
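These tail probabilities follow from each two-call round of Adam's method succeeding independently with probability 21/25: needing more than n calls means the first floor(n/2) rounds all failed. A small sketch (tail_calls is a hypothetical helper name):

```r
p_accept <- 21 / 25                     # chance a two-call round is accepted

# P(more than n calls to rand5()), n >= 0: the first floor(n/2) rounds all fail
tail_calls <- function(n) (1 - p_accept)^floor(n / 2)

tail_calls(42)   # (4/25)^21, about 1.9e-17
tail_calls(6)    # (4/25)^3 = 0.004096, i.e. 7 or more calls
2 / p_accept     # expected number of calls, 50/21, about 2.38
```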
Now the second method, which is almost uniform and requires only 6 calls to rand5():
rand7 <- function() (sum(sapply(1:6,function(i) i*rand5())) %% 7) + 1
Here is a simplified version:
rand7 = function(){
  r = 0
  for(i in 1:6){
    r = r + i*rand5()
  }
  return(r %% 7 + 1)
}
This is essentially one iteration of method 1 without the offset. If we generate all possible combinations, here are the resulting counts:
table(apply(expand.grid(1:5,(1:5)*2,(1:5)*3,(1:5)*4,(1:5)*5,(1:5)*6),1,sum) %% 7 + 1)
1 2 3 4 5 6 7
2233 2232 2232 2232 2232 2232 2232
One value (1) gets one extra combination out of the 5^6 = 15625 equally likely ones, so it comes up with probability 2233/15625 instead of 1/7.
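Enumerating with expand.grid builds 5^k rows, which quickly becomes impractical as k grows. A sketch that produces the same exact counts by tracking how many combinations reach each residue mod 7, adding one weighted rand5() at a time (mod7_counts is a hypothetical helper, not part of base R):

```r
# Exact counts of (sum of w[k] * X_k) %% 7 + 1 with X_k independent on 1..5,
# built up one weight at a time instead of enumerating 5^length(w) rows.
mod7_counts <- function(w) {
  counts <- c(1, rep(0, 6))          # counts[r + 1]: ways to reach residue r; start at 0
  for (wk in w) {
    step <- rep(0, 7)
    for (x in 1:5) {                 # add one draw of wk * X
      shift <- (wk * x) %% 7
      step <- step + counts[((0:6 - shift) %% 7) + 1]
    }
    counts <- step
  }
  setNames(counts, 1:7)              # names are the rand7 output values
}

mod7_counts(1:6)  # 2233 2232 2232 2232 2232 2232 2232
```

mod7_counts(1:6) reproduces the table above, and other weight vectors can be checked the same way without building the full grid.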
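Now, the idea behind method 1: adding an offset of 0 to 6 moves the extra count of 2233 to each residue in turn, and because 5^6 %% 7 = 1, seven shifted copies give 7 * 5^6 combinations, which split evenly (7 * 5^6 %% 7 = 0). That balancing only works if a single offset is chosen at random, as in the method 1 table. Method 1 instead adds all seven offsets, whose total of 21 is 0 mod 7, so what remains is a sum of seven independent copies of method 2. That shrinks the bias from one combination in 5^6 to one in 5^42, but does not remove it.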
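If the argument of methods 1 and 2 is understood, method 3 looks like the natural next step, using only 7 calls to rand5(). However, it is not exact: the seventh term is 7 * rand5(), which is always 0 mod 7, so the last call has no effect and method 3 has exactly the same distribution as method 2.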
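Behind all of this, the residues of 5^k mod 7 cycle through 5, 4, 6, 2, 3, 1 and never reach 0, so no fixed number of calls splits evenly into 7 groups. A small sketch that checks this with exact modular arithmetic (pow5_mod7 is a hypothetical helper; computing 5^k directly would lose precision in doubles once k is in the twenties):

```r
pow5_mod7 <- function(k) {
  r <- 1
  for (i in seq_len(k)) r <- (r * 5) %% 7  # keep only the residue at each step
  r
}

sapply(1:12, pow5_mod7)  # 5 4 6 2 3 1 5 4 6 2 3 1
pow5_mod7(42)            # 1, the 42 calls of method 1
pow5_mod7(7)             # 5, the 7 calls of methods 3 and 4
```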
Here is an R implementation:
rand5 <- function() sample(1:5,1)
rand7 <- function() (sum(sapply(1:7, function(i) i * rand5())) %% 7) + 1
For people not familiar with R, here is a simplified version:
rand7 = function(){
  r = 0
  for(i in 1:7){
    r = r + i * rand5()
  }
  return(r %% 7 + 1)
}
If we do the math, the 7 calls have 5^7 = 78125 possible combinations, and 5^7 %% 7 = 5, so they cannot be divided into 7 equal groups. Because the seventh term drops out, every count from method 2 is simply multiplied by 5. See methods one and two for more discussion on this.
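To see that the seventh call really has no effect, fix the first six draws and try every possible value of the seventh. A seeded sketch:

```r
set.seed(42)
x6 <- sample(1:5, 6, replace = TRUE)  # the first six rand5() values

# method 3's output for each possible seventh draw
outs <- sapply(1:5, function(x7) (sum(1:7 * c(x6, x7)) %% 7) + 1)
outs                                  # the same value five times
length(unique(outs)) == 1             # TRUE
```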
Here are all the possible combinations:
table(apply(expand.grid(1:5,(1:5)*2,(1:5)*3,(1:5)*4,(1:5)*5,(1:5)*6,(1:5)*7),1,sum) %% 7 + 1)
1 2 3 4 5 6 7
11165 11160 11160 11160 11160 11160 11160
I think it's straightforward to show that Adam's method will still run faster. The probability that Adam's solution needs 7 or more calls to rand5() is still small ((4/25)^3 ~ 0.004).
Method 4 is a minor variation of the second method. It is almost uniform and requires 7 calls to rand5(), one more than method 2:
rand7 <- function() ((rand5() + sum(sapply(1:6,function(i) i*rand5()))) %% 7) + 1
Here is a simplified version:
rand7 = function(){
  r = 0
  for(i in 1:6){
    r = r + i*rand5()
  }
  return((r + rand5()) %% 7 + 1)
}
If we generate all possible combinations, here are the resulting counts:
table(apply(expand.grid(1:5,(1:5)*2,(1:5)*3,(1:5)*4,(1:5)*5,(1:5)*6,1:5),1,sum) %% 7 + 1)
1 2 3 4 5 6 7
11160 11161 11161 11161 11161 11161 11160
Two values (1 and 7) each get one fewer combination out of the 5^7 = 78125 equally likely ones. For most purposes, I can live with that.
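To compare methods 2 and 4 directly, one can compute the worst-case deviation from 1/7 using the exact counts from the two tables above. A small sketch (max_bias is a hypothetical helper):

```r
method2 <- c(2233, rep(2232, 6))           # counts out of 5^6 = 15625
method4 <- c(11160, rep(11161, 5), 11160)  # counts out of 5^7 = 78125

# largest |P(value) - 1/7| over the 7 output values
max_bias <- function(counts) max(abs(counts / sum(counts) - 1/7))

max_bias(method2)  # 6/109375, about 5.5e-05
max_bias(method4)  # 5/546875, about 9.1e-06
```

By this measure the extra call in method 4 cuts the worst-case bias by a factor of 6.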