I want to generate random numbers in sorted order. I wrote the code below:
void CreateSortedNode(pNode head)
{
int size = 10, last = 0;
pNode temp;
while(
Without any information about sample size or sample universe, it's not easy to know if the following is interesting but irrelevant or a solution, but since it is in any case interesting, here goes.
The problem:
In O(1) space, produce an unbiased ordered random sample of size n from an ordered set S of size N: <S1,S2,…SN>, such that the elements in the sample are in the same order as the elements in the ordered set.
The solution:
1. With probability n/|S|, do the following:
   - add S1 to the sample.
   - decrement n
2. Remove S1 from S
3. Repeat steps 1 and 2, each time with the new first element (and size) of S, until n is 0, at which point the sample will have the desired number of elements.
The solution in python:
from random import randrange

# select n random integers in order from range(N)
def sample(n, N):
    # insist that 0 <= n <= N
    for i in range(N):
        if randrange(N - i) < n:
            yield i
            n -= 1
            if n <= 0:
                break
The problem with the solution:
It takes O(N) time. We'd really like to take O(n) time, since n is likely to be much smaller than N. On the other hand, we'd like to retain the O(1) space, in case n is also quite large.
A better solution (outline only)
(The following is adapted from a 1987 paper by Jeffrey Scott Vitter, "An Efficient Algorithm for Sequential Random Sampling", which thanks to the generosity of Dr. Vitter is available freely from the ACM digital library. See Dr. Vitter's publications page. Please read the paper for the details.)
Instead of incrementing i and selecting a random number, as in the above Python code, it would be cool if we could generate a random number according to some distribution which would be the number of times that i will be incremented without any element being yielded. All we need is the distribution (which will obviously depend on the current values of n and N).
Of course, we can derive the distribution precisely from an examination of the algorithm. That doesn't help much, though, because the resulting formula requires a lot of time to compute accurately, and the end result is still O(N).
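As a rough illustration of what deriving the distribution precisely can look like, here is a minimal sketch that draws each skip length by inverse-transform sampling from the exact distribution implied by the Python generator above: the probability of skipping at least s elements is the product of (N - n - j)/(N - j) for j = 0, ..., s-1. The name sample_by_skips and the variable names are my own, and this is not the paper's fast method. It uses one random number per selected element, but the inner search still makes the total running time O(N).

```python
from random import random

def sample_by_skips(n, N):
    """Yield n sorted integers from range(N), drawing one skip length per pick."""
    # Assumes 0 <= n <= N (hypothetical helper, not taken from the paper).
    i = 0  # index of the first element not yet considered
    while n > 0:
        u = random()
        skip = 0
        # q is the probability that more than `skip` elements are skipped,
        # given the current remaining n and N.
        q = (N - n) / N
        while q > u:
            skip += 1
            q *= (N - n - skip) / (N - skip)
        yield i + skip
        # Continue with the elements after the one just selected.
        i += skip + 1
        N -= skip + 1
        n -= 1
```

For example, list(sample_by_skips(5, 100)) gives five distinct integers below 100 in increasing order.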
However, we don't always have to compute it accurately. Suppose we have some easily computable, reasonably good approximation which consistently underestimates the probabilities (with the consequence that it will sometimes not make a prediction). If that approximation works, we can use it; if not, we'll need to fall back to the accurate computation. If that happens sufficiently rarely, we might be able to achieve O(n) on the average. And indeed, Dr. Vitter's paper shows how to do this. (With code.)
Suppose you wanted to generate just three random numbers, x, y, and z, so that they are in sorted order x <= y <= z. You will place these in some C++ container, which I'll just denote as a list like D = [x, y, z], so we can also say that x is component 0 of D, or D_0, and so on.
For any sequential algorithm that first draws a random value for x (let's say it comes up with 2.5), this tells us some information about what y has to be, namely y >= 2.5.
So, conditional on the value of x, your desired random number algorithm has to satisfy the property that p(y >= x | x) = 1. If the distribution you are drawing from is anything like a common distribution, like uniform or Gaussian, then it's clear that usually p(y >= x) would be some other expression involving the density for that distribution. (In fact, only a pathological distribution like a Dirac Delta at "infinity" could be independent, and would be nonsense for your application.)
So what we can speculate with great confidence is that p(y >= t | x) for various values of t is not equal to p(y >= t). That's the definition for dependent random variables. So now you know that the random variable y (second in your eventual list) is not statistically independent of x.
Another way to state it is that in your output data D, the components of D are not statistically independent observations. And in fact they must be positively correlated, since if we learn that x is bigger than we thought, we also automatically learn that y is bigger than or equal to what we thought.
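A quick simulation can make the positive dependence visible. The sketch below is an illustration under my own assumptions: the values are uniform on [0, 1), they are obtained by sorting three independent draws rather than by any sequential scheme, and the function name is hypothetical. It estimates the sample covariance between the first two components of D.

```python
import random

def covariance_of_first_two(trials=20000, seed=0):
    """Estimate Cov(x, y) where x <= y <= z are three sorted uniform draws."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _trial in range(trials):
        # Sorting three independent draws gives x <= y <= z.
        x, y, _z = sorted(rng.random() for _k in range(3))
        xs.append(x)
        ys.append(y)
    mean_x = sum(xs) / trials
    mean_y = sum(ys) / trials
    # Unbiased sample covariance of the paired (x, y) values.
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys)) / (trials - 1)

print(covariance_of_first_two())  # a positive value means x and y move together
```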
In this sense, a sequential algorithm that provides this kind of output is an example of a Markov Chain. The probability distribution of a given number in the sequence is conditionally dependent on the previous number.
If you really want a Markov Chain like that (I suspect that you don't), then you could instead draw a first number at random (for x) and then draw positive deltas, which you will add to each successive number, like this:
1. x, say 2.5
2. y-x, say 13.7, so y is 2.5 + 13.7 = 16.2
3. z-y, say 0.001, so z is 16.201

You just have to acknowledge that the components of your result are not statistically independent, and so you cannot use them in an application that relies on statistical independence assumptions.
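The delta construction described above can be sketched in a few lines of Python. The distributions here are assumptions made only for illustration (a uniform starting value and exponentially distributed deltas), as are the function name and the start_scale and delta_scale parameters; the answer itself does not prescribe any of them.

```python
import random
from itertools import accumulate

def sorted_by_deltas(count, start_scale=10.0, delta_scale=5.0, rng=random):
    """Return `count` non-decreasing values: a random start plus running deltas."""
    if count <= 0:
        return []
    # First value (x): uniform on [0, start_scale], an arbitrary choice.
    first = rng.uniform(0.0, start_scale)
    # Each delta (y-x, z-y, ...) is non-negative with mean delta_scale.
    deltas = [rng.expovariate(1.0 / delta_scale) for _ in range(count - 1)]
    # Running sums turn the start and the deltas into the sorted sequence.
    return list(accumulate([first] + deltas))

print(sorted_by_deltas(3))
```

Because each value is built from the previous one, the output is sorted by construction, but its components are dependent in exactly the way described above.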