This question on getting random values from a finite set got me thinking...
It's fairly common for people to want to retrieve X unique values from a set of Y values.
Most people forget that checking whether a number has already been drawn also takes time.
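The algorithm being discussed is simple enough to sketch. Here is a minimal Python version, assuming a list-based "already drawn?" check (the names are illustrative, not from the original question):

```python
import random

def pick_unique_naive(values, m):
    """Pick m distinct values from `values` by drawing at random
    and retrying whenever the draw has already been taken."""
    chosen = []
    while len(chosen) < m:
        candidate = random.choice(values)
        if candidate not in chosen:  # the lookup people tend to forget about
            chosen.append(candidate)
    return chosen
```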
The number of tries necessary can, as described earlier, be evaluated from:
T(n,m) = n*(H(n) - H(n-m)) ⪅ n*(ln(n) - ln(n-m))
which goes to n*ln(n) for interesting values of m.
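As a quick numerical sanity check, the exact harmonic-number expression and the log approximation can be compared directly (a small Python sketch; the helper names are mine):

```python
import math

def expected_tries(n, m):
    """Exact expectation n*(H(n) - H(n-m)) of the number of tries
    needed to draw m unique values out of n."""
    return n * sum(1.0 / k for k in range(n - m + 1, n + 1))

def log_approximation(n, m):
    """The n*(ln(n) - ln(n-m)) approximation; requires m < n."""
    return n * (math.log(n) - math.log(n - m))

# Example: drawing 900 unique values out of 1000
print(expected_tries(1000, 900))     # roughly 2298 tries
print(log_approximation(1000, 900))  # roughly 2303
```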
However, for each of these 'tries' you will have to do a lookup. This might be a simple O(n) run-through, or something like a binary tree. This will give you a total performance of n^2*ln(n) or n*ln(n)^2 respectively.
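In Python, for example, the O(n) run-through corresponds to checking membership in a list; swapping the list for a hash set (a different trade-off than the binary tree mentioned above, and purely my suggestion) makes each lookup O(1) on average, so the total work stays close to the number of tries itself:

```python
import random

def pick_unique_with_set(values, m):
    """Same retry loop as before, but the 'already drawn?' check is
    done against a hash set, so each lookup is O(1) on average
    instead of an O(n) scan of the chosen list."""
    chosen = []
    seen = set()
    while len(chosen) < m:
        candidate = random.choice(values)
        if candidate not in seen:
            seen.add(candidate)
            chosen.append(candidate)
    return chosen
```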
For smaller values of m (m < n/2), you can get a very good approximation of T(n,m) using the HA inequality (harmonic mean ≤ arithmetic mean), yielding the formula:
2*m*n/(2*n-m+1)
As m goes to n, this gives a lower bound of O(n) tries and a performance of O(n^2) or O(n*ln(n)).
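For what it's worth, here is how I read the HA step (the text above does not spell it out, so treat this as a sketch): write the tries formula as n times a sum of m reciprocals, then apply harmonic mean ≤ arithmetic mean to the integers n-m+1, …, n:

```latex
T(n,m) = n\sum_{k=n-m+1}^{n}\frac{1}{k},
\qquad
\underbrace{\frac{m}{\sum_{k=n-m+1}^{n} 1/k}}_{\text{harmonic mean}}
\;\le\;
\underbrace{\frac{(n-m+1)+n}{2}}_{\text{arithmetic mean}}
\quad\Longrightarrow\quad
T(n,m) \;\ge\; \frac{2mn}{2n-m+1}.
```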
All the results are, however, far better than I would ever have expected, which shows that the algorithm might actually be just fine in many non-critical cases where you can accept an occasional longer running time (when you are unlucky).
Your actual question is actually a lot more interesting than what I answered (and harder). I've never been any good at statistics (and it's been a while since I did any), but intuitively, I'd say that the run-time complexity of that algorithm would probably be something like an exponential. As long as the number of elements picked is small enough compared to the size of the array, the collision rate will be so small that it will be close to linear time, but at some point the number of collisions will probably grow fast and the run-time will go down the drain.
If you want to prove this, I think you'd have to do something moderately clever with the expected number of collisions as a function of the wanted number of elements. It might be possible to do it by induction as well, but I think going down that route would require more cleverness than the first alternative.
EDIT: After giving it some thought, here's my attempt:
Given an array of m elements, suppose we are looking for n random and different elements. It is then easy to see that when we want to pick the i-th element, the odds of picking an element we've already visited are (i-1)/m. This is then the expected number of collisions for that particular pick. For picking n elements, the expected number of collisions will be the sum of the expected collisions for each pick. We plug this into Wolfram Alpha (sum (i-1)/m, i=1 to n) and we get the answer (n**2 - n)/(2m). The average number of picks for our naive algorithm is then n + (n**2 - n)/(2m).
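The Wolfram Alpha step is just an arithmetic series, so for completeness:

```latex
\sum_{i=1}^{n}\frac{i-1}{m}
= \frac{1}{m}\sum_{i=1}^{n}(i-1)
= \frac{1}{m}\cdot\frac{n(n-1)}{2}
= \frac{n^2 - n}{2m}.
```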
Unless my memory fails me completely (which is entirely possible, actually), this gives an average-case run-time of O(n**2).
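A quick simulation can be used to sanity-check that estimate (a Python sketch with names of my own choosing; the estimate should only be expected to be tight while n is well below m):

```python
import random

def average_picks(m, n, trials=2000):
    """Average number of random draws the naive algorithm needs to
    collect n distinct values out of m, over `trials` runs."""
    total = 0
    for _ in range(trials):
        seen = set()
        picks = 0
        while len(seen) < n:
            picks += 1
            seen.add(random.randrange(m))
        total += picks
    return total / trials

m, n = 10_000, 1_000
print(average_picks(m, n))       # simulated average, a little above the estimate
print(n + (n**2 - n) / (2 * m))  # the estimate derived above: 1049.95
```

The small gap is, as far as I can tell, due to the estimate ignoring the chance of colliding more than once while trying to fill the same slot, which only starts to matter as n approaches m.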