I want to choose k
elements uniformly at random out of a possible n
without choosing the same number twice. There are two trivial approaches to this.
Your second approach does not take Theta(k log k) time on average, it takes about n/(n-k+1) + n/(n-k+2) + ... + n/n operations, which is less than k(n/(n-k)) since you have k terms which are each smaller than n/(n-k). For k <= n/2, it takes under 2*k operations on average. For k>n/2, you can choose a random subset of size n-k, and take the complement. So, this is already an O(k) average time and space algorithm.
With an O(1) hash table, the partial Fisher-Yates method can be made to run in O(k) time and space. The trick is simply to store only the changed elements of the array in the hash table.
Here's a simple example in Java:
public static int[] getRandomSelection (int k, int n, Random rng) {
if (k > n) throw new IllegalArgumentException(
"Cannot choose " + k + " elements out of " + n + "."
);
HashMap<Integer, Integer> hash = new HashMap<Integer, Integer>(2*k);
int[] output = new int[k];
for (int i = 0; i < k; i++) {
int j = i + rng.nextInt(n - i);
output[i] = (hash.containsKey(j) ? hash.remove(j) : j);
if (j > i) hash.put(j, (hash.containsKey(i) ? hash.remove(i) : i));
}
return output;
}
This code allocates a HashMap of 2×k buckets to store the modified elements (which should be enough to ensure that the hash table is never rehashed), and just runs a partial Fisher-Yates shuffle on it.
Here's a quick test on Ideone; it picks two elements out of three 30,000 times, and counts the number of times each pair of elements gets chosen. For an unbiased shuffle, each ordered pair should appear approximately 5,000 (±100 or so) times, except for the impossible cases where both elements would be equal.
What you could use is the following algorithm (using javascript instead of pseudocode):
var k = 3;
var n = [1,2,3,4,5,6];
// O(k) iterations
for(var i = 0, tmp; i < k; ++i) {
// Random index O(1)
var index = Math.floor(Math.random() * (n.length - i));
// Output O(1)
console.log(n[index]);
// Swap and lookup O(1)
tmp = n[index];
n[index] = n[n.length - i - 1];
n[n.length - i - 1] = tmp;
}
In short, you swap the selected value with the last item and in the next iteration sample from the reduced subset. This assumes your original set is wholly unique.
The storage is O(n), if you wish to retrieve the numbers as a set, just refer to the last k entries from n.