Given an unknown length list, return a random item in it by scanning it only 1 time


北海茫月

Given an unknown length list, return a random item in it by scanning it only 1 time.

My idea:

A similar algorithm is Reservoir Sampling (posted by others). But


4 Answers

别跟我提以往

2021-01-23 06:21
Solution in C (implementation of reservoir sampling with k=1): You may want to use "unsigned long long" for count.
```
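unsigned int chosen, num, count;
count = 1;
get_number(&chosen);            /* the first item is the initial choice */
while (get_number(&num)) {
    count++;
    /* replace the current choice with probability 1/count */
    if (rand() % count == 0) chosen = num;
}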
```
Keeping k items per iteration is easy when k=1.
It does call rand() for every item, which is somewhat heavy. You may get reasonable results with a much simpler pseudo-random function, but that would make the code more complicated, not simpler.
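For readers who want something that compiles on its own, here is a minimal sketch of the same k=1 idea over a singly linked list. The `node` struct and the `pick_random` name are assumptions made for illustration, not part of the original answer, and `rand() % count` carries a slight modulo bias that is ignored here.

```c
#include <stdlib.h>

/* Hypothetical list node, used only for this illustration. */
struct node {
    int value;
    struct node *next;
};

/* Returns a uniformly chosen node in one pass, or NULL for an empty list.
   The i-th node visited replaces the current choice with probability 1/i. */
const struct node *pick_random(const struct node *head)
{
    const struct node *chosen = NULL;
    unsigned long long count = 0;

    for (const struct node *p = head; p != NULL; p = p->next) {
        count++;
        if ((unsigned long long)rand() % count == 0)
            chosen = p;
    }
    return chosen;
}
```

Call `srand()` once before the first use; the list length is never needed, only the running count.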
忘掉有多难

2021-01-23 06:25

Why are you against reservoir sampling? You happen to be doing it with k = 1. There are minor optimizations (e.g. you don't need to select 1 out of the k, since k = 1), but it's the right approach. You could try to optimize by processing a fixed window of items at a time and doing the math to decide whether any item in the window should replace the one you currently hold, so that every item remains equally likely. That would minimize rand() calls at the expense of a more complicated algorithm, but you're going to wind up back at reservoir sampling more or less anyhow.

清酒与你

2021-01-23 06:25

You use reservoir sampling.

This is neither too complicated nor too expensive; it is the minimal approach given the constraints that you have (selecting an element from a stream).

It works just fine if you want a random sample size of 1 and if all the elements have the same weight.
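A short derivation shows why equal weights come out uniform. The symbols n (the stream length, which the algorithm never needs to know) and i (the 1-based position of an item) are notation introduced for this sketch: item i is kept on arrival with probability 1/i and must then survive every later replacement.

```latex
\[
P(\text{item } i \text{ is the final choice})
  = \frac{1}{i} \prod_{j=i+1}^{n} \left(1 - \frac{1}{j}\right)
  = \frac{1}{i} \cdot \frac{i}{i+1} \cdot \frac{i+1}{i+2} \cdots \frac{n-1}{n}
  = \frac{1}{n}.
\]
```

The product telescopes, so every position ends up with the same probability 1/n.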

When you've simplified the code with a k of 1 and no explicit weighting, it's still reservoir sampling.

Not all pseudo random number generators run at the same speed; pick a fast one.
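As one illustration of a cheaper generator, the sketch below uses Marsaglia's 32-bit xorshift step. The function names `xorshift32` and `pick_random_fast` and the array standing in for a stream are assumptions for this example; xorshift is not suitable for cryptographic use, and the modulo reduction keeps a small bias.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One step of Marsaglia's xorshift32 generator; *state must be nonzero. */
uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Reservoir sampling with k = 1 over an array standing in for a stream.
   Requires n > 0 and a nonzero seed in *state. */
int pick_random_fast(const int *items, size_t n, uint32_t *state)
{
    assert(n > 0 && *state != 0);
    int chosen = items[0];
    for (size_t i = 1; i < n; i++) {
        /* Item i (0-based) is the (i+1)-th seen: keep it with chance 1/(i+1). */
        if (xorshift32(state) % (i + 1) == 0)
            chosen = items[i];
    }
    return chosen;
}
```

Whether this is actually faster than the platform's `rand()` depends on the C library, so it is worth measuring before switching.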

Comments ask what would happen if you re-used the same random number rather than generating a new random number each step:

The Wikipedia link given shows the equivalence to the Fisher-Yates/Knuth shuffle. If you asked what would happen if you picked the same random number at each step of the shuffle, you'd be barking.

醉梦人生

2021-01-23 06:25

See the Perl Cookbook for the algorithm, which you can easily adapt to C++.

Basically, scan the list once, and for each entry you read at zero-based index i, keep it if a random number drawn uniformly from [0, i+1) is less than 1.

The result is the last entry kept.
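The rule described above translates to C roughly as follows. This is a sketch under assumptions: the array stands in for the list, `pick_by_index` and `uniform01` are hypothetical names, and `rand()` is used only for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Uniform double in [0, 1), built from rand() for illustration only. */
static double uniform01(void)
{
    return (double)rand() / ((double)RAND_MAX + 1.0);
}

/* Keep entry i (0-based) when a draw from [0, i+1) falls below 1;
   the last entry kept is the result. Requires n > 0. */
int pick_by_index(const int *items, size_t n)
{
    assert(n > 0);
    int kept = items[0];
    for (size_t i = 0; i < n; i++) {
        if (uniform01() * (double)(i + 1) < 1.0)
            kept = items[i];
    }
    return kept;
}
```

The entry at index 0 is always kept, because a draw from [0, 1) is always below 1, and each later entry replaces it with probability 1/(i+1).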
