Given an unknown length list, return a random item in it by scanning it only 1 time

北海茫月 2021-01-23 05:37

Given an unknown length list, return a random item in it by scanning it only 1 time.

My idea:

A similar algorithm is Reservoir Sampling (posted by others). But

4 answers
  • 2021-01-23 06:21

    Solution in C (implementation of reservoir sampling with k=1): You may want to use "unsigned long long" for count.

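    /* get_number() is assumed to store the next item and return nonzero,
       or to return 0 once the input is exhausted. */
    unsigned int chosen, num, count;
    count = 1;
    get_number(&chosen);              /* the first item is the initial choice */
    while (get_number(&num)) {
        count++;
        /* replace the current choice with probability 1/count */
        if (rand() % count == 0) chosen = num;
    }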
    

    Keeping k items per iteration is easy when k = 1.
    This does call rand() once per item, which is somewhat heavy. You may get reasonable results with a much simpler pseudo-random function, but swapping one in would make the code more complicated, not simpler.

  • 2021-01-23 06:25

    Why are you against reservoir sampling? You happen to be doing it with k = 1. There are minor optimizations (e.g. you don't need to select 1 out of the k, since k = 1), but it's the right approach. You could try to optimize by processing a fixed window of items at a time and doing the math to decide, with equal probability for every item, whether to replace the one you have with one of the items in the window. That minimizes rand() calls at the expense of a more complicated algorithm, but you're going to wind up back at reservoir sampling more or less anyhow.
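
    One possible reading of the fixed-window idea, as a sketch rather than a definitive implementation: buffer up to WINDOW items, then make a single draw over everything seen so far, so each item still ends up chosen with probability 1/(items seen). The names pick_windowed, rand_below and WINDOW are hypothetical, the array stands in for the stream, and rand() % bound carries a small modulo bias and assumes the item count stays at or below RAND_MAX.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define WINDOW 8 /* illustrative window size, not taken from the thread */

/* Hypothetical helper: roughly uniform integer in [0, bound). */
static size_t rand_below(size_t bound)
{
    return (size_t)rand() % bound;
}

/* Returns one element of items[0..n-1] using one rand() call per window.
 * The array stands in for a stream; in a real stream, w would be the
 * number of items the buffer actually received before it filled up or
 * the input ended. */
int pick_windowed(const int *items, size_t n)
{
    assert(items != NULL && n > 0);
    int chosen = items[0];
    size_t seen = 0; /* items consumed before the current window */
    while (seen < n) {
        size_t w = (n - seen < WINDOW) ? (n - seen) : WINDOW;
        /* One draw over all seen + w items: values below w select an
         * item of this window, the rest keep the current choice. */
        size_t r = rand_below(seen + w);
        if (r < w)
            chosen = items[seen + r];
        seen += w;
    }
    return chosen;
}
```

    With WINDOW set to 1 this reduces to the plain k = 1 reservoir loop, which is the sense in which you wind up back at reservoir sampling.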

  • 2021-01-23 06:25

    You use reservoir sampling.

    This is not too complicated nor expensive; it is the minimal approach given the constraints that you have (selecting an element from a stream).

    It works just fine if you want a random sample size of 1 and if all the elements have the same weight.

    When you've simplified the code with a k of 1 and no explicit weighting, it's still reservoir sampling.
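
    A short derivation of why the k = 1 case is uniform, assuming the i-th item read (counting from 1) replaces the current choice with probability 1/i and that the draws are independent. Item i is the final answer exactly when it is picked at step i and no later item replaces it; n is the stream length, which the algorithm never needs to know:

```latex
\[
P(\text{item } i \text{ is returned})
  = \frac{1}{i} \prod_{j=i+1}^{n} \left(1 - \frac{1}{j}\right)
  = \frac{1}{i} \prod_{j=i+1}^{n} \frac{j-1}{j}
  = \frac{1}{i} \cdot \frac{i}{n}
  = \frac{1}{n}.
\]
```

    The product telescopes, so every item ends up with the same probability regardless of its position.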

    Not all pseudo random number generators run at the same speed; pick a fast one.
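
    As an illustration of what a fast generator can look like, here is a sketch of Marsaglia's 32-bit xorshift step. It is offered as one example, not a recommendation from the original answer; the function name xorshift32 and the state-pointer interface are choices made for this sketch. It is not cryptographically secure, and the state must be seeded with a nonzero value.

```c
#include <assert.h>
#include <stdint.h>

/* One step of Marsaglia's 32-bit xorshift generator (shifts 13, 17, 5).
 * Updates *state in place and returns the new value.
 * The state must never be zero, because zero maps to itself. */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    assert(x != 0);
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}
```

    In the sampling loop, xorshift32(&state) % count == 0 could stand in for the rand() test, with the same caveat about a small modulo bias.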


    Comments ask what would happen if you re-used the same random number rather than generating a new random number each step:

    The Wikipedia link given shows the equivalence to the Fisher-Yates/Knuth shuffle. If you asked what picking the same random number at each step of the shuffle would do, you'd be barking.

  • 2021-01-23 06:25

    See the Perl Cookbook for the algorithm, which you can easily adapt to C++.

    Basically, scan the list once, and for each entry you read, with zero-based index i, keep it if a random real number in the range [0, i+1) is less than 1. That happens with probability 1/(i+1), so the first entry is always kept.

    The result is the last entry kept.
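
    The description above can be sketched in C as follows. This is a minimal illustration, not the Perl Cookbook's code: pick_one is a hypothetical name, the array stands in for a list that is read once from front to back, and scaling rand() to a real number is only reasonably fine-grained while the list is much shorter than RAND_MAX.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Returns one element of items[0..n-1], visiting each entry exactly once.
 * The loop uses n only to know when the input ends, never to pick an
 * index up front. */
int pick_one(const int *items, size_t n)
{
    assert(items != NULL && n > 0);
    int kept = items[0];
    for (size_t i = 0; i < n; i++) {
        /* Real number in [0, i + 1); it is below 1 with probability
         * about 1/(i + 1), and always below 1 when i is 0. */
        double r = (double)rand() / ((double)RAND_MAX + 1.0) * (double)(i + 1);
        if (r < 1.0)
            kept = items[i];
    }
    return kept; /* the last entry kept */
}
```

    Call srand() once before the first use if different runs should give different picks.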
