Choose random array element satisfying certain property


Suppose I have a list, called elements, each of which does or does not satisfy some boolean property p. I want to choose one of the elements that


5 Answers

无人及你

2020-12-16 16:09

It works mathematically. Can be proven by induction.

Clearly works for n = 1 element satisfying p.

For n+1 elements, we choose element n+1 with probability 1/(n+1), so its probability is correct. But how does that affect the final probability of choosing one of the prior n elements?

Each of the prior n elements was the selected one with probability 1/n before we found element n+1. After finding element n+1, there is a 1/(n+1) chance that it is chosen, so there is an n/(n+1) chance that the previously chosen element remains chosen. Its final probability of being the chosen one after n+1 matches is therefore 1/n * n/(n+1) = 1/(n+1), which is the probability we want for all n+1 elements under a uniform distribution.

If it works for n = 1, and it works for n+1 given n, then it works for all n.
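
The question text is cut off above, so the following is only a minimal Python sketch of the single-pass strategy this proof appears to describe: keep a count of matches and replace the current choice with probability 1/count. The name `pick_random_matching`, the `rng` parameter, and the even-number test data are assumptions made for illustration.

```python
import random
from collections import Counter

def pick_random_matching(elements, p, rng=random):
    """Return one element satisfying p, chosen uniformly, or None if none match."""
    chosen = None
    matches = 0
    for element in elements:
        if p(element):
            matches += 1
            # Replace the current choice with probability 1/matches.
            if rng.randrange(matches) == 0:
                chosen = element
    return chosen

# Rough empirical check: each even number in 0..9 should come up
# in about one fifth of the trials.
rng = random.Random(12345)
counts = Counter(
    pick_random_matching(range(10), lambda x: x % 2 == 0, rng)
    for _ in range(50_000)
)
```

Returning `None` for "no match" is ambiguous if the input itself can contain `None`; raising an exception in that case would be a reasonable alternative.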

予麋鹿

2020-12-16 16:13
For clarity's sake, I would try:
```
pickRandElement(elements, p)
     OrderedCollection coll = new OrderedCollection
     foreach element in elements
          if (p(element))
               coll.add(element)
     if (coll.size == 0) return null
     else return coll.get(randInt(coll.size))
```
To me, that makes it MUCH clearer what you're trying to do and is self-documenting. On top of that, it's simpler and more elegant, and it's now obvious that each matching element is picked with equal probability.
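
As a rough translation of that pseudocode into runnable Python (the function name and the `rng` parameter are illustrative assumptions, not part of the original answer):

```python
import random

def pick_rand_element(elements, p, rng=random):
    """Collect every element satisfying p, then pick one uniformly at random."""
    matching = [element for element in elements if p(element)]
    if not matching:
        return None
    # random.choice selects uniformly from a non-empty sequence.
    return rng.choice(matching)
```

This version stores every matching element before choosing, so it uses memory proportional to the number of matches.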
误落风尘

2020-12-16 16:19
In The Practice of Programming, pg. 70, (The Markov Chain Algorithm) there is a similar algorithm for that:
```
[...]
  nmatch = 0;
  for ( /* iterate list */ )
    if (rand() % ++nmatch == 0) /* prob = 1/nmatch */
      w = suf->word;
[...]
```
"Notice the algorithm for selecting one item at random when we don't know how many items there are. The variable nmatch counts the number of matches as the list is scanned. The expression
```
rand() % ++nmatch == 0
```
increments nmatch and is then true with probability 1/nmatch."
灰色年华

2020-12-16 16:27

decowboy has a nice proof that this works on TopCoder

我在风中等你

2020-12-16 16:31
Yes, I believe so.

The first time you encounter a matching element, you definitely pick it. The next time, you pick the new value with a probability of 1/2, so each of the two elements has an equal chance. The following time, you pick the new value with a probability of 1/3, leaving each of the other elements with a probability of 1/2 * 2/3 = 1/3 as well.
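
The same arithmetic can be checked exactly for any number of matches. The sketch below assumes the rule just described (the i-th match is picked with probability 1/i and survives each later match k with probability (k-1)/k); `selection_probabilities` is a hypothetical helper name.

```python
from fractions import Fraction

def selection_probabilities(n):
    """Exact probability that each of n matching elements ends up chosen."""
    probs = []
    for i in range(1, n + 1):
        # The i-th match is picked on arrival with probability 1/i ...
        prob = Fraction(1, i)
        # ... and must then survive every later match k,
        # each with probability (k - 1)/k.
        for k in range(i + 1, n + 1):
            prob *= Fraction(k - 1, k)
        probs.append(prob)
    return probs

print(selection_probabilities(3))
# prints [Fraction(1, 3), Fraction(1, 3), Fraction(1, 3)]
```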

I'm trying to find a Wikipedia article about this strategy, but failing so far...

Note that more generally, you're just picking a random sample out of a sequence of unknown length. Your sequence happens to be generated by taking an initial sequence and filtering it, but the algorithm doesn't require that part at all.

I thought I'd got a LINQ operator in MoreLINQ to do this, but I can't find it in the repository... EDIT: Fortunately, it still exists from this answer:
```
public static T RandomElement<T>(this IEnumerable<T> source,
                                 Random rng)
{
    T current = default(T);
    int count = 0;
    foreach (T element in source)
    {
        count++;
        if (rng.Next(count) == 0)
        {
            current = element;
        }            
    }
    if (count == 0)
    {
        throw new InvalidOperationException("Sequence was empty");
    }
    return current;
}
```