selection based on percentage weighting

后端未结

关注

 13  2059

忘掉有多难

I have a set of values, and an associated percentage for each:

a: 70% chance
b: 20% chance
c: 10% chance

I want to select a value (a, b, c) based

相关标签:

13条回答

南方客

2020-12-04 14:08
today, the update of python document give an example to make a random.choice() with weighted probabilities:

If the weights are small integer ratios, a simple technique is to build a sample population with repeats:
```
>>> weighted_choices = [('Red', 3), ('Blue', 2), ('Yellow', 1), ('Green', 4)]
>>> population = [val for val, cnt in weighted_choices for i in range(cnt)]
>>> random.choice(population)
'Green'
```
A more general approach is to arrange the weights in a cumulative distribution with itertools.accumulate(), and then locate the random value with bisect.bisect():
```
>>> choices, weights = zip(*weighted_choices)
>>> cumdist = list(itertools.accumulate(weights))
>>> x = random.random() * cumdist[-1]
>>> choices[bisect.bisect(cumdist, x)]
'Blue'
```
one note: itertools.accumulate() needs python 3.2 or define it with the Equivalent.
0 讨论(0)
发布评论:

提交评论
- 加载中...

星月不相逢

2020-12-04 14:09

If you are really up to speed and want to generate the random values quickly, the Walker's algorithm mcdowella mentioned in https://stackoverflow.com/a/3655773/1212517 is pretty much the best way to go (O(1) time for random(), and O(N) time for preprocess()).

For anyone who is interested, here is my own PHP implementation of the algorithm:

/**
 * Pre-process the samples (Walker's alias method).
 * @param array key represents the sample, value is the weight
 */
protected function preprocess($weights){

    $N = count($weights);
    $sum = array_sum($weights);
    $avg = $sum / (double)$N;

    //divide the array of weights to values smaller and geq than sum/N 
    $smaller = array_filter($weights, function($itm) use ($avg){ return $avg > $itm;}); $sN = count($smaller); 
    $greater_eq = array_filter($weights, function($itm) use ($avg){ return $avg <= $itm;}); $gN = count($greater_eq);

    $bin = array(); //bins

    //we want to fill N bins
    for($i = 0;$i<$N;$i++){
        //At first, decide for a first value in this bin
        //if there are small intervals left, we choose one
        if($sN > 0){  
            $choice1 = each($smaller); 
            unset($smaller[$choice1['key']]);
            $sN--;
        } else{  //otherwise, we split a large interval
            $choice1 = each($greater_eq); 
            unset($greater_eq[$choice1['key']]);
        }

        //splitting happens here - the unused part of interval is thrown back to the array
        if($choice1['value'] >= $avg){
            if($choice1['value'] - $avg >= $avg){
                $greater_eq[$choice1['key']] = $choice1['value'] - $avg;
            }else if($choice1['value'] - $avg > 0){
                $smaller[$choice1['key']] = $choice1['value'] - $avg;
                $sN++;
            }
            //this bin comprises of only one value
            $bin[] = array(1=>$choice1['key'], 2=>null, 'p1'=>1, 'p2'=>0);
        }else{
            //make the second choice for the current bin
            $choice2 = each($greater_eq);
            unset($greater_eq[$choice2['key']]);

            //splitting on the second interval
            if($choice2['value'] - $avg + $choice1['value'] >= $avg){
                $greater_eq[$choice2['key']] = $choice2['value'] - $avg + $choice1['value'];
            }else{
                $smaller[$choice2['key']] = $choice2['value'] - $avg + $choice1['value'];
                $sN++;
            }

            //this bin comprises of two values
            $choice2['value'] = $avg - $choice1['value'];
            $bin[] = array(1=>$choice1['key'], 2=>$choice2['key'],
                           'p1'=>$choice1['value'] / $avg, 
                           'p2'=>$choice2['value'] / $avg);
        }
    }

    $this->bins = $bin;
}

/**
 * Choose a random sample according to the weights.
 */
public function random(){
    $bin = $this->bins[array_rand($this->bins)];
    $randValue = (lcg_value() < $bin['p1'])?$bin[1]:$bin[2];        
}

0 讨论(0)

广开言路

2020-12-04 14:10

import random

def selector(weights):
    i=random.random()*sum(x for x,y in weights)
    for w,v in weights:
        if w>=i:
            break
        i-=w
    return v

weights = ((70,'a'),(20,'b'),(10,'c'))
print [selector(weights) for x in range(10)]

it works equally well for fractional weights

weights = ((0.7,'a'),(0.2,'b'),(0.1,'c'))
print [selector(weights) for x in range(10)]

If you have a lot of weights, you can use bisect to reduce the number of iterations required

import random
import bisect

def make_acc_weights(weights):
    acc=0
    acc_weights = []
    for w,v in weights:
        acc+=w
        acc_weights.append((acc,v))
    return acc_weights

def selector(acc_weights):
    i=random.random()*sum(x for x,y in weights)
    return weights[bisect.bisect(acc_weights, (i,))][1]

weights = ((70,'a'),(20,'b'),(10,'c'))
acc_weights = make_acc_weights(weights)    
print [selector(acc_weights) for x in range(100)]

Also works fine for fractional weights

weights = ((0.7,'a'),(0.2,'b'),(0.1,'c'))
acc_weights = make_acc_weights(weights)    
print [selector(acc_weights) for x in range(100)]

0 讨论(0)

無奈伤痛

2020-12-04 14:11

Here is a complete solution in C#:

public class ProportionValue<T>
{
    public double Proportion { get; set; }
    public T Value { get; set; }
}

public static class ProportionValue
{
    public static ProportionValue<T> Create<T>(double proportion, T value)
    {
        return new ProportionValue<T> { Proportion = proportion, Value = value };
    }

    static Random random = new Random();
    public static T ChooseByRandom<T>(
        this IEnumerable<ProportionValue<T>> collection)
    {
        var rnd = random.NextDouble();
        foreach (var item in collection)
        {
            if (rnd < item.Proportion)
                return item.Value;
            rnd -= item.Proportion;
        }
        throw new InvalidOperationException(
            "The proportions in the collection do not add up to 1.");
    }
}

Usage:

var list = new[] {
    ProportionValue.Create(0.7, "a"),
    ProportionValue.Create(0.2, "b"),
    ProportionValue.Create(0.1, "c")
};

// Outputs "a" with probability 0.7, etc.
Console.WriteLine(list.ChooseByRandom());

0 讨论(0)

滥情空心

2020-12-04 14:11
For Python:
```
>>> import random
>>> dst = 70, 20, 10
>>> vls = 'a', 'b', 'c'
>>> picks = [v for v, d in zip(vls, dst) for _ in range(d)]
>>> for _ in range(12): print random.choice(picks),
... 
a c c b a a a a a a a a
>>> for _ in range(12): print random.choice(picks),
... 
a c a c a b b b a a a a
>>> for _ in range(12): print random.choice(picks),
... 
a a a a c c a c a a c a
>>> 
```
General idea: make a list where each item is repeated a number of times proportional to the probability it should have; use random.choice to pick one at random (uniformly), this will match your required probability distribution. Can be a bit wasteful of memory if your probabilities are expressed in peculiar ways (e.g., 70, 20, 10 makes a 100-items list where 7, 2, 1 would make a list of just 10 items with exactly the same behavior), but you could divide all the counts in the probabilities list by their greatest common factor if you think that's likely to be a big deal in your specific application scenario.

Apart from memory consumption issues, this should be the fastest solution -- just one random number generation per required output result, and the fastest possible lookup from that random number, no comparisons &c. If your likely probabilities are very weird (e.g., floating point numbers that need to be matched to many, many significant digits), other approaches may be preferable;-).
0 讨论(0)
发布评论:

提交评论
- 加载中...

一生所求

2020-12-04 14:18

def weighted_choice(probabilities):
    random_position = random.random() * sum(probabilities)
    current_position = 0.0
    for i, p in enumerate(probabilities):
        current_position += p
        if random_position < current_position:
            return i
    return None

Because random.random will always return < 1.0, the final return should never be reached.

0 讨论(0)