selection based on percentage weighting

忘掉有多难 2020-12-04 13:40

I have a set of values, and an associated percentage for each:

a: 70% chance
b: 20% chance
c: 10% chance

I want to select a value (a, b, c) based on these percentages.

  • 2020-12-04 14:08

    The Python documentation now includes an example of using random.choice() with weighted probabilities:

    If the weights are small integer ratios, a simple technique is to build a sample population with repeats:

    >>> import random
    >>> weighted_choices = [('Red', 3), ('Blue', 2), ('Yellow', 1), ('Green', 4)]
    >>> population = [val for val, cnt in weighted_choices for i in range(cnt)]
    >>> random.choice(population)
    'Green'
    

    A more general approach is to arrange the weights in a cumulative distribution with itertools.accumulate(), and then locate the random value with bisect.bisect():

    >>> import bisect, itertools
    >>> choices, weights = zip(*weighted_choices)
    >>> cumdist = list(itertools.accumulate(weights))
    >>> x = random.random() * cumdist[-1]
    >>> choices[bisect.bisect(cumdist, x)]
    'Blue'
    

    Note: itertools.accumulate() requires Python 3.2 or later; on older versions, define it yourself using the pure-Python equivalent given in the itertools documentation.
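
    On Python 3.6 and later the standard library handles this directly: random.choices() takes a weights argument (or cum_weights, if you already have the running totals) and returns a list of k picks, so neither the repeated population nor the manual bisect is needed. A short sketch using the question's values:

```python
import random

values = ['a', 'b', 'c']
weights = [70, 20, 10]  # relative weights; they do not have to sum to 100 or 1

# One weighted pick: random.choices returns a list of k items (k defaults to 1).
choice = random.choices(values, weights=weights)[0]

# Many picks in one call (sampling with replacement).
sample = random.choices(values, weights=weights, k=10)

# If the cumulative weights are already known, pass them instead of weights.
cum_sample = random.choices(values, cum_weights=[70, 90, 100], k=10)

print(choice, sample, cum_sample)
```

    Passing cum_weights skips the internal accumulation step, which helps when the same weights are reused for many calls.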

  • 2020-12-04 14:09

    If speed really matters and you need to generate many random values quickly, Walker's alias method, which mcdowella mentioned in https://stackoverflow.com/a/3655773/1212517, is pretty much the best way to go: O(N) time once in preprocess(), then O(1) time per call to random().

    For anyone who is interested, here is my own PHP implementation of the algorithm:

    /**
     * Pre-process the samples (Walker's alias method).
     * @param array $weights Keys are the samples, values are their weights.
     */
    protected function preprocess($weights){
    
        $N = count($weights);
        $sum = array_sum($weights);
        $avg = $sum / $N; // every bin holds exactly this much weight
    
        // split the weights into those smaller than sum/N and those >= sum/N
        $smaller = array_filter($weights, function($itm) use ($avg){ return $avg > $itm; });
        $sN = count($smaller);
        $greater_eq = array_filter($weights, function($itm) use ($avg){ return $avg <= $itm; });
    
        $bin = array(); // bins
    
        // we want to fill N bins
        for($i = 0; $i < $N; $i++){
            // First, pick the first value for this bin; if small intervals are left, take one.
            // (each() was removed in PHP 8, so take the first remaining element explicitly.)
            if($sN > 0){
                reset($smaller);
                $key = key($smaller);
                $choice1 = array('key' => $key, 'value' => $smaller[$key]);
                unset($smaller[$key]);
                $sN--;
            } else { // otherwise, split a large interval
                reset($greater_eq);
                $key = key($greater_eq);
                $choice1 = array('key' => $key, 'value' => $greater_eq[$key]);
                unset($greater_eq[$key]);
            }
    
            // splitting happens here: the unused part of the interval is put back into the arrays
            if($choice1['value'] >= $avg){
                if($choice1['value'] - $avg >= $avg){
                    $greater_eq[$choice1['key']] = $choice1['value'] - $avg;
                } else if($choice1['value'] - $avg > 0){
                    $smaller[$choice1['key']] = $choice1['value'] - $avg;
                    $sN++;
                }
                // this bin consists of only one value
                $bin[] = array(1=>$choice1['key'], 2=>null, 'p1'=>1, 'p2'=>0);
            } else {
                // make the second choice for the current bin
                reset($greater_eq);
                $key = key($greater_eq);
                $choice2 = array('key' => $key, 'value' => $greater_eq[$key]);
                unset($greater_eq[$key]);
    
                // split the second interval
                if($choice2['value'] - $avg + $choice1['value'] >= $avg){
                    $greater_eq[$choice2['key']] = $choice2['value'] - $avg + $choice1['value'];
                } else {
                    $smaller[$choice2['key']] = $choice2['value'] - $avg + $choice1['value'];
                    $sN++;
                }
    
                // this bin consists of two values
                $choice2['value'] = $avg - $choice1['value'];
                $bin[] = array(1=>$choice1['key'], 2=>$choice2['key'],
                               'p1'=>$choice1['value'] / $avg,
                               'p2'=>$choice2['value'] / $avg);
            }
        }
    
        $this->bins = $bin;
    }
    
    /**
     * Choose a random sample according to the weights.
     */
    public function random(){
        $bin = $this->bins[array_rand($this->bins)];
        // pick the bin's first value with probability p1, otherwise its second value
        return (lcg_value() < $bin['p1']) ? $bin[1] : $bin[2];
    }
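
    For comparison, here is a minimal Python sketch of the alias method. It uses Vose's variant of the table construction (a swapped-in formulation, not a line-by-line port of the PHP above), and the names build_alias_table and alias_sample are hypothetical. Building the tables is O(n); each draw then costs one uniform column index plus one biased coin flip, which is O(1).

```python
import random

def build_alias_table(weights):
    """Vose's alias method: O(n) preprocessing for O(1) sampling.

    Returns (prob, alias) lists of length n. Column i yields outcome i with
    probability prob[i]; otherwise it yields alias[i].
    """
    n = len(weights)
    total = float(sum(weights))
    if n == 0 or total <= 0:
        raise ValueError("need at least one positive weight")
    # Scale so that the average column height is exactly 1.
    scaled = [w * n / total for w in weights]
    prob = [0.0] * n
    alias = list(range(n))
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        s = small.pop()
        big = large.pop()
        prob[s] = scaled[s]   # column s keeps its own mass...
        alias[s] = big        # ...and is topped up to 1 by outcome `big`
        scaled[big] -= 1.0 - scaled[s]
        (small if scaled[big] < 1.0 else large).append(big)
    # Leftover columns are (up to rounding) exactly full and never alias.
    for i in small + large:
        prob[i] = 1.0
    return prob, alias

def alias_sample(prob, alias, rng=random):
    """Draw one index in O(1): choose a column, then flip a biased coin."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]

values = ['a', 'b', 'c']
prob, alias = build_alias_table([70, 20, 10])
print(values[alias_sample(prob, alias)])
```

    Vose's variant processes the under-full and over-full columns from two worklists, which keeps the bookkeeping simpler than tracking array pointers.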
    
  • 2020-12-04 14:10
    import random
    
    def selector(weights):
        # draw a point in [0, total) and walk through the weights until it is covered
        i = random.random() * sum(x for x, y in weights)
        for w, v in weights:
            if w >= i:
                break
            i -= w
        return v
    
    weights = ((70, 'a'), (20, 'b'), (10, 'c'))
    print([selector(weights) for x in range(10)])
    

    It works equally well for fractional weights:

    weights = ((0.7, 'a'), (0.2, 'b'), (0.1, 'c'))
    print([selector(weights) for x in range(10)])
    

    If you have a lot of weights, you can precompute the cumulative weights once and use bisect to find the match with a binary search instead of a linear scan:

    import random
    import bisect
    
    def make_acc_weights(weights):
        acc = 0
        acc_weights = []
        for w, v in weights:
            acc += w
            acc_weights.append((acc, v))
        return acc_weights
    
    def selector(acc_weights):
        # the last accumulated weight is the total
        i = random.random() * acc_weights[-1][0]
        # (i,) sorts before every (acc, v) with acc >= i, so bisect finds the first such entry
        return acc_weights[bisect.bisect(acc_weights, (i,))][1]
    
    weights = ((70, 'a'), (20, 'b'), (10, 'c'))
    acc_weights = make_acc_weights(weights)
    print([selector(acc_weights) for x in range(100)])
    

    This also works fine for fractional weights:

    weights = ((0.7, 'a'), (0.2, 'b'), (0.1, 'c'))
    acc_weights = make_acc_weights(weights)
    print([selector(acc_weights) for x in range(100)])
    
  • 2020-12-04 14:11

    Here is a complete solution in C#:

    using System;
    using System.Collections.Generic;
    
    public class ProportionValue<T>
    {
        public double Proportion { get; set; }
        public T Value { get; set; }
    }
    
    public static class ProportionValue
    {
        public static ProportionValue<T> Create<T>(double proportion, T value)
        {
            return new ProportionValue<T> { Proportion = proportion, Value = value };
        }
    
        static Random random = new Random();
        public static T ChooseByRandom<T>(
            this IEnumerable<ProportionValue<T>> collection)
        {
            var rnd = random.NextDouble();
            foreach (var item in collection)
            {
                if (rnd < item.Proportion)
                    return item.Value;
                rnd -= item.Proportion;
            }
            throw new InvalidOperationException(
                "The proportions in the collection do not add up to 1.");
        }
    }
    

    Usage:

    var list = new[] {
        ProportionValue.Create(0.7, "a"),
        ProportionValue.Create(0.2, "b"),
        ProportionValue.Create(0.1, "c")
    };
    
    // Outputs "a" with probability 0.7, etc.
    Console.WriteLine(list.ChooseByRandom());
    
  • 2020-12-04 14:11

    For Python:

    >>> import random
    >>> dst = 70, 20, 10
    >>> vls = 'a', 'b', 'c'
    >>> picks = [v for v, d in zip(vls, dst) for _ in range(d)]
    >>> for _ in range(12): print(random.choice(picks), end=' ')
    ... 
    a c c b a a a a a a a a
    >>> for _ in range(12): print(random.choice(picks), end=' ')
    ... 
    a c a c a b b b a a a a
    >>> for _ in range(12): print(random.choice(picks), end=' ')
    ... 
    a a a a c c a c a a c a
    >>> 
    

    General idea: make a list in which each item is repeated a number of times proportional to the probability it should have, then use random.choice to pick one item uniformly at random; this matches the required probability distribution. It can be a bit wasteful of memory if your probabilities are expressed in peculiar ways (e.g., 70, 20, 10 makes a 100-item list, whereas 7, 2, 1 would make a list of just 10 items with exactly the same behavior), but you can divide all the counts by their greatest common divisor if that is likely to matter in your specific application.
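
    The reduction by the greatest common divisor can be done with math.gcd and functools.reduce. A minimal sketch, assuming integer counts; build_population is a hypothetical helper name:

```python
import random
from functools import reduce
from math import gcd

def build_population(values, counts):
    """Repeat each value in proportion to its integer count, reduced by the GCD.

    For counts (70, 20, 10) the GCD is 10, so the list has 7 + 2 + 1 = 10
    items instead of 100, with the same selection probabilities.
    """
    divisor = reduce(gcd, counts)
    if divisor == 0:
        raise ValueError("at least one count must be positive")
    return [v for v, c in zip(values, counts) for _ in range(c // divisor)]

picks = build_population('abc', (70, 20, 10))
print(len(picks), random.choice(picks))
```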

    Apart from memory consumption, this should be the fastest solution: just one random number generation per output result, and the fastest possible lookup from that random number (a single list index, no comparisons). If your probabilities are very awkward (e.g., floating-point numbers that need to be matched to many significant digits), other approaches may be preferable ;-).

  • 2020-12-04 14:18
    import random
    
    def weighted_choice(probabilities):
        random_position = random.random() * sum(probabilities)
        current_position = 0.0
        for i, p in enumerate(probabilities):
            current_position += p
            if random_position < current_position:
                return i
        return None
    

    Because random.random() always returns a value below 1.0, random_position is always less than the total, so the final return None is reached only if every weight is zero.
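
    Since the goal is a probability distribution, a quick empirical check is worthwhile. A minimal sketch that repeats weighted_choice so the snippet runs on its own, maps the returned index back to the question's values, and tallies 100,000 draws with collections.Counter; the printed frequencies vary from run to run but should sit close to 0.7, 0.2 and 0.1:

```python
import random
from collections import Counter

def weighted_choice(probabilities, rng=random):
    # Same logic as the answer: walk the running total until it passes the draw.
    random_position = rng.random() * sum(probabilities)
    current_position = 0.0
    for i, p in enumerate(probabilities):
        current_position += p
        if random_position < current_position:
            return i
    return None  # only when every weight is zero

values = ['a', 'b', 'c']
weights = [70, 20, 10]
draws = 100_000
counts = Counter(values[weighted_choice(weights)] for _ in range(draws))
for v in values:
    print(v, counts[v] / draws)
```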
