I have a set of values, and an associated percentage for each:
a: 70% chance
b: 20% chance
c: 10% chance
I want to select a value (a, b, c) based
today, the update of python document give an example to make a random.choice() with weighted probabilities:
If the weights are small integer ratios, a simple technique is to build a sample population with repeats:
>>> weighted_choices = [('Red', 3), ('Blue', 2), ('Yellow', 1), ('Green', 4)]
>>> population = [val for val, cnt in weighted_choices for i in range(cnt)]
>>> random.choice(population)
'Green'
A more general approach is to arrange the weights in a cumulative distribution with itertools.accumulate(), and then locate the random value with bisect.bisect():
>>> choices, weights = zip(*weighted_choices)
>>> cumdist = list(itertools.accumulate(weights))
>>> x = random.random() * cumdist[-1]
>>> choices[bisect.bisect(cumdist, x)]
'Blue'
one note: itertools.accumulate() needs python 3.2 or define it with the Equivalent.
If you are really up to speed and want to generate the random values quickly, the Walker's algorithm mcdowella mentioned in https://stackoverflow.com/a/3655773/1212517 is pretty much the best way to go (O(1) time for random(), and O(N) time for preprocess()).
For anyone who is interested, here is my own PHP implementation of the algorithm:
/**
* Pre-process the samples (Walker's alias method).
* @param array key represents the sample, value is the weight
*/
protected function preprocess($weights){
$N = count($weights);
$sum = array_sum($weights);
$avg = $sum / (double)$N;
//divide the array of weights to values smaller and geq than sum/N
$smaller = array_filter($weights, function($itm) use ($avg){ return $avg > $itm;}); $sN = count($smaller);
$greater_eq = array_filter($weights, function($itm) use ($avg){ return $avg <= $itm;}); $gN = count($greater_eq);
$bin = array(); //bins
//we want to fill N bins
for($i = 0;$i<$N;$i++){
//At first, decide for a first value in this bin
//if there are small intervals left, we choose one
if($sN > 0){
$choice1 = each($smaller);
unset($smaller[$choice1['key']]);
$sN--;
} else{ //otherwise, we split a large interval
$choice1 = each($greater_eq);
unset($greater_eq[$choice1['key']]);
}
//splitting happens here - the unused part of interval is thrown back to the array
if($choice1['value'] >= $avg){
if($choice1['value'] - $avg >= $avg){
$greater_eq[$choice1['key']] = $choice1['value'] - $avg;
}else if($choice1['value'] - $avg > 0){
$smaller[$choice1['key']] = $choice1['value'] - $avg;
$sN++;
}
//this bin comprises of only one value
$bin[] = array(1=>$choice1['key'], 2=>null, 'p1'=>1, 'p2'=>0);
}else{
//make the second choice for the current bin
$choice2 = each($greater_eq);
unset($greater_eq[$choice2['key']]);
//splitting on the second interval
if($choice2['value'] - $avg + $choice1['value'] >= $avg){
$greater_eq[$choice2['key']] = $choice2['value'] - $avg + $choice1['value'];
}else{
$smaller[$choice2['key']] = $choice2['value'] - $avg + $choice1['value'];
$sN++;
}
//this bin comprises of two values
$choice2['value'] = $avg - $choice1['value'];
$bin[] = array(1=>$choice1['key'], 2=>$choice2['key'],
'p1'=>$choice1['value'] / $avg,
'p2'=>$choice2['value'] / $avg);
}
}
$this->bins = $bin;
}
/**
* Choose a random sample according to the weights.
*/
public function random(){
$bin = $this->bins[array_rand($this->bins)];
$randValue = (lcg_value() < $bin['p1'])?$bin[1]:$bin[2];
}
import random
def selector(weights):
i=random.random()*sum(x for x,y in weights)
for w,v in weights:
if w>=i:
break
i-=w
return v
weights = ((70,'a'),(20,'b'),(10,'c'))
print [selector(weights) for x in range(10)]
it works equally well for fractional weights
weights = ((0.7,'a'),(0.2,'b'),(0.1,'c'))
print [selector(weights) for x in range(10)]
If you have a lot of weights, you can use bisect to reduce the number of iterations required
import random
import bisect
def make_acc_weights(weights):
acc=0
acc_weights = []
for w,v in weights:
acc+=w
acc_weights.append((acc,v))
return acc_weights
def selector(acc_weights):
i=random.random()*sum(x for x,y in weights)
return weights[bisect.bisect(acc_weights, (i,))][1]
weights = ((70,'a'),(20,'b'),(10,'c'))
acc_weights = make_acc_weights(weights)
print [selector(acc_weights) for x in range(100)]
Also works fine for fractional weights
weights = ((0.7,'a'),(0.2,'b'),(0.1,'c'))
acc_weights = make_acc_weights(weights)
print [selector(acc_weights) for x in range(100)]
Here is a complete solution in C#:
public class ProportionValue<T>
{
public double Proportion { get; set; }
public T Value { get; set; }
}
public static class ProportionValue
{
public static ProportionValue<T> Create<T>(double proportion, T value)
{
return new ProportionValue<T> { Proportion = proportion, Value = value };
}
static Random random = new Random();
public static T ChooseByRandom<T>(
this IEnumerable<ProportionValue<T>> collection)
{
var rnd = random.NextDouble();
foreach (var item in collection)
{
if (rnd < item.Proportion)
return item.Value;
rnd -= item.Proportion;
}
throw new InvalidOperationException(
"The proportions in the collection do not add up to 1.");
}
}
Usage:
var list = new[] {
ProportionValue.Create(0.7, "a"),
ProportionValue.Create(0.2, "b"),
ProportionValue.Create(0.1, "c")
};
// Outputs "a" with probability 0.7, etc.
Console.WriteLine(list.ChooseByRandom());
For Python:
>>> import random
>>> dst = 70, 20, 10
>>> vls = 'a', 'b', 'c'
>>> picks = [v for v, d in zip(vls, dst) for _ in range(d)]
>>> for _ in range(12): print random.choice(picks),
...
a c c b a a a a a a a a
>>> for _ in range(12): print random.choice(picks),
...
a c a c a b b b a a a a
>>> for _ in range(12): print random.choice(picks),
...
a a a a c c a c a a c a
>>>
General idea: make a list where each item is repeated a number of times proportional to the probability it should have; use random.choice
to pick one at random (uniformly), this will match your required probability distribution. Can be a bit wasteful of memory if your probabilities are expressed in peculiar ways (e.g., 70, 20, 10
makes a 100-items list where 7, 2, 1
would make a list of just 10 items with exactly the same behavior), but you could divide all the counts in the probabilities list by their greatest common factor if you think that's likely to be a big deal in your specific application scenario.
Apart from memory consumption issues, this should be the fastest solution -- just one random number generation per required output result, and the fastest possible lookup from that random number, no comparisons &c. If your likely probabilities are very weird (e.g., floating point numbers that need to be matched to many, many significant digits), other approaches may be preferable;-).
def weighted_choice(probabilities):
random_position = random.random() * sum(probabilities)
current_position = 0.0
for i, p in enumerate(probabilities):
current_position += p
if random_position < current_position:
return i
return None
Because random.random
will always return < 1.0, the final return
should never be reached.