A weighted version of random.choice

后端未结

关注

 25  1942

I needed to write a weighted version of random.choice (each element in the list has a different probability for being selected). This is what I came up with:

相关标签:

25条回答

渐次进展

2020-11-21 06:41
If your list of weighted choices is relatively static, and you want frequent sampling, you can do one O(N) preprocessing step, and then do the selection in O(1), using the functions in this related answer.
```
# run only when `choices` changes.
preprocessed_data = prep(weight for _,weight in choices)

# O(1) selection
value = choices[sample(preprocessed_data)][0]
```
0 讨论(0)
发布评论:

提交评论
- 加载中...
北荒

2020-11-21 06:43
Since Python 3.6 there is a method choices from the random module.
```
Python 3.6.1 (v3.6.1:69c0db5050, Mar 21 2017, 01:21:04)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.0.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import random

In [2]: random.choices(
...:     population=[['a','b'], ['b','a'], ['c','b']],
...:     weights=[0.2, 0.2, 0.6],
...:     k=10
...: )

Out[2]:
[['c', 'b'],
 ['c', 'b'],
 ['b', 'a'],
 ['c', 'b'],
 ['c', 'b'],
 ['b', 'a'],
 ['c', 'b'],
 ['b', 'a'],
 ['c', 'b'],
 ['c', 'b']]
```
Note that random.choices will sample with replacement, per the docs:

Return a k sized list of elements chosen from the population with replacement.

Note for completeness of answer:

When a sampling unit is drawn from a finite population and is returned to that population, after its characteristic(s) have been recorded, before the next unit is drawn, the sampling is said to be "with replacement". It basically means each element may be chosen more than once.

If you need to sample without replacement, then as @ronan-paixão's brilliant answer states, you can use numpy.choice, whose replace argument controls such behaviour.
0 讨论(0)
发布评论:

提交评论
- 加载中...

臣服心动

2020-11-21 06:43

Using numpy

def choice(items, weights):
    return items[np.argmin((np.cumsum(weights) / sum(weights)) < np.random.rand())]

0 讨论(0)

Happy的楠姐

2020-11-21 06:44

Provide random.choice() with a pre-weighted list:

Solution & Test:

import random

options = ['a', 'b', 'c', 'd']
weights = [1, 2, 5, 2]

weighted_options = [[opt]*wgt for opt, wgt in zip(options, weights)]
weighted_options = [opt for sublist in weighted_options for opt in sublist]
print(weighted_options)

# test

counts = {c: 0 for c in options}
for x in range(10000):
    counts[random.choice(weighted_options)] += 1

for opt, wgt in zip(options, weights):
    wgt_r = counts[opt] / 10000 * sum(weights)
    print(opt, counts[opt], wgt, wgt_r)

Output:

['a', 'b', 'b', 'c', 'c', 'c', 'c', 'c', 'd', 'd']
a 1025 1 1.025
b 1948 2 1.948
c 5019 5 5.019
d 2008 2 2.008

0 讨论(0)

悲哀的现实

2020-11-21 06:46

One way is to randomize on the total of all the weights and then use the values as the limit points for each var. Here is a crude implementation as a generator.

def rand_weighted(weights):
    """
    Generator which uses the weights to generate a
    weighted random values
    """
    sum_weights = sum(weights.values())
    cum_weights = {}
    current_weight = 0
    for key, value in sorted(weights.iteritems()):
        current_weight += value
        cum_weights[key] = current_weight
    while True:
        sel = int(random.uniform(0, 1) * sum_weights)
        for key, value in sorted(cum_weights.iteritems()):
            if sel < value:
                break
        yield key

0 讨论(0)

失恋的感觉

2020-11-21 06:47
As of Python v3.6, random.choices could be used to return a list of elements of specified size from the given population with optional weights.

random.choices(population, weights=None, *, cum_weights=None, k=1)
- population : list containing unique observations. (If empty, raises IndexError)
- weights : More precisely relative weights required to make selections.
- cum_weights : cumulative weights required to make selections.
- k : size(len) of the list to be outputted. (Default len()=1)
Few Caveats:

1) It makes use of weighted sampling with replacement so the drawn items would be later replaced. The values in the weights sequence in itself do not matter, but their relative ratio does.

Unlike np.random.choice which can only take on probabilities as weights and also which must ensure summation of individual probabilities upto 1 criteria, there are no such regulations here. As long as they belong to numeric types (int/float/fraction except Decimal type) , these would still perform.
```
>>> import random
# weights being integers
>>> random.choices(["white", "green", "red"], [12, 12, 4], k=10)
['green', 'red', 'green', 'white', 'white', 'white', 'green', 'white', 'red', 'white']
# weights being floats
>>> random.choices(["white", "green", "red"], [.12, .12, .04], k=10)
['white', 'white', 'green', 'green', 'red', 'red', 'white', 'green', 'white', 'green']
# weights being fractions
>>> random.choices(["white", "green", "red"], [12/100, 12/100, 4/100], k=10)
['green', 'green', 'white', 'red', 'green', 'red', 'white', 'green', 'green', 'green']
```
2) If neither weights nor cum_weights are specified, selections are made with equal probability. If a weights sequence is supplied, it must be the same length as the population sequence.

Specifying both weights and cum_weights raises a TypeError.
```
>>> random.choices(["white", "green", "red"], k=10)
['white', 'white', 'green', 'red', 'red', 'red', 'white', 'white', 'white', 'green']
```
3) cum_weights are typically a result of itertools.accumulate function which are really handy in such situations.

_{From the documentation linked:}

Internally, the relative weights are converted to cumulative weights before making selections, so supplying the cumulative weights saves work.

So, either supplying weights=[12, 12, 4] or cum_weights=[12, 24, 28] for our contrived case produces the same outcome and the latter seems to be more faster / efficient.
0 讨论(0)
发布评论:

提交评论
- 加载中...

1 2 3 4 5 下一页