问题
I have an array of element probabilities, let's say [0.1, 0.2, 0.5, 0.2]
. The array sums up to 1.0.
Using plain Python or numpy, I want to draw elements proportional to their probability: the first element about 10% of the time, second 20%, third 50% etc. The "draw" should return index of the element drawn.
I came up with this:
def draw(probs):
cumsum = numpy.cumsum(probs / sum(probs)) # sum up to 1.0, just in case
return len(numpy.where(numpy.random.rand() >= cumsum)[0])
It works, but it's too convoluted, there must be a better way. Thanks.
回答1:
import numpy as np
def random_pick(choices, probs):
'''
>>> a = ['Hit', 'Out']
>>> b = [.3, .7]
>>> random_pick(a,b)
'''
cutoffs = np.cumsum(probs)
idx = cutoffs.searchsorted(np.random.uniform(0, cutoffs[-1]))
return choices[idx]
How it works:
In [22]: import numpy as np
In [23]: probs = [0.1, 0.2, 0.5, 0.2]
Compute the cumulative sum:
In [24]: cutoffs = np.cumsum(probs)
In [25]: cutoffs
Out[25]: array([ 0.1, 0.3, 0.8, 1. ])
Compute a uniformly distributed random number in the half-open interval [0, cutoffs[-1])
:
In [26]: np.random.uniform(0, cutoffs[-1])
Out[26]: 0.9723114393023948
Use searchsorted to find the index where the random number would be inserted into cutoffs
:
In [27]: cutoffs.searchsorted(0.9723114393023948)
Out[27]: 3
Return choices[idx]
, where idx
is that index.
回答2:
You want to sample from the categorical distribution, which is not implemented in numpy. However, the multinomial distribution is a generalization of the categorical distribution and can be used for that purpose.
>>> import numpy as np
>>>
>>> def sampleCategory(p):
... return np.flatnonzero( np.random.multinomial(1,p,1) )[0]
...
>>> sampleCategory( [0.1,0.5,0.4] )
1
回答3:
use numpy.random.multinomial
- most efficient
回答4:
I've never used numpy, but I assume my code below (python only) does the same thing as what you accomplished in one line. I'm putting it here just in case you want it.
Looks very c-ish so apologies for not being very pythonic.
weight_total would be 1 for you.
def draw(probs)
r = random.randrange(weight_total)
running_total = 0
for i, p in enumerate(probs)
running_total += p
if running_total > r:
return i
回答5:
use bisect
import bisect
import random
import numpy
def draw(probs):
cumsum=numpy.cumsum(probs/sum(probs))
return bisect.bisect_left(cumsum, numpy.random.rand())
should do the trick.
来源:https://stackoverflow.com/questions/8973350/draw-random-element-in-numpy