I needed to write a weighted version of random.choice (each element in the list has a different probability of being selected).
A very basic and easy approach for a weighted choice is the following:
import numpy as np

np.random.choice(['A', 'B', 'C'], p=[0.3, 0.4, 0.3])
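If you need many draws, the same call also accepts a size argument (and a replace flag for sampling with or without replacement); a minimal sketch:

import numpy as np

# Draw 10 weighted samples in one call; replace=True (the default) samples with replacement
samples = np.random.choice(['A', 'B', 'C'], size=10, replace=True, p=[0.3, 0.4, 0.3])
print(samples)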
I'm probably too late to contribute anything useful, but here's a simple, short, and very efficient snippet:
import random

def choose_index(probabilities):
    choice = random.random()
    cmf = 0.0
    for k, probability in enumerate(probabilities):
        cmf += probability
        if choice <= cmf:
            return k
    return len(probabilities) - 1  # guard against floating-point rounding
No need to sort your probabilities or create a vector with your cmf, and it terminates once it finds its choice. Memory: O(1), time: O(N), with average running time ~ N/2.
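As a quick sanity check (my own addition, not part of the original snippet), the empirical frequencies should approach the given probabilities:

from collections import Counter

# Sample 10,000 times; expect counts near 2000/3000/5000 for indices 0/1/2
counts = Counter(choose_index([0.2, 0.3, 0.5]) for _ in range(10000))
print(counts)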
If you have weights instead, simply normalize them:

import random

def choose_index(weights):
    total = float(sum(weights))
    choice = random.random()
    cmf = 0.0
    for k, weight in enumerate(weights):
        cmf += weight / total
        if choice <= cmf:
            return k
    return len(weights) - 1  # guard against floating-point rounding
It depends on how many times you want to sample the distribution.

Suppose you want to sample the distribution K times. Then the time complexity of calling np.random.choice() each time is O(K(n + log(n))), where n is the number of items in the distribution.

In my case, I needed to sample the same distribution multiple times, with K of the order of 10^3 and n of the order of 10^6. I used the code below, which precomputes the cumulative distribution and samples it in O(log(n)); the overall time complexity is O(n + K*log(n)).
import numpy as np

n, k = 10**6, 10**3

# Create dummy distribution
a = np.array([i + 1 for i in range(n)])
p = np.array([1.0 / n] * n)

# Precompute the cumulative distribution, then sample by binary search
cfd = p.cumsum()
for _ in range(k):
    x = np.random.uniform()
    idx = cfd.searchsorted(x, side='right')
    sampled_element = a[idx]
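Note that np.random.choice can amortize its setup cost too if you ask for all K samples in a single call via the size argument; a minimal equivalent sketch:

import numpy as np

n, k = 10**6, 10**3
a = np.arange(1, n + 1)
p = np.full(n, 1.0 / n)

# One call pays the O(n) setup once and draws all k samples
samples = np.random.choice(a, size=k, p=p)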
A general solution:
import random
def weighted_choice(choices, weights):
    total = sum(weights)
    threshold = random.uniform(0, total)
    for k, weight in enumerate(weights):
        total -= weight
        if total < threshold:
            return choices[k]
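For example (a hypothetical call, not part of the original answer), 'b' should come back roughly twice as often as 'a' here:

print(weighted_choice(['a', 'b', 'c'], [1, 2, 3]))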
Another way of doing this, assuming the weights sit at the same indexes as the corresponding elements in the element array:
import numpy as np
weights = [0.1, 0.3, 0.5]  # weights for the items at index 0, 1, 2
# The weights should sum to <= 1; you can also divide each weight by the sum of
# all weights to normalize them to that constraint.
trials = 1    # number of trials
num_item = 1  # number of items that can be picked in each trial
selected_item_arr = np.random.multinomial(num_item, weights, trials)
# gives number of times an item was selected at a particular index
# this assumes selection with replacement
# one possible output
# selected_item_arr
# array([[0, 0, 1]])
# say if trials = 5, then the possible output could be
# selected_item_arr
# array([[1, 0, 0],
# [0, 0, 1],
# [0, 0, 1],
# [0, 1, 0],
# [0, 0, 1]])
Now let's assume we have to sample 3 items in 1 trial. You can imagine three balls R, G, and B present in large quantity, in the ratio of the weights given by the weight array; the following could be a possible outcome:
num_item = 3
trials = 1
selected_item_arr = np.random.multinomial(num_item, weights, trials)
# selected_item_arr can give output like :
# array([[1, 0, 2]])
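To turn such a count vector back into the drawn items themselves, one option (my own sketch, with hypothetical labels R, G, B and weights normalized to sum to 1) is np.repeat:

import numpy as np

items = np.array(['R', 'G', 'B'])
weights = [0.1, 0.3, 0.6]
counts = np.random.multinomial(3, weights)  # e.g. array([1, 0, 2])
drawn = np.repeat(items, counts)            # e.g. array(['R', 'B', 'B'])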
You can also think of the number of items to be selected as the number of binomial/multinomial trials within a set. So the above example can still work as:
num_binomial_trial = 5
weights = [0.1,0.9] #say an unfair coin weights for H/T
num_experiment_set = 1
selected_item_arr = np.random.multinomial(num_binomial_trial, weights, num_experiment_set)
# possible output
# selected_item_arr
# array([[1, 4]])
# i.e. H came 1 time and T came 4 times in 5 binomial trials, and one set contains those 5 binomial trials.
There is a lecture on this by Sebastian Thrun in the free Udacity course AI for Robotics. Basically, he makes a circular array of the indexed weights using the mod operator %, sets a variable beta to 0, and randomly chooses a starting index. He then loops N times, where N is the number of indices; in each iteration he first increments beta by:

beta = beta + uniform sample from {0...2 * Weight_max}

and then runs a nested while loop:

while w[index] < beta:
    beta = beta - w[index]
    index = (index + 1) % N
select p[index]

Then on to the next iteration, resampling based on the probabilities (or normalized probabilities in the case presented in the course).
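Here is a minimal runnable sketch of that resampling wheel; the function and variable names are mine, not the lecture's, and w is assumed to hold the (possibly unnormalized) weights:

import random

def resample(particles, w):
    # Resampling wheel: draw len(particles) items with probability
    # proportional to their weights.
    N = len(particles)
    index = random.randrange(N)  # random starting index
    beta = 0.0
    w_max = max(w)
    out = []
    for _ in range(N):
        beta += random.uniform(0, 2.0 * w_max)
        while w[index] < beta:
            beta -= w[index]
            index = (index + 1) % N  # circular array via the mod operator
        out.append(particles[index])
    return out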
The lecture link: https://classroom.udacity.com/courses/cs373/lessons/48704330/concepts/487480820923
I am logged into Udacity with my school account, so if the link does not work, it is Lesson 8, video number 21 of Artificial Intelligence for Robotics, where he is lecturing on particle filters.