I needed to write a weighted version of random.choice (each element in the list has a different probability for being selected). This is what I came up with:
It depends on how many times you want to sample the distribution.
Suppose you want to sample the distribution K times. Then the time complexity of calling np.random.choice() each time is O(K(n + log(n))), where n is the number of items in the distribution.
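For comparison, the per-call approach looks roughly like the sketch below. The sizes are shrunk for illustration and the name `baseline` is just a placeholder. The point is that every call is handed the full probability vector again.

```python
import numpy as np

n, k = 1000, 5  # small illustrative sizes, not the ones from my use case

a = np.arange(1, n + 1)      # items 1..n
p = np.full(n, 1.0 / n)      # uniform probabilities that sum to 1

# Each call receives the whole probability vector again
baseline = [np.random.choice(a, p=p) for _ in range(k)]
```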
In my case, I needed to sample the same distribution on the order of 10^3 times, where n is on the order of 10^6. I used the code below, which precomputes the cumulative distribution once and then draws each sample in O(log(n)). The overall time complexity is O(n + K*log(n)).
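import numpy as np

n, k = 10**6, 10**3

# Create a dummy distribution: n items with uniform probabilities
a = np.arange(1, n + 1)
p = np.full(n, 1.0 / n)

# Precompute the cumulative distribution once: O(n)
cdf = p.cumsum()
cdf /= cdf[-1]  # guard against rounding so the last entry is exactly 1.0

for _ in range(k):
    x = np.random.uniform()  # uniform in [0, 1)
    idx = cdf.searchsorted(x, side='right')  # binary search: O(log(n))
    sampled_element = a[idx]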
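If all K samples are needed at once, the loop can probably be replaced by a single vectorized lookup. This is a minimal sketch under that assumption. The helper names `build_cdf` and `sample_many` are made up for illustration and are not part of NumPy, and the three-item distribution is only a toy example.

```python
import numpy as np

def build_cdf(p):
    """Return the normalized cumulative distribution for weights p (hypothetical helper)."""
    cdf = np.cumsum(np.asarray(p, dtype=float))
    cdf /= cdf[-1]  # force the last entry to exactly 1.0
    return cdf

def sample_many(a, cdf, k, rng):
    """Draw k elements of a with one binary search per draw (hypothetical helper)."""
    x = rng.random(k)  # k uniform values in [0, 1)
    idx = np.searchsorted(cdf, x, side='right')
    return a[idx]

rng = np.random.default_rng(0)
a = np.array([10, 20, 30])
cdf = build_cdf([0.2, 0.3, 0.5])
samples = sample_many(a, cdf, 1000, rng)
```

The cost is still O(n) to build the cumulative distribution plus O(K*log(n)) for the lookups, but the per-sample Python overhead of the loop goes away.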