问题
I want to run a relatively simple random draw in numpy, but I can't find a good way to express it. I think the best way is to describe it as drawing from an urn without replacement. I have an urn with k colors, and n_k balls of every color. I want to draw m balls, and know how many balls of every color I have.
My current attempt it
np.bincount(np.random.permutation(np.repeat(np.arange(k), n_k))[:m], minlength=k)
here, n_k
is an array of length k with the counts of the balls.
It seems that's equivalent to
np.bincount(np.random.choice(k, m, n_k / n_k.sum(), minlength=k)
which is a bit better, but still not great.
回答1:
What you want is an implementation of the multivariate hypergeometric distribution. I don't know of one in numpy or scipy, but it might already exist out there somewhere.
You can implement it using repeated calls to numpy.random.hypergeometric. Whether that will be more efficient than your implementation depends on how many colors there are and how many balls of each color.
For example, here's a script that prints the result of drawing from an urn containing three colors (red, green and blue):
from __future__ import print_function
import numpy as np
nred = 12
ngreen = 4
nblue = 18
m = 15
red = np.random.hypergeometric(nred, ngreen + nblue, m)
green = np.random.hypergeometric(ngreen, nblue, m - red)
blue = m - (red + green)
print("red: %2i" % red)
print("green: %2i" % green)
print("blue: %2i" % blue)
Sample output:
red: 6
green: 1
blue: 8
The following function generalizes that to choosing m
balls given an array colors
holding the number of each color:
def sample(m, colors):
"""
Parameters
----------
m : number balls to draw from the urn
colors : one-dimensional array of number balls of each color in the urn
Returns
-------
One-dimensional array with the same length as `colors` containing the
number of balls of each color in a random sample.
"""
remaining = np.cumsum(colors[::-1])[::-1]
result = np.zeros(len(colors), dtype=np.int)
for i in range(len(colors)-1):
if m < 1:
break
result[i] = np.random.hypergeometric(colors[i], remaining[i+1], m)
m -= result[i]
result[-1] = m
return result
For example,
>>> sample(10, [2, 4, 8, 16])
array([2, 3, 1, 4])
回答2:
The following should work:
def make_sampling_arr(n_k):
out = [ x for s in [ [i] * n_k[i] for i in range(len(n_k)) ] for x in s ]
return out
np.random.choice(make_sampling_arr(n_k), m)
来源:https://stackoverflow.com/questions/35734026/numpy-drawing-from-urn