What is the efficient(probably vectorized with Matlab terminology) way to generate random number of zeros and ones with a specific proportion? Specially with Numpy?
Another way of getting the exact number of ones and zeroes is to sample indices without replacement using np.random.choice
:
arr_len = 30
num_ones = 8
arr = np.zeros(arr_len, dtype=int)
idx = np.random.choice(range(arr_len), num_ones, replace=False)
arr[idx] = 1
Out:
arr
array([0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1,
0, 0, 0, 0, 0, 1, 0, 0])
If I understand your problem correctly, you might get some help with numpy.random.shuffle
>>> def rand_bin_array(K, N):
arr = np.zeros(N)
arr[:K] = 1
np.random.shuffle(arr)
return arr
>>> rand_bin_array(5,15)
array([ 0., 1., 0., 1., 1., 1., 0., 0., 0., 1., 0., 0., 0.,
0., 0.])
Yet another approach, using np.random.choice:
>>> np.random.choice([0, 1], size=(10,), p=[1./3, 2./3])
array([0, 1, 1, 1, 1, 0, 0, 0, 0, 0])
You can use numpy.random.binomial
. E.g. suppose frac
is the proportion of ones:
In [50]: frac = 0.15
In [51]: sample = np.random.binomial(1, frac, size=10000)
In [52]: sample.sum()
Out[52]: 1567
A simple way to do this would be to first generate an ndarray
with the proportion of zeros and ones you want:
>>> import numpy as np
>>> N = 100
>>> K = 30 # K zeros, N-K ones
>>> arr = np.array([0] * K + [1] * (N-K))
>>> arr
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1])
Then you can just shuffle
the array, making the distribution random:
>>> np.random.shuffle(arr)
>>> arr
array([1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0,
1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1,
1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1,
0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1,
1, 1, 1, 0, 1, 1, 1, 1])
Note that this approach will give you the exact proportion of zeros/ones you request, unlike say the binomial approach. If you don't need the exact proportion, then the binomial approach will work just fine.
Simple one-liner: you can avoid using lists of integers and probability distributions, which are unintuitive and overkill for this problem in my opinion, by simply working with bool
s first and then casting to int
if necessary (though leaving it as a bool
array should work in most cases).
>>> import numpy as np
>>> np.random.random(9) < 1/3.
array([False, True, True, True, True, False, False, False, False])
>>> (np.random.random(9) < 1/3.).astype(int)
array([0, 0, 0, 0, 0, 1, 0, 0, 1])