Question
Simple test code:
import numpy
pop = numpy.arange(20)
rng = numpy.random.default_rng(1)
rng.choice(pop,p=numpy.repeat(1/len(pop),len(pop))) # yields 10
rng = numpy.random.default_rng(1)
rng.choice(pop) # yields 9
The numpy documentation says:
The probabilities associated with each entry in a. If not given the sample assumes a uniform distribution over all entries in a.
I don't know of any way to create a uniform distribution other than numpy.repeat(1/len(pop),len(pop)).
Is numpy using something else? Why?
If not, how does setting the distribution affect the seed?
Shouldn't the distribution and the seed be independent?
What am I missing here?
Answer 1:
A more idiomatic way of creating a uniform distribution with numpy would be:
numpy.random.uniform(low=0.0, high=1.0, size=None)
or, in your case, numpy.random.uniform(low=0.0, high=20.0, size=1). Note that this returns a one-element array holding a float in [0, 20), not an element of pop.
Alternatively, you could simply do
rng = numpy.random.default_rng(1)
rng.uniform()*20
As for why the two ways of calling rng.choice produce different outputs, my guess is that they are executed slightly differently internally. So although both start from the same seed, by the time the random value is actually drawn, the two calls are at different points in the random stream and return different results.
Answer 2:
The distribution doesn't affect the seed. Details below:
I checked out the source code: numpy/random/_generator.pyx#L669
If p is given, it will use rng.random to get a random value:
import numpy
pop = numpy.arange(20)
seed = 1
rng = numpy.random.default_rng(seed)
# rng.choice works like below
rand = rng.random()
p = numpy.repeat(1/len(pop),len(pop))
cdf = p.cumsum()
cdf /= cdf[-1]
uniform_samples = rand
idx = cdf.searchsorted(uniform_samples, side='right')
idx = numpy.asarray(idx, dtype=numpy.int64) # yields 10; asarray avoids the copy=False error raised by NumPy 2.x
print(idx)
# -----------------------
rng = numpy.random.default_rng(seed)
idx = rng.choice(pop,p=numpy.repeat(1/len(pop),len(pop))) # same as above
print(idx)
If p is not given, it will use rng.integers to get a random value:
rng = numpy.random.default_rng(seed)
idx = rng.integers(0, pop.shape[0]) # yields 9
print(idx)
# -----------------------
rng = numpy.random.default_rng(seed)
idx = rng.choice(pop) # same as above
print(idx)
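The two code paths above can be wrapped in small helpers and compared against rng.choice for several seeds. This is only a sketch of the behaviour described in this answer, not NumPy's actual implementation: the helper names choice_with_p and choice_without_p are made up for illustration, and the match with rng.choice relies on the internals staying as described here.

```python
import numpy as np

def choice_with_p(rng, pop, p):
    """Hypothetical helper: mimic rng.choice(pop, p=p) via an inverse-CDF lookup."""
    cdf = np.cumsum(p)
    cdf /= cdf[-1]                      # normalise so the last entry is exactly 1
    u = rng.random()                    # one uniform float in [0, 1)
    return pop[int(np.searchsorted(cdf, u, side="right"))]

def choice_without_p(rng, pop):
    """Hypothetical helper: mimic rng.choice(pop) by drawing an integer index."""
    return pop[int(rng.integers(0, len(pop)))]

pop = np.arange(20)
p = np.repeat(1 / len(pop), len(pop))

for seed in range(5):
    # Fresh generators with the same seed for the helper and for rng.choice.
    mine_p = choice_with_p(np.random.default_rng(seed), pop, p)
    numpy_p = np.random.default_rng(seed).choice(pop, p=p)
    mine_default = choice_without_p(np.random.default_rng(seed), pop)
    numpy_default = np.random.default_rng(seed).choice(pop)
    print(seed, (mine_p, numpy_p), (mine_default, numpy_default))
```

Each printed pair should contain two equal values, while the pair for the p path and the pair for the default path are generally different from each other.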
You can play around using different seed values. I don't know what happens in rng.random and rng.integers, but you can see that they behave differently. That's why you got different results.
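To check that both calls still sample from the same uniform distribution, even though individual draws differ, one can compare empirical frequencies over many draws. This is a rough sketch: the sample size n and the variable names are arbitrary choices made for illustration.

```python
import numpy as np

pop = np.arange(20)
p = np.repeat(1 / len(pop), len(pop))
n = 200_000  # arbitrary sample size, large enough for stable frequencies

# Same seed, two code paths: explicit uniform p versus no p at all.
draws_p = np.random.default_rng(1).choice(pop, size=n, p=p)
draws_default = np.random.default_rng(1).choice(pop, size=n)

# Relative frequency of each value; both should be close to 1/20 = 0.05.
freq_p = np.bincount(draws_p, minlength=len(pop)) / n
freq_default = np.bincount(draws_default, minlength=len(pop)) / n

print(freq_p.round(3))
print(freq_default.round(3))
print("sequences identical:", np.array_equal(draws_p, draws_default))
```

Both frequency vectors should hover around 0.05 per entry. The seed fixes the underlying bit stream, but the two paths turn those bits into samples in different ways, so the sequences themselves do not match.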
Source: https://stackoverflow.com/questions/62536092/why-does-numpy-random-generator-choice-provides-different-results-seeded-with