Why does numpy.random.Generator.choice provide different results (seeded) with a given uniform distribution compared to the default uniform distribution?

Question

Simple test code:

import numpy

pop = numpy.arange(20)
rng = numpy.random.default_rng(1)
rng.choice(pop,p=numpy.repeat(1/len(pop),len(pop))) # yields 10
rng = numpy.random.default_rng(1)
rng.choice(pop) # yields 9

The numpy documentation for the p parameter says:

The probabilities associated with each entry in a. If not given, the sample assumes a uniform distribution over all entries in a.
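The documented guarantee concerns the distribution, not which element a particular seed produces. A rough empirical check (the sample size and tolerance are arbitrary choices for illustration): with and without an explicit uniform p, every element should turn up close to 1/20 of the time.

```python
import numpy as np

pop = np.arange(20)
n_draws = 200_000

rng = np.random.default_rng(1)
# Explicit uniform probabilities vs. the default (p omitted).
with_p = rng.choice(pop, size=n_draws, p=np.full(len(pop), 1 / len(pop)))
without_p = rng.choice(pop, size=n_draws)

# Relative frequency of each element in the two samples.
freq_with_p = np.bincount(with_p, minlength=len(pop)) / n_draws
freq_without_p = np.bincount(without_p, minlength=len(pop)) / n_draws
print(freq_with_p.round(3))
print(freq_without_p.round(3))
```

Both frequency vectors hover around 0.05, so both calls sample the same distribution even though individual seeded draws differ.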

I don't know of any way to create a uniform distribution other than numpy.repeat(1/len(pop),len(pop)).

Is numpy using something else? Why?

If not, how does setting the distribution affect the seed?

Shouldn't the distribution and the seed be independent?

What am I missing here?

Answer 1:

A more idiomatic way of creating a uniform distribution with numpy would be:

numpy.random.uniform(low=0.0, high=1.0, size=None)

or, in your case, numpy.random.uniform(low=0.0, high=20.0, size=1) (note that this returns a float in [0, 20), not an element of pop)

Alternatively, you could simply do

rng = numpy.random.default_rng(1)
rng.uniform()*20
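To actually pick an element, the scaled float still has to be truncated to an integer index. A minimal sketch (the helper name index_from_uniform is hypothetical) comparing that index with choice called with an explicit uniform p for the same seeds. Assuming choice's p path draws a single float in [0, 1) first, the two should agree, up to floating-point ties exactly at bin edges. This does not in general reproduce the p-less choice.

```python
import numpy as np

pop = np.arange(20)
p = np.full(len(pop), 1 / len(pop))  # explicit uniform probabilities


def index_from_uniform(rng, n):
    # Hypothetical helper: scale one float in [0, 1) to [0, n) and
    # truncate it to an integer index.
    return int(rng.uniform() * n)


matches = 0
for seed in range(200):
    a = index_from_uniform(np.random.default_rng(seed), len(pop))
    b = int(np.random.default_rng(seed).choice(pop, p=p))
    matches += a == b
print(matches, "of 200 seeds agree")
```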

As for why the two ways of calling rng.choice give different outputs, my guess is that they are executed slightly differently by the interpreter. So although you start from the same random initialization, by the time the random draw happens you are at a different point in the random stream and get a different result.

Answer 2:

The distribution doesn't affect the seed. Details below:

I checked the source code: numpy/random/_generator.pyx#L669

If p is given, it uses rng.random to draw a float in [0, 1) and maps it to an index through the cumulative distribution:

import numpy

pop = numpy.arange(20)
seed = 1
rng = numpy.random.default_rng(seed)

# rng.choice works like below
rand = rng.random()
p = numpy.repeat(1/len(pop),len(pop))
cdf = p.cumsum()
cdf /= cdf[-1]
uniform_samples = rand
idx = cdf.searchsorted(uniform_samples, side='right')
# asarray instead of array(..., copy=False): NumPy 2.0 makes copy=False raise when a copy is needed
idx = numpy.asarray(idx, dtype=numpy.int64)
print(pop[idx])  # yields 10

# -----------------------
rng = numpy.random.default_rng(seed)
idx = rng.choice(pop,p=numpy.repeat(1/len(pop),len(pop))) # same as above
print(idx)
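Assuming the code path described above, the same emulation should carry over to size > 1: choice then draws one float per sample with rng.random(size). A quick sketch to check that (size=5 is an arbitrary choice):

```python
import numpy as np

pop = np.arange(20)
p = np.repeat(1 / len(pop), len(pop))
seed = 1

# Emulate the p-given path for several draws at once: one float per draw,
# mapped through the normalised cumulative distribution.
rng = np.random.default_rng(seed)
cdf = p.cumsum()
cdf /= cdf[-1]
emulated = pop[cdf.searchsorted(rng.random(5), side="right")]

# The real call, starting from the same seed.
rng = np.random.default_rng(seed)
actual = rng.choice(pop, size=5, p=p)
print(emulated, actual)
```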

If p is not given, it uses rng.integers to draw the index directly:

rng = numpy.random.default_rng(seed)
idx = rng.integers(0, pop.shape[0]) # yields 9
print(idx)
# -----------------------
rng = numpy.random.default_rng(seed)
idx = rng.choice(pop) # same as above
print(idx)
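Likewise for the path without p, a sketch assuming choice draws all indices with rng.integers(0, len(pop), size) when size is given (again size=5 is arbitrary):

```python
import numpy as np

pop = np.arange(20)
seed = 1

# Emulate the p-less path: integer indices drawn directly.
rng = np.random.default_rng(seed)
emulated = pop[rng.integers(0, len(pop), size=5)]

# The real call, starting from the same seed.
rng = np.random.default_rng(seed)
actual = rng.choice(pop, size=5)
print(emulated, actual)
```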

You can experiment with different seed values. I didn't dig into what happens inside rng.random and rng.integers, but you can see that they turn the generator's random bits into values differently. That's why you got different results.
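To make "same starting point, different mapping" concrete, here is a sketch that rewinds the generator through bit_generator.state, so both calls begin from an identical internal state, and then reads the first raw 64-bit word with bit_generator.random_raw(). The bit-level formula in the comments is an assumption about the PCG64 bit generator that default_rng uses: random() keeps the top 53 bits of that word and scales them into [0, 1).

```python
import numpy as np

pop = np.arange(20)
p = np.repeat(1 / len(pop), len(pop))
rng = np.random.default_rng(1)

# Snapshot the internal state before any draw.
start = rng.bit_generator.state

with_p = rng.choice(pop, p=p)

# Rewind to exactly the same state, then draw without p.
rng.bit_generator.state = start
without_p = rng.choice(pop)

# Rewind again and read the raw 64-bit word both calls start from.
rng.bit_generator.state = start
raw = int(rng.bit_generator.random_raw())

# Assumption (PCG64 internals): random() uses the top 53 bits of the word;
# the p path then maps that float through the CDF.
u = (raw >> 11) / 2**53

rng.bit_generator.state = start
print(with_p, without_p, u, rng.random())
```

Both calls start from the same state. They differ only in how that first word is turned into an index.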

Source: https://stackoverflow.com/questions/62536092/why-does-numpy-random-generator-choice-provides-different-results-seeded-with

Tags

python

numpy

random

uniform-distribution

numpy-random