I am trying to generate samples with scipy.stats.pareto.rvs(b, loc=0, scale=1, size=1) using different seeds.
In numpy we can seed using numpy.random.seed(seed=233423).
Is ther
For those who happen upon this post four years later: SciPy DOES provide a way to pass a np.random.RandomState object to its random variable classes; see rv_continuous and rv_discrete for more details. The SciPy documentation says this:
seed : None or int or numpy.random.RandomState instance, optional
This parameter defines the RandomState object to use for drawing random variates. If None (or np.random), the global np.random state is used. If integer, it is used to seed the local RandomState instance. Default is None.
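To make the quoted parameter concrete, here is a minimal sketch, assuming you define your own distribution by subclassing rv_continuous. The class LinearDensity is a hypothetical toy example (not part of SciPy); the seed keyword is the one described in the documentation excerpt above.

```python
import numpy as np
from scipy.stats import rv_continuous


class LinearDensity(rv_continuous):
    """Hypothetical toy distribution with pdf f(x) = 2x on [0, 1]."""

    def _pdf(self, x):
        return 2.0 * x

    def _cdf(self, x):
        return x ** 2

    def _ppf(self, q):
        # Inverse of the cdf, so sampling can use inverse-transform draws.
        return np.sqrt(q)


# Two instances seeded with the same integer should draw the same variates.
first = LinearDensity(a=0.0, b=1.0, name="linear_density", seed=123)
second = LinearDensity(a=0.0, b=1.0, name="linear_density", seed=123)

draws_first = first.rvs(size=5)
draws_second = second.rvs(size=5)
print(np.array_equal(draws_first, draws_second))
```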
Unfortunately, it appears this argument is not available once a continuous/discrete distribution has subclassed rv_continuous or rv_discrete. However, the random_state property does belong to the subclass, meaning we can set the seed using an instance of np.random.RandomState after instantiation like so:
import numpy as np
import scipy.stats as stats
alpha_rv = stats.alpha(3.57)
alpha_rv.random_state = np.random.RandomState(seed=342423)
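As a quick check of the snippet above (a sketch, not the only way to do it), resetting random_state to a freshly seeded RandomState before each draw should reproduce the same variates:

```python
import numpy as np
import scipy.stats as stats

alpha_rv = stats.alpha(3.57)

# Seed, draw, then re-seed with the same value and draw again.
alpha_rv.random_state = np.random.RandomState(seed=342423)
first_draw = alpha_rv.rvs(size=5)

alpha_rv.random_state = np.random.RandomState(seed=342423)
second_draw = alpha_rv.rvs(size=5)

print(np.array_equal(first_draw, second_draw))
```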
For those who are stumbling upon this question 7 years later, there has been a major change in NumPy's random state generator. As per the documentation here and here, the RandomState class is replaced with the Generator class. RandomState is guaranteed to be compatible with older versions and code; however, it will not receive any substantial changes, including algorithmic improvements, which are reserved for Generator.
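Applied to the pareto call from the question, a Generator can be handed to rvs through the random_state keyword. This is a minimal sketch, assuming a SciPy version recent enough to accept a Generator there; the variable names are only illustrative.

```python
import numpy as np
from scipy.stats import pareto

b = 0.9

# default_rng returns a Generator, the newer replacement for RandomState.
rng = np.random.default_rng(233423)
samples = pareto.rvs(b, loc=0, scale=1, size=5, random_state=rng)

# A second Generator built from the same seed replays the same stream.
rng_again = np.random.default_rng(233423)
samples_again = pareto.rvs(b, loc=0, scale=1, size=5, random_state=rng_again)

print(samples)
print(np.array_equal(samples, samples_again))
```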
For clarification on how to pass an existing NumPy-based random stream to SciPy functions in the same experiment, given below are some examples, with reasoning about which cases are desirable and why.
from numpy.random import Generator, PCG64
from scipy.stats import binom
n, p, size, seed = 10, 0.5, 10, 12345
# Case 1 : Scipy uses some default Random Generator
numpy_randomGen = Generator(PCG64(seed))
scipy_randomGen = binom
print(scipy_randomGen.rvs(n, p, size=size))  # size must be a keyword: the third positional argument of rvs is loc
print(numpy_randomGen.binomial(n, p, size))
# prints
# [6 6 5 4 6 6 8 6 6 4]
# [4 4 6 6 5 4 5 4 6 7]
# NOT DESIRABLE as we don't have control over the seed of Scipy random number generation
# Case 2 : Scipy uses same seed and Random generator (new object though)
scipy_randomGen.random_state=Generator(PCG64(seed))
numpy_randomGen = Generator(PCG64(seed))
print(scipy_randomGen.rvs(n, p, size=size))
print(numpy_randomGen.binomial(n, p, size))
# prints
# [4 4 6 6 5 4 5 4 6 7]
# [4 4 6 6 5 4 5 4 6 7]
# This experiment is using same sequence of random numbers, one is being used by Scipy
# and other by Numpy. NOT DESIRABLE as we don't want repetition of some random
# stream in same experiment.
# Case 3 (IMP) : Scipy uses an existing Random Generator which can be passed to the Scipy based
# random generator object
numpy_randomGen = Generator(PCG64(seed))
scipy_randomGen.random_state=numpy_randomGen
print(scipy_randomGen.rvs(n, p, size=size))
print(numpy_randomGen.binomial(n, p, size))
# prints
# [4 4 6 6 5 4 5 4 6 7]
# [4 8 6 3 5 7 6 4 6 4]
# This should be the case which we mostly want (DESIRABLE). If we are using both Numpy based and
# Scipy based random number generators/functions, then not only do we have no repetition of
# random number sequences but also have reproducibility of results in this case.
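Case 3 can also be written without assigning to the shared binom object, by passing the generator to rvs directly. The following is a rough sketch under that assumption; run_experiment is a hypothetical helper name, not something from NumPy or SciPy.

```python
import numpy as np
from numpy.random import Generator, PCG64
from scipy.stats import binom


def run_experiment(seed, n=10, p=0.5, size=10):
    """Draw from SciPy and NumPy using one shared Generator (hypothetical helper)."""
    rng = Generator(PCG64(seed))
    # SciPy consumes the stream first, NumPy continues from where it stopped.
    scipy_draws = binom.rvs(n, p, size=size, random_state=rng)
    numpy_draws = rng.binomial(n, p, size)
    return scipy_draws, numpy_draws


first_run = run_experiment(12345)
second_run = run_experiment(12345)

# Re-running with the same seed reproduces both sets of draws.
print(np.array_equal(first_run[0], second_run[0]))
print(np.array_equal(first_run[1], second_run[1]))
```

Because only one generator exists per run, the two libraries never replay the same part of the stream, and the whole experiment is controlled by a single seed.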
Adding to the answer of user5915738, which I think is the best answer in general, I'd like to point out the (imho) most convenient way to seed the random generator of a scipy.stats distribution.
You can set the seed while drawing from the distribution with the rvs method, either by defining the seed as an integer, which is used to seed np.random.RandomState internally:
import numpy as np
import scipy.stats
uni_int_seed = scipy.stats.uniform(-.1, 1.).rvs(10, random_state=12)
or by directly defining the np.random.RandomState:
uni_state_seed = scipy.stats.uniform(-.1, 1.).rvs(
    10, random_state=np.random.RandomState(seed=12))
Both methods are equivalent:
np.all(uni_int_seed == uni_state_seed)
# Out: True
The advantage of this method over assigning it to the random_state of rv_continuous or rv_discrete is that you always have explicit control over the random state of your rvs, whereas with my_dist.random_state = np.random.RandomState(seed=342423) the generator's state advances with each call to rvs, so later draws depend on how many calls came before, possibly resulting in non-reproducible results when losing track of distributions.
Also, according to The Zen of Python:
- Explicit is better than implicit.
:)
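The difference between the two approaches can be seen in a small sketch (variable names are only illustrative): with an assigned random_state, consecutive calls continue the same stream, while an explicit random_state per call restarts it.

```python
import numpy as np
import scipy.stats

dist = scipy.stats.uniform(-.1, 1.)

# Assigned state: the second call continues the stream, so the draws differ.
dist.random_state = np.random.RandomState(seed=12)
first_call = dist.rvs(3)
second_call = dist.rvs(3)

# Explicit state per call: each call starts from the same seed.
explicit_first = dist.rvs(3, random_state=12)
explicit_second = dist.rvs(3, random_state=12)

print(np.array_equal(first_call, second_call))
print(np.array_equal(explicit_first, explicit_second))
```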
scipy.stats just uses numpy.random to generate its random numbers, so numpy.random.seed() will work here as well. E.g.,
import numpy as np
from scipy.stats import pareto
b = 0.9
np.random.seed(seed=233423)
print(pareto.rvs(b, loc=0, scale=1, size=5))
np.random.seed(seed=233423)
print(pareto.rvs(b, loc=0, scale=1, size=5))
will print [ 9.7758784 10.78405752 4.19704602 1.19256849 1.02750628] twice.
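If the goal is one draw per seed, as in the question, a sketch that avoids touching the global numpy.random state is to pass each seed through the random_state keyword of rvs. The seed values and the names seeds and samples_by_seed below are arbitrary choices for illustration.

```python
import numpy as np
from scipy.stats import pareto

b = 0.9
seeds = [233423, 233424, 233425]  # arbitrary example seeds

# One draw per seed; the global numpy.random state is left untouched.
samples_by_seed = {
    seed: pareto.rvs(b, loc=0, scale=1, size=1, random_state=seed)
    for seed in seeds
}

for seed, sample in samples_by_seed.items():
    print(seed, sample)
```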