scipy.stats seed?

后端 未结 4 963
滥情空心
滥情空心 2021-02-06 22:51

I am trying to generate scipy.stats.pareto.rvs(b, loc=0, scale=1, size=1) with different seed.

In numpy we can seed using numpy.random.seed(seed=233423).

Is ther

相关标签:
4条回答
  • 2021-02-06 23:31

    For those who happen upon this post four years later, Scipy DOES provide a way to pass a np.random.RandomState object to its random variable classes, see rv_continuous and rv_discrete for more details. The scipy documentation says this:

    seed : None or int or numpy.random.RandomState instance, optional

    This parameter defines the RandomState object to use for drawing random variates. If None (or np.random), the global np.random state is used. If integer, it is used to seed the local RandomState instance. Default is None.

    Unfortunately, it appears this argument is not available after the continuous/discrete rvs subclass rv_continuous or rv_discrete. However, the random_state property does belong to the sublass, meaning we can set the seed using an instance of np.random.RandomState after instantiation like so:

    import numpy as np
    import scipy.stats as stats
    
    alpha_rv = stats.alpha(3.57)
    alpha_rv.random_state = np.random.RandomState(seed=342423)
    
    0 讨论(0)
  • 2021-02-06 23:37

    For those who are stumbling upon this question 7 years later, there has been a major change in numpy Random State generator function. As per the documentation here and here, the RandomState class is replaced with the Generator class. RandomState is guaranteed to be compatible with older versions/codes however it will not receive any substantial changes, including algorithmic improvements, which are reserved for Generator.

    For clarifications on how to pass an existing Numpy based random stream to Scipy functions in the same experiment, given below are some examples and reasonings for which cases are desirable and why.

    from numpy.random import Generator, PCG64
    from scipy.stats import binom
    
    n, p, size, seed = 10, 0.5, 10, 12345
    
    # Case 1 : Scipy uses some default Random Generator
    numpy_randomGen = Generator(PCG64(seed))
    scipy_randomGen = binom
    print(scipy_randomGen.rvs(n, p, size))
    print(numpy_randomGen.binomial(n, p, size))
    # prints
    # [6 6 5 4 6 6 8 6 6 4]
    # [4 4 6 6 5 4 5 4 6 7]
    # NOT DESIRABLE as we don't have control over the seed of Scipy random number generation
    
    
    # Case 2 : Scipy uses same seed and Random generator (new object though)
    scipy_randomGen.random_state=Generator(PCG64(seed))
    numpy_randomGen = Generator(PCG64(seed))
    print(scipy_randomGen.rvs(n, p, size))
    print(numpy_randomGen.binomial(n, p, size))
    # prints
    # [4 4 6 6 5 4 5 4 6 7]
    # [4 4 6 6 5 4 5 4 6 7]
        # This experiment is using same sequence of random numbers, one is being used by Scipy
    # and other by Numpy. NOT DESIRABLE as we don't want repetition of some random 
    # stream in same experiment.
    
    
    # Case 3 (IMP) : Scipy uses an existing Random Generator which can being passed to Scipy based 
    # random generator object
    numpy_randomGen = Generator(PCG64(seed))
    scipy_randomGen.random_state=numpy_randomGen
    print(scipy_randomGen.rvs(n, p, size))
    print(numpy_randomGen.binomial(n, p, size))
    # prints
    # [4 4 6 6 5 4 5 4 6 7]
    # [4 8 6 3 5 7 6 4 6 4]
    # This should be the case which we mostly want (DESIRABLE). If we are using both Numpy based and 
    #Scipy based random number generators/function, then not only do we have no repetition of 
    #random number sequences but also have reproducibility of results in this case.
    
    0 讨论(0)
  • 2021-02-06 23:41

    Adding to the answer of user5915738, which I think is the best answer in general, I'd like to point out the imho most convenient way to seed the random generator of a scipy.stats distribution.

    You can set the seed while generating the distribution with the rvs method, either by defining the seed as an integer, which is used to seed np.random.RandomState internally:

    uni_int_seed = scipy.stats.uniform(-.1, 1.).rvs(10, random_state=12)
    

    or by directly defining the np.random.RandomState:

    uni_state_seed = scipy.stats.uniform(-.1, 1.).rvs(
        10, random_state=np.random.RandomState(seed=12))
    

    Both methods are equivalent:

    np.all(uni_int_seed == uni_state_seed)
    # Out: True
    

    The advantage of this method over assigning it to the random_state of rv_continuous or rv_discrete is, that you always have explicit control over the random state of your rvs, whereas with my_dist.random_state = np.random.RandomState(seed=342423) the seed is lost after each call to rvs, possibly resulting in non-reproducible results when losing track of distributions.
    Also according to the The Zen of Python:

    1. Explicit is better than implicit.

    :)

    0 讨论(0)
  • 2021-02-06 23:56

    scipy.stats just uses numpy.random to generate its random numbers, so numpy.random.seed() will work here as well. E.g.,

    import numpy as np
    from scipy.stats import pareto
    b = 0.9
    np.random.seed(seed=233423)
    print pareto.rvs(b, loc=0, scale=1, size=5)
    np.random.seed(seed=233423)
    print pareto.rvs(b, loc=0, scale=1, size=5)
    

    will print [ 9.7758784 10.78405752 4.19704602 1.19256849 1.02750628] twice.

    0 讨论(0)
提交回复
热议问题