Erratic seed behavior with rbinom(prob=0.5) in R

野性不改 2021-01-07 16:12

I have found what I would consider erratic behavior (but for which I hope there is a simple explanation) in R's use of seeds in conjunction with rbinom().

2 Answers
  • 2021-01-07 16:47

    I'm going to take a contrarian position on this question and claim that the expectations are not appropriate and are not supported by the documentation. The documentation does not make any claim about what side effects (specifically on .Random.seed) can be expected by calling rbinom, or how those side effects may or may not be the same in various cases.

    rbinom has three parameters: n, size, and prob. Your expectation is that, for a seed set before calling rbinom, .Random.seed will be the same after the call for a given n and any values of size and prob (or perhaps any finite values of size and prob). You presumably accept that it would differ for different values of n. rbinom neither guarantees nor implies that.

    Without knowing the internals of the function, this can't be known: as the other answer shows, the algorithm differs depending on the product of size and prob. The internals may also change, so these specific details are not something to rely on.

    At least, in this case, the resulting .Random.seed will be the same after every call of rbinom which has the same n, size and prob. I can construct a pathological function for which this is not even true:

    seedtweak <- function() {
      if(floor(as.POSIXlt(Sys.time())$sec * 10) %% 2) {
        runif(1)
      }
      invisible(NULL)
    }
    

    Basically, this function checks whether the tenths digit of the current second is odd or even to decide whether or not to draw a random number. Run this function and .Random.seed may or may not change:

    rs <- replicate(10, {
      set.seed(123) 
      seedtweak()
      .Random.seed
    })
    # TRUE only if every replicate left .Random.seed in the same state
    all(apply(rs, 1, function(x) all(x == x[1])))
    

    The best you can (should?) hope for is that a given set of code with all the inputs/parameters the same (including the seed) will always give identical results. Expecting identical results when only most (or only some) of the parameters are the same is not realistic unless all the functions called make those guarantees.
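
    As a minimal sketch of that guarantee (draw_binom is a hypothetical helper, not part of R, and the seed and parameter values are arbitrary), repeating a call with the same seed and the same n, size and prob reproduces both the draws and the resulting .Random.seed:

    ```r
    # Hypothetical helper: seed the RNG, draw, and return the draws
    # together with the RNG state left behind.
    draw_binom <- function(seed, n, size, prob) {
      set.seed(seed)
      draws <- rbinom(n = n, size = size, prob = prob)
      list(draws = draws, state = .Random.seed)
    }

    a <- draw_binom(123, n = 5, size = 60, prob = 0.5)
    b <- draw_binom(123, n = 5, size = 60, prob = 0.5)
    same_draws <- identical(a$draws, b$draws)
    same_state <- identical(a$state, b$state)
    ```

    Both same_draws and same_state should be TRUE; nothing comparable is promised once any argument differs.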

  • 2021-01-07 16:54

    So let's turn our comments into an answer. Thanks to Ben Bolker for putting us on the right track with a link to the code: https://svn.r-project.org/R/trunk/src/nmath/rbinom.c and the suggestion to track down where unif_rand() is called.

    A quick scan suggests that the code is split into two sections, delimited by the comments:

    /*-------------------------- np = n*p >= 30 : ------------------- */
    

    and

    /*---------------------- np = n*p < 30 : ------------------------- */
    

    Inside each of these, the number of calls to unif_rand() is not the same: two per pass through the sampling loop in the first section versus one in the second.

    So for a given size (n), your random seed may end up in a different state depending on the value of prob (p): whether size * prob >= 30 or not.

    With that in mind, all the results you got with your examples should now make sense:

    # these end up in the same state
    rbinom(n=1,size=60,prob=0.4) # => np <  30
    rbinom(n=1,size=60,prob=0.3) # => np <  30
    
    # these don't
    rbinom(n=1,size=60,prob=0.5) # => np >= 30
    rbinom(n=1,size=60,prob=0.3) # => np <  30
    
    # these don't
    {rbinom(n=1,size=60,prob=0.5)  # np >= 30
     rbinom(n=1,size=50,prob=0.3)} # np <  30
    {rbinom(n=1,size=60,prob=0.1)  # np <  30
     rbinom(n=1,size=50,prob=0.3)} # np <  30
    
    # these do
    {rbinom(n=1,size=60,prob=0.3)  # np <  30
     rbinom(n=1,size=50,prob=0.5)} # np <  30
    {rbinom(n=1,size=60,prob=0.1)  # np <  30
     rbinom(n=1,size=50,prob=0.3)} # np <  30
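
    One way to check the first two pairs (a sketch only: state_after is a hypothetical helper, the seed 123 is arbitrary, and R's default RNG is assumed) is to reset the seed before each call and compare the .Random.seed left behind:

    ```r
    # Hypothetical helper: RNG state after a single rbinom() draw from a fixed seed.
    state_after <- function(size, prob, seed = 123) {
      set.seed(seed)
      rbinom(n = 1, size = size, prob = prob)
      .Random.seed
    }

    # both np < 30: expected to leave the same state
    same_branch <- identical(state_after(60, 0.4), state_after(60, 0.3))
    # np >= 30 versus np < 30: expected to leave different states
    cross_branch <- identical(state_after(60, 0.5), state_after(60, 0.3))
    ```

    If the explanation above holds, same_branch is TRUE and cross_branch is FALSE.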
    