Question
Setting a seed ensures reproducibility and is important in simulation modelling. Consider a simple model f() with two variables of interest, y1 and y2. The outputs of these variables are determined by a random process (rbinom()) and the parameters x1 and x2. The outputs of the two variables of interest are independent of each other.
Now say we want to compare the output of a variable after its parameter has been changed with the output before the change was made (i.e. a sensitivity analysis). If no other parameter has been changed and the same seed is set, shouldn't the output of the unaffected variable remain the same as in the default simulation, since this variable is independent of the other?
In short, why does the output of variable y2, which is determined by parameter x2, change in the code below after only x1 is changed, even though the same seed is set? One could just ignore the output of y2 and focus only on y1, but in a larger simulation where each variable is a cost component of the total cost, a change in an unaffected variable may become problematic when testing the overall sensitivity of the model to individual parameter changes.
#~ parameters and model
x1 <- 0.0
x2 <- 0.5
n <- 10
ts <- 5
f <- function(){
  out <- data.frame(step = rep(0, n),
                    space = 1:n,
                    id = 1:n,
                    y1 = rep(1, n),
                    y2 = rep(0, n))
  l.out <- vector(mode = "list", length = n)
  for(i in 1:ts){
    out$step <- i
    out$y1[out$y1 == 0] <- 1
    out$id[out$y2 == 1] <- seq_along(which(out$y2 == 1)) + n
    out$y2[out$y2 == 1] <- 0
    out$y1 <- rbinom(nrow(out), 1, 1-x1)
    out$y2 <- rbinom(nrow(out), 1, x2)
    n <- max(out$id)
    l.out[[i]] <- out
  }
  do.call(rbind, l.out)
}
#~ Simulation 1 (default)
set.seed(1)
run1 <- f()
set.seed(1)
run2 <- f()
run1 == run2 #~ all observations true as expected
#~ Simulation 2
#~ change in x1 parameter affecting only variable y1
x1 <- 0.25
set.seed(1)
run3 <- f()
set.seed(1)
run4 <- f()
run3 == run4 #~ all observations true as expected
#~ compare variables after change in x1 has occurred
run1$y1 == run3$y1 #~ observations differ as expected
run1$y2 == run3$y2 #~ observations differ - why?
Answer 1:
Great question. The reason for this behaviour is that when you set p = 0 or p = 1 in rbinom, the underlying C function realises it doesn't need to sample using the random number generator. The seed (that is, the generator's internal state) only changes when the random number generator is actually called, so if p is any number strictly between 0 and 1, the seed will change, but if p is 0 or 1 it won't. You can see this in the source code.
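One way to check this without reading the C source is to compare .Random.seed, the vector in which R stores the generator's state, before and after a call. The following is only a minimal sketch; the helper name state_changed is hypothetical and not part of R:

```r
# Hypothetical helper: reports whether running draw() advances R's RNG state.
state_changed <- function(draw) {
  set.seed(1)                       # fixes the state (and creates .Random.seed if absent)
  before <- .Random.seed            # snapshot of the generator state
  draw()                            # run the sampling call
  !identical(before, .Random.seed)  # TRUE if the state moved
}

state_changed(function() rbinom(5, 1, 0.5))
#> [1] TRUE
state_changed(function() rbinom(5, 1, 1))
#> [1] FALSE
state_changed(function() rbinom(5, 1, 0))
#> [1] FALSE
```

If the state is unchanged after a call, the next sampling call starts from exactly the same point as if the first call had never happened.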
Under normal circumstances, when p is strictly between 0 and 1, your set-up should work fine:
set.seed(1)
x1 <- rbinom(5, 1, 0.4)
y1 <- rbinom(5, 1, 0.5)
set.seed(1)
x2 <- rbinom(5, 1, 0.1)
y2 <- rbinom(5, 1, 0.5)
all(y1 == y2)
#> [1] TRUE
But if p is 1 or 0 in one run and strictly between 0 and 1 in the other, the results will be different:
set.seed(1)
x1 <- rbinom(5, 1, 0.4)
y1 <- rbinom(5, 1, 0.5)
set.seed(1)
x2 <- rbinom(5, 1, 1)
y2 <- rbinom(5, 1, 0.5)
all(y1 == y2)
#> [1] FALSE
To show this is correct, we should get y1 == y2 if we set p to 0 the first time and p to 1 the second time, since neither call uses the random number generator:
set.seed(1)
x1 <- rbinom(5, 1, 0)
y1 <- rbinom(5, 1, 0.5)
set.seed(1)
x2 <- rbinom(5, 1, 1)
y2 <- rbinom(5, 1, 0.5)
all(y1 == y2)
#> [1] TRUE
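For the sensitivity analysis in the question, one possible workaround is to draw the Bernoulli outcomes from runif() instead, so that every call uses the same number of uniform draws whatever the value of p. This is only a sketch under that assumption, not something the rbinom documentation prescribes; the helper name rbern_fixed is hypothetical:

```r
# Hypothetical helper: Bernoulli draws that always consume n uniform numbers,
# even when p is 0 or 1, so later draws stay aligned across parameter values.
rbern_fixed <- function(n, p) {
  as.integer(runif(n) < p)  # runif() returns values strictly between 0 and 1
}

set.seed(1)
x1 <- rbern_fixed(5, 0.4)
y1 <- rbern_fixed(5, 0.5)
set.seed(1)
x2 <- rbern_fixed(5, 1)     # p = 1 still advances the generator
y2 <- rbern_fixed(5, 0.5)
all(y1 == y2)
#> [1] TRUE
```

Replacing the two rbinom() calls in f() with a helper of this kind should keep y2 unchanged when only x1 is varied, because the number of rows, and so the number of draws per step, stays the same. The individual values will differ from those rbinom() produces for the same seed.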
Source: https://stackoverflow.com/questions/64928795/seed-setting-why-is-the-output-different-after-no-change-in-input