Question
Why this matters
For drake, I want users to be able to execute mclapply() calls within a locked global environment. The environment is locked for the sake of reproducibility: without locking, data analysis pipelines could invalidate themselves.
Evidence that mclapply() adds or removes global bindings
set.seed(0)
a <- 1
# Works as expected.
rnorm(1)
#> [1] 1.262954
tmp <- parallel::mclapply(1:2, identity, mc.cores = 2)
# No new bindings allowed.
lockEnvironment(globalenv())
# With a locked environment
a <- 2 # Existing bindings are not locked.
b <- 2 # As expected, we cannot create new bindings.
#> Error in eval(expr, envir, enclos): cannot add bindings to a locked environment
tmp <- parallel::mclapply(1:2, identity, mc.cores = 2) # Unexpected error.
#> Warning in parallel::mclapply(1:2, identity, mc.cores = 2): all scheduled
#> cores encountered errors in user code
Created on 2019-01-16 by the reprex package (v0.2.1)
EDIT
For the original motivating problem, see https://github.com/ropensci/drake/issues/675 and https://ropenscilabs.github.io/drake-manual/hpc.html#parallel-computing-within-targets.
Answer 1:
You can remove .Random.seed yourself before you lock the environment. You also need to load the parallel package (or call the function once beforehand) and assign tmp an initial value, because no new bindings can be created once the environment is locked.
library(parallel)
tmp <- NULL
rm(".Random.seed", envir = .GlobalEnv, inherits = FALSE)
lockEnvironment(globalenv())
tmp <- parallel::mclapply(1:2, identity, mc.cores = 2)
Of course, this will not allow functions that need .Random.seed, such as rnorm, to work.
A workaround is to change the RNG kind to "L'Ecuyer-CMRG" (see also ?nextRNGStream):
library(parallel)
tmp <- NULL
RNGkind("L'Ecuyer-CMRG")
lockEnvironment(globalenv())
tmp <- parallel::mclapply(1:2, rnorm, mc.cores = 2)
EDIT
I thought of another solution to your problem, and I think it will work with any RNG (I did not test it much). You can override the function that removes .Random.seed with one that just sets it to NULL:
library(parallel)
mc.set.stream <- function () {
  if (RNGkind()[1L] == "L'Ecuyer-CMRG") {
    assign(".Random.seed", get("LEcuyer.seed", envir = RNGenv),
           envir = .GlobalEnv)
  } else {
    if (exists(".Random.seed", envir = .GlobalEnv, inherits = FALSE)) {
      assign(".Random.seed", NULL, envir = .GlobalEnv)
    }
  }
}
assignInNamespace("mc.set.stream", mc.set.stream, asNamespace("parallel"))
tmp <- NULL
set.seed(0)
lockEnvironment(globalenv())
tmp <- parallel::mclapply(1:2, rnorm, mc.cores = 2)
One final thought: you can create a new environment containing everything you don't want changed, lock it, and work in there.
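A minimal sketch of that last idea, assuming a Unix-alike system (mclapply() cannot fork on Windows). The names pipeline_env and a are hypothetical and chosen only for illustration. The protected objects live in their own locked environment, while the global environment stays unlocked so that mclapply() can manage .Random.seed there.

```r
library(parallel)

# Assumption: `pipeline_env` and `a` are hypothetical names for illustration.
pipeline_env <- new.env(parent = globalenv())
assign("a", 1, envir = pipeline_env)
lockEnvironment(pipeline_env)

# Adding a new binding to the locked environment is rejected.
add_result <- tryCatch(
  {
    assign("b", 2, envir = pipeline_env)
    "added"
  },
  error = function(err) "blocked"
)

# The global environment is not locked, so the RNG bookkeeping
# done by mclapply() in the forked children does not fail.
set.seed(0)
out <- mclapply(
  1:2,
  function(i) i + get("a", envir = pipeline_env),
  mc.cores = 2
)
```

The trade-off is that code has to read the protected objects from pipeline_env explicitly (or be evaluated inside it), rather than relying on the global environment.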
Answer 2:
I think parallel:::mc.set.stream() has the answer. Apparently, mclapply() tries to remove .Random.seed from the global environment by default. Since the default RNG algorithm is Mersenne Twister, we land in the else block below, and removing a binding from a locked environment is an error.
> parallel:::mc.set.stream
function ()
{
    if (RNGkind()[1L] == "L'Ecuyer-CMRG") {
        assign(".Random.seed", get("LEcuyer.seed", envir = RNGenv),
            envir = .GlobalEnv)
    }
    else {
        if (exists(".Random.seed", envir = .GlobalEnv, inherits = FALSE))
            rm(".Random.seed", envir = .GlobalEnv, inherits = FALSE)
    }
}
<bytecode: 0x4709808>
<environment: namespace:parallel>
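The same failure can be reproduced without forking. The sketch below uses a throwaway environment e (a hypothetical stand-in for the locked global environment) to show that rm() on a locked environment signals an error, whereas overwriting an existing binding, as the override in Answer 1 does, is still permitted.

```r
# Assumption: `e` is a throwaway stand-in for the locked global environment.
e <- new.env()
assign(".Random.seed", 1:3, envir = e)
lockEnvironment(e)  # freezes the set of names, not their values

# Removing a binding from a locked environment signals an error.
rm_result <- tryCatch(
  {
    rm(".Random.seed", envir = e, inherits = FALSE)
    "removed"
  },
  error = function(err) "error"
)
print(rm_result)

# Overwriting an existing binding is still allowed.
assign(".Random.seed", NULL, envir = e)
print(is.null(get(".Random.seed", envir = e)))
```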
We can use mc.set.seed = FALSE to make the following code work, but this is probably not a good idea in practice, since each child process then starts from the same RNG state as the parent session.
set.seed(0)
lockEnvironment(globalenv())
parallel::mclapply(1:2, identity, mc.cores = 2, mc.set.seed = FALSE)
I wonder if there is a way to lock the environment while still allowing us to remove .Random.seed.
Source: https://stackoverflow.com/questions/54229295/parallelmclapply-adds-or-removes-bindings-to-the-global-environment-which-o