R: get list and environment of all variables and functions within a given function (for parallel processing)

Submitted by 喜夏-厌秋 on 2019-12-24 08:03:11

Question


I am using foreach for parallel processing, which requires manually exporting functions and variables, passed as a list of names, to the environments of the worker processes. I want to automate this and cover all use cases. That is easy for simple functions that use only their own local variables. It gets complicated as soon as the function to be run in parallel uses variables or functions defined in another environment. Consider the following case:

global.variable <- 3

global.function <-function(j){
  res <- j^2
  return(res)
}

compute.in.parallel <-function(i){
  res <- global.function(i+global.variable)
  return(res)
}

pop <- seq(10)

do <- function(pop,fun){
  require(doParallel)
  require(foreach)
  cl <- makeCluster(16)
  registerDoParallel(cl)
  clusterExport(cl,list("global.variable","global.function"),envir=globalenv())
  results <- foreach(i=pop) %dopar% fun(i)
  stopCluster(cl)
  return(results)
}

do(pop,compute.in.parallel)

This works because I manually pass global.variable and global.function to the cores as well (note that compute.in.parallel itself is automatically considered in scope): clusterExport(cl,list("global.variable","global.function"),envir=globalenv())

But I want to do this automatically, which requires building a list of the names of all variables and functions that are used (but not defined, passed, or contained) within compute.in.parallel. How do I do this?

My current workaround is to dump all available variables to the cores:

clusterExport(cl,as.list(unique(c(ls(.GlobalEnv),ls(environment())))),envir=environment())

This is unsatisfactory, however: it does not cover variables in package namespaces and other hidden environments, and it generally passes far too many variables to the cores, creating significant overhead with every parallel run.

Any suggested improvements?


Answer 1:


Just pass all arguments that are needed in do(), rather than using global variables.

compute.in.parallel <- function(i, global.variable, global.function) {
  global.function(i + global.variable)
}

do <- function(pop, fun, ncores = parallel::detectCores() - 1, ...) {
  require(foreach)
  cl <- parallel::makeCluster(ncores)
  on.exit(parallel::stopCluster(cl), add = TRUE)
  doParallel::registerDoParallel(cl)
  foreach(i = pop) %dopar% fun(i, ...)
}

do(seq(10), compute.in.parallel, 
   global.variable = 3, 
   global.function = function(j) j^2)



Answer 2:


The future framework automatically identifies and exports globals by default. The doFuture package provides a generic future backend adaptor for foreach. If you use that, the following works:

do <- function(pop, fun) {
  library("doFuture")
  registerDoFuture()
  cl <- parallel::makeCluster(2)
  old_plan <- plan(cluster, workers = cl)
  on.exit({
    plan(old_plan)
    parallel::stopCluster(cl)
  })

  foreach(i = pop) %dopar% fun(i)
}
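
For readers who would rather keep clusterExport() than switch backends, the lookup asked about in the question can be approximated with codetools::findGlobals(), which ships with R. The following is only a minimal sketch under stated assumptions: find_user_globals() is a hypothetical helper name, it only looks at names defined directly in the environment you pass in, and it is not a description of how the future framework finds globals internally.

```r
# Sketch only: find_user_globals() is a hypothetical helper, not part of any package.
find_user_globals <- function(fun, env, seen = character()) {
  # Names that `fun` refers to but does not define locally
  # (this includes base functions such as `+`).
  candidates <- codetools::findGlobals(fun, merge = TRUE)
  # Keep only names defined directly in `env`, dropping base and package functions.
  defined <- vapply(candidates, exists, logical(1), envir = env, inherits = FALSE)
  new <- setdiff(candidates[defined], seen)
  seen <- union(seen, new)
  # Follow user-defined functions so their own dependencies are found too.
  for (name in new) {
    obj <- get(name, envir = env, inherits = FALSE)
    if (is.function(obj) && !is.primitive(obj)) {
      seen <- find_user_globals(obj, env, seen)
    }
  }
  seen
}

# The objects from the question, written compactly.
global.variable <- 3
global.function <- function(j) j^2
compute.in.parallel <- function(i) global.function(i + global.variable)

# At top level, environment() is the global environment.
needed <- find_user_globals(compute.in.parallel, env = environment())
```

The resulting character vector could then be handed to clusterExport(cl, needed, envir = globalenv()) in place of the hand-written list. Because this is static code analysis, names that are only looked up dynamically (for example via get() or eval()) and objects living in package namespaces would not be found.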


Source: https://stackoverflow.com/questions/47197566/r-get-list-and-environment-of-all-variables-and-functions-within-a-given-functi
