How to set .libPaths (checkpoint) on workers when running parallel computation in R

悲&欢浪女 2020-12-20 05:30

I use the checkpoint package for reproducible data analysis. Some of the computations take a long time to compute, so I want to run those in parallel. When run in parallel h

1 answer
  • 2020-12-20 06:19

    Author of the future package here.

    Passing the library path of the master R process as a global variable libs and setting it on each worker with .libPaths(libs) should be enough:

    ## Use CRAN checkpoint from 2018-07-24 to get future (>= 1.9.0) [1];
    ## otherwise the stdout below won't be relayed back to the master
    ## R process. Setting .libPaths() also works in older versions of
    ## the future package.
    ## [1] https://cran.microsoft.com/snapshot/2018-07-24/web/packages/future
    checkpoint::checkpoint("2018-07-24")
    stopifnot(packageVersion("future") >= "1.9.0")
    
    libs <- .libPaths()
    print(libs)
    ### [1] "/home/hb/.checkpoint/2018-07-24/lib/x86_64-pc-linux-gnu/3.5.1"
    ### [2] "/home/hb/.checkpoint/R-3.5.1"                                 
    ### [3] "/usr/lib/R/library"
    
    library(foreach)
    
    doFuture::registerDoFuture()
    future::plan("multisession")
    
    res <- foreach::foreach(x = unique(iris$Species)) %dopar% {
      ## Use the same library paths as the master R session
      .libPaths(libs)
    
      cat(sprintf("Library paths used by worker (PID %d):\n", Sys.getpid()))
      cat(sprintf(" - %s\n", sQuote(.libPaths())))
    
      stringr::str_c(x, "_")
    }
    
    ###  - ‘/home/hb/.checkpoint/2018-07-24/lib/x86_64-pc-linux-gnu/3.5.1’
    ###   - ‘/home/hb/.checkpoint/R-3.5.1’
    ###   - ‘/usr/lib/R/library’
    ### Library paths used by worker (PID 9394):
    ###  - ‘/home/hb/.checkpoint/2018-07-24/lib/x86_64-pc-linux-gnu/3.5.1’
    ###   - ‘/home/hb/.checkpoint/R-3.5.1’
    ###   - ‘/usr/lib/R/library’
    ### Library paths used by worker (PID 9412):
    ###  - ‘/home/hb/.checkpoint/2018-07-24/lib/x86_64-pc-linux-gnu/3.5.1’
    ###   - ‘/home/hb/.checkpoint/R-3.5.1’
    ###   - ‘/usr/lib/R/library’
    
    str(res)
    ### List of 3
    ###  $ : chr "setosa_"
    ###  $ : chr "versicolor_"
    ###  $ : chr "virginica_"
    

    FYI, it is on future's roadmap to make it easier to pass down the library path(s) to workers.
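
For readers who use a plain PSOCK cluster from the base parallel package instead of future, the same idea can be sketched as below. This is a minimal illustration and not part of the original answer: it assumes a local cluster of two workers, and the variable name worker_libs is made up for the example. Here libs is simply the current library paths of the master session; with checkpoint you would call checkpoint::checkpoint() first.

```r
library(parallel)

## Library paths of the master R session (e.g. after checkpoint::checkpoint())
libs <- .libPaths()

## Start two background R workers (PSOCK cluster); the count is arbitrary
cl <- makeCluster(2L)

## Set the library paths once on every worker. The wrapper function makes
## sure .libPaths() is looked up and called in the worker's own R session
## instead of shipping the master's copy of the function.
invisible(clusterCall(cl, function(paths) .libPaths(paths), libs))

## Ask each worker which library paths it now uses (hypothetical name)
worker_libs <- clusterEvalQ(cl, .libPaths())

stopCluster(cl)

str(worker_libs)
```

Setting the paths once per worker with clusterCall() avoids repeating .libPaths(libs) inside every task, but it only lasts for the lifetime of that cluster; a new cluster needs the call again.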

    My details:

    > sessionInfo()
    R version 3.5.1 (2018-07-02)   
    Platform: x86_64-pc-linux-gnu (64-bit)   
    Running under: Ubuntu 18.04.1 LTS   
    
    Matrix products: default   
    BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1   
    LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1   
    
    locale:   
     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8           LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   
     [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                     LC_ADDRESS=C               LC_TELEPHONE=C            
    [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C          
    
    attached base packages:   
    [1] stats     graphics  grDevices utils     datasets  methods   base        
    
    other attached packages:   
    [1] foreach_1.4.4   
    
    loaded via a namespace (and not attached):   
    [1] drat_0.1.4         compiler_3.5.1     BiocManager_1.30.2 parallel_3.5.1        tools_3.5.1        listenv_0.7.0      doFuture_0.6.0    
    [8] codetools_0.2-15   iterators_1.0.10   digest_0.6.15      globals_0.12.1        checkpoint_0.4.5   future_1.9.0 
    