Parallelization in R: how to “source” on every node?

花落未央 2021-02-09 21:51

I have created parallel workers (all running on the same machine) using:

MyCluster = makeCluster(8)

How can I make each of these 8 nodes source…

2 Answers
  • 2021-02-09 22:37

    The following code serves your purpose:

    library(parallel)
    
    cl <- makeCluster(4)
    clusterCall(cl, function() { source("test.R") })
    
    ## do some parallel work
    
    stopCluster(cl)
    

    You can also use clusterEvalQ() to do the same thing:

    library(parallel)
    
    cl <- makeCluster(4)
    clusterEvalQ(cl, source("test.R"))
    
    ## do some parallel work
    
    stopCluster(cl)
    

    However, there is a subtle difference between the two methods. clusterCall() runs a function on each node and ships any extra arguments along with it, while clusterEvalQ() evaluates a literal expression on each node. clusterEvalQ(cl, expr) does not evaluate expr on the master first, so a variable that exists only in the master session is not visible to the workers. If you have a variable list of files to source, clusterCall() is therefore easier to use, because the list can simply be passed as an argument.
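
    A minimal sketch of the variable-list case, assuming the workers run on the same machine as the master (as in the question) and can therefore read files in the master's temporary directory. The file names helpers_a.R and helpers_b.R and the objects a and b are hypothetical, created only so the example runs on its own:

    ```r
    library(parallel)

    # Hypothetical helper scripts, written to a temp directory for illustration.
    files <- file.path(tempdir(), c("helpers_a.R", "helpers_b.R"))
    writeLines("a <- 1", files[1])
    writeLines("b <- 2", files[2])

    cl <- makeCluster(2)

    # `files` is passed as an argument, so its value is shipped to every worker.
    invisible(clusterCall(cl, function(paths) {
      # source() evaluates in the worker's global environment by default.
      for (p in paths) source(p)
      NULL
    }, files))

    # Both objects should now exist on each worker.
    res <- clusterEvalQ(cl, a + b)

    stopCluster(cl)
    ```

    Writing clusterEvalQ(cl, source(files[1])) instead would fail here, because files is defined only in the master session.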

  • 2021-02-09 22:51

    If you source a local file, make sure the file exists on every node.

    Otherwise, place the file on a network share or NFS, and source it by its absolute path.
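
    As a rough sketch of that advice, the following checks that every worker can see the file before sourcing it. The script name setup.R and the object greeting are hypothetical, and a temporary directory stands in for the shared location:

    ```r
    library(parallel)

    # Hypothetical script; in practice it would live on a share visible to every node.
    script <- file.path(tempdir(), "setup.R")
    writeLines("greeting <- 'hello'", script)
    # Absolute path, so it does not depend on each worker's working directory.
    script <- normalizePath(script)

    cl <- makeCluster(2)

    # Ask every worker whether it can see the file.
    visible <- unlist(clusterCall(cl, file.exists, script))
    stopifnot(all(visible))

    # Source it on every worker, then read back an object it defines.
    invisible(clusterCall(cl, source, script))
    greetings <- unlist(clusterEvalQ(cl, greeting))

    stopCluster(cl)
    ```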

    Better still, and this is the standard answer: write a package, install it on every node, and then just call library() or require() on each worker.
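
    A sketch of the package approach. The base package tools is used here only as a stand-in for your own package, which is assumed to be installed on every node already:

    ```r
    library(parallel)

    cl <- makeCluster(2)

    # `tools` stands in for your own installed package.
    invisible(clusterEvalQ(cl, library(tools)))

    # Functions from the attached package can now be called on the workers.
    exts <- parSapply(cl, c("a.R", "b.csv"), function(f) file_ext(f))

    stopCluster(cl)
    ```

    Compared with sourcing scripts, this keeps the worker setup to a single library() call and avoids depending on file paths altogether.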
