R foreach issue (some processes returning NULL)

轮回少年 2021-02-09 23:00

I am running into a problem with the foreach section of a program I am working with in R. The program is used to run simulations for varying parameters, and then re

1 answer

    陌清茗 (OP) 2021-02-09 23:49

    I'm guessing that you're running on Linux, because from your description, it sounds like the child R session is being killed by the Linux "out-of-memory killer". Coincidentally, I recently worked on the same basic problem where mclapply was used directly.

    The doMC package uses the mclapply function to execute the foreach loop in parallel, and unfortunately, mclapply doesn't signal an error when a worker process unexpectedly dies. Instead, mclapply returns a NULL for all tasks allocated to that worker. I don't think there is any option to change this behavior in mclapply.

    The only work-arounds that I can think of are:

    1. Use a foreach backend such as doParallel or doSNOW rather than doMC.
    2. Treat NULLs in the result list as an error and rerun with fewer workers.
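
    The second work-around can be sketched like this. This is my own illustration, not code from the question: in the real program `res` would come from the `foreach` loop run under doMC (e.g. `registerDoMC(cores = 4); res <- foreach(i = 1:10) %dopar% simulate(i)`); here a result list is faked with a `NULL` where a dead worker's tasks would be:

    ```r
    # mclapply fills in NULL for every task owned by a worker that died,
    # so any NULL in the foreach result list means a worker was lost.
    check_workers <- function(res) {
      died <- vapply(res, is.null, logical(1))
      if (any(died)) {
        stop("tasks ", paste(which(died), collapse = ", "),
             " returned NULL; a worker probably died - rerun with fewer cores")
      }
      res
    }

    # Simulated foreach output with one lost task (task 2):
    res <- list(1.0, NULL, 1.73)
    msg <- tryCatch(check_workers(res), error = conditionMessage)
    ```

    One caveat: this heuristic cannot distinguish a dead worker from a task that legitimately returned `NULL`, so it only works if your tasks never return `NULL` on success.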

    If you use doParallel, make sure that you create and register a cluster object, otherwise mclapply will be used on Linux systems. With doParallel and doSNOW, if a worker dies abnormally, the master will get an error getting the task result from the dead worker:

    Error in unserialize(node$con) : error reading from connection
    

    In this case, the parallel backend will catch the error and use the specified error handling.
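
    A minimal sketch of the doParallel route, assuming doParallel is installed (this is my illustration, not code from the question). The key point is that `makeCluster()` gives an explicit cluster object, so doParallel does not fall back to `mclapply` on Linux, and a dead worker surfaces as a catchable error instead of a silent `NULL`:

    ```r
    library(doParallel)

    cl <- makeCluster(2)        # explicit cluster: avoids the mclapply code path
    registerDoParallel(cl)

    res <- tryCatch(
      # .errorhandling = "stop" is the default: task errors (and the
      # "error reading from connection" from a dead worker) propagate here.
      foreach(i = 1:10, .errorhandling = "stop") %dopar% sqrt(i),
      error = function(e) {
        message("worker failure: ", conditionMessage(e))
        NULL
      }
    )

    stopCluster(cl)
    ```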

    Keep in mind that using doParallel or doSNOW may use more memory than doMC, and so you may have to specify fewer workers with them in order to avoid running out of memory.
