Why don't parallel jobs print in RStudio?

Question

Why do scripts parallelized with mclapply print on a cluster but not in RStudio? Just asking out of curiosity.

mclapply(1:10, function(x) {
  print("Hello!")
  return(TRUE)
}, mc.cores = 2)
# Hello prints in slurm but not RStudio

Answer 1:

None of the functions in the 'parallel' package guarantee that output sent to standard output (stdout) or standard error (stderr) on the workers is displayed properly. This is true for all of its parallelization approaches, e.g. forked processing (mclapply()) and PSOCK clusters (parLapply()). The reason is that the package was never designed to relay output in a consistent manner.

A good test is to see if you can capture the output via capture.output(). For example, I get:

bfr <- utils::capture.output({
  y <- lapply(1:3, FUN = print)
})
print(bfr)
## [1] "[1] 1" "[1] 2" "[1] 3"

as expected, but when I try:

bfr <- utils::capture.output({
  y <- parallel::mclapply(1:3, FUN = print)
})
print(bfr)
## character(0)

there's no output captured. Interestingly though, if I call it without capturing output in R 4.0.1 on Linux in the terminal, I get:

y <- parallel::mclapply(1:3, FUN = print)
## [1] 1
## [1] 3
## [1] 2

Interesting, eh?

Another suggestion you might get when using local PSOCK clusters is to set the argument outfile = "" when creating the cluster. Indeed, when you try this on Linux in the terminal, it certainly looks like it works:

cl <- parallel::makeCluster(2L, outfile = "")
## starting worker pid=25259 on localhost:11167 at 17:50:03.974
## starting worker pid=25258 on localhost:11167 at 17:50:03.974

y <- parallel::parLapply(cl, 1:3, fun = print)
## [1] 1
## [1] 2
## [1] 3

But this, too, gives false hope. The output you're seeing appears only because the terminal happens to display it. This might or might not work in the RStudio Console, and you might see different behavior on Linux, macOS, and MS Windows. The most important thing to understand is that your R session does not see this output at all. If we try to capture it, we get:

bfr <- utils::capture.output({
  y <- parallel::parLapply(cl, 1:3, fun = print)
})
## [1] 1
## [1] 2
## [1] 3
print(bfr)
## character(0)

Interesting, eh? But actually not surprising once you understand the inner workings of the 'parallel' package.
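If you have to stay with the 'parallel' package, one workaround is to capture the output on each worker and send it back as part of the result. The following is only a minimal sketch of that idea, not something 'parallel' does for you; the list fields value and output are names made up for this illustration:

```r
cl <- parallel::makeCluster(2L)  ## local PSOCK cluster

res <- parallel::parLapply(cl, 1:3, fun = function(x) {
  ## Capture what print() writes on the worker instead of losing it
  out <- utils::capture.output({
    value <- print(x)
  })
  ## 'value' and 'output' are illustrative field names
  list(value = value, output = out)
})
parallel::stopCluster(cl)

## Back in the main session, the captured lines are ordinary data
bfr <- unlist(lapply(res, FUN = function(r) r$output))
print(bfr)
## [1] "[1] 1" "[1] 2" "[1] 3"
```

Note that the output only arrives once each task has finished, and that this sketch collects standard output only, not messages or warnings.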

(Disclaimer: I'm the author) The only parallel framework I'm aware of that properly relays standard output (e.g. cat(), print(), ...) and message conditions (e.g. message()) to the main R session is the future framework. You can read about the details in its 'Text and Message Output' vignette, but here's an example showing that it works:

future::plan("multicore", workers = 2) ## forked processing

bfr <- utils::capture.output({
  y <- future.apply::future_lapply(1:3, FUN = print)
})
print(bfr)
## [1] "[1] 1" "[1] 2" "[1] 3"

It works the same regardless of underlying parallelization framework, e.g. with local PSOCK workers:

future::plan("multisession", workers = 2) ## PSOCK cluster

bfr <- utils::capture.output({
  y <- future.apply::future_lapply(1:3, FUN = print)
})
print(bfr)
## [1] "[1] 1" "[1] 2" "[1] 3"

This works the same on all operating systems and environments where you run R, including the RStudio Console. It also behaves the same regardless of which future map-reduce framework you use, e.g. (here) future.apply, furrr, and foreach with doFuture.
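The examples above only exercise standard output. Message conditions can be checked in a similar way by capturing the message stream instead. This is a rough sketch, assuming the future and future.apply packages are installed; the message text is made up for the illustration:

```r
future::plan("multisession", workers = 2)  ## PSOCK cluster

bfr <- utils::capture.output({
  y <- future.apply::future_lapply(1:3, FUN = function(x) {
    ## Signaled on the worker, relayed to the main R session
    message("Hello from element ", x)
    x
  })
}, type = "message")

future::plan("sequential")  ## shut down the background workers
print(length(bfr))
```

If the relaying works as described, bfr should hold one line per element, which is something the 'parallel' examples earlier could not give you.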

Source: https://stackoverflow.com/questions/62308162/why-dont-parallel-jobs-print-in-rstudio

Tags

parallel-processing

rstudio

mclapply