How do you create a progress bar when using the “foreach()” function in R?

青春惊慌失措 2020-11-28 03:41

There are some informative posts on how to create a counter for loops in an R program. However, how do you create a similar function when using the parallelized version with “foreach()”?

7 answers
  • 2020-11-28 04:04

    This is now possible with the parallel package. Tested with R 3.2.3 on OSX 10.11, running inside RStudio, using a "PSOCK"-type cluster.

    library(doParallel)
    
    # default cluster type on my machine is "PSOCK", YMMV with other types
    cl <- parallel::makeCluster(4, outfile = "")
    registerDoParallel(cl)
    
    n <- 10000
    pb <- txtProgressBar(0, n, style = 2)
    
    invisible(foreach(i = icount(n)) %dopar% {
        setTxtProgressBar(pb, i)
    })
    
    stopCluster(cl)
    

    Strangely, it only displays correctly with style = 2.

  • 2020-11-28 04:06

    This code is a modified version of the doRedis example, and will make a progress bar even when using %dopar% with a parallel backend:

    # Load libraries
    library(foreach)
    library(utils)
    library(iterators)
    library(doParallel)
    library(snow)
    
    # Choose number of iterations
    n <- 1000
    
    # Progress combine function: returns a closure that keeps its own counter
    # and progress bar. With a custom .combine and the default
    # .multicombine = FALSE, it is called with two arguments, n - 1 times.
    f <- function(){
      pb <- txtProgressBar(min=1, max=n-1, style=3)
      count <- 0
      function(...) {
        count <<- count + length(list(...)) - 1
        setTxtProgressBar(pb, count)
        Sys.sleep(0.01)   # only slows the bar down so it is visible
        flush.console()
        c(...)
      }
    }
    
    # Start a cluster
    cl <- makeCluster(4, type='SOCK')
    registerDoParallel(cl)
    
    # Run the loop in parallel
    k <- foreach(i = icount(n), .final=sum, .combine=f()) %dopar% {
      log2(i)
    }
    
    # .final = sum reduces the combined vector to a single number
    print(k)
    
    # Stop the cluster
    stopCluster(cl)
    

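    This works because the .combine function is evaluated in the master R session, so the progress bar lives there rather than on the workers. The catch is that you have to know the number of iterations (to set max) and the combination function ahead of time.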
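    The counting trick in f() can be checked without a cluster. The sketch below uses a hypothetical helper, make_progress_combine(), which is not part of foreach. It runs sequentially with base R's Reduce(). Like foreach with a custom .combine, no .init and .multicombine = FALSE, Reduce() calls the combine function pairwise, n - 1 times for n results. That is why the bar's max is n - 1.

```r
# Sketch only: make_progress_combine() is a hypothetical helper mirroring f()
# above. Reduce() stands in for foreach's accumulator: with no init value it
# calls the combine function pairwise, n - 1 times for n results.
make_progress_combine <- function(n_calls) {
  pb <- utils::txtProgressBar(min = 0, max = n_calls, style = 3)
  count <- 0
  combine <- function(acc, x) {
    count <<- count + 1                # one pairwise combination done
    utils::setTxtProgressBar(pb, count)
    if (count == n_calls) close(pb)    # finish the bar on the last call
    c(acc, x)
  }
  # Return the combine function plus an accessor for the call counter
  list(combine = combine, calls = function() count)
}

n <- 1000
results <- lapply(seq_len(n), log2)    # stand-in for per-iteration results
pc <- make_progress_combine(n - 1)
k <- sum(Reduce(pc$combine, results))  # sum() plays the role of .final = sum
print(k)
```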

  • 2020-11-28 04:06

    The following code will produce a nice progress bar in R for the foreach control structure. It will also work with graphical progress bars by replacing txtProgressBar with the desired progress bar object.

    # Gives us the foreach control structure.
    library(foreach)
    # Gives us icount().
    library(iterators)
    # Gives us the progress bar object.
    library(utils)
    # Some number of iterations to process.
    n <- 10000
    # Create the progress bar.
    pb <- txtProgressBar(min = 1, max = n, style=3)
    # The foreach loop we are monitoring. This foreach loop will log2 all 
    # the values from 1 to n and then sum the result. 
    k <- foreach(i = icount(n), .final=sum, .combine=c) %do% {
        setTxtProgressBar(pb, i)
        log2(i)
    }
    # Close the progress bar.
    close(pb)
    

    While the code above answers your question in its most basic form, a better and much harder question is whether you can create an R progress bar that monitors the progress of a foreach statement when it is parallelized with %dopar%. Unfortunately, I don't think it is possible to monitor a parallelized foreach in this way, but I would love for someone to prove me wrong, as it would be a very useful feature.

  • 2020-11-28 04:12

    Save the start time with Sys.time() before the loop, and loop over rows, columns, or anything else whose total count you know. Inside the loop, you can then calculate the elapsed time (see difftime), the percentage complete, the speed, and the estimated time left. Each process can print these progress lines with the message function. You'll get output something like:

    1/1000 complete @ 1 items/s, ETA: 00:00:45
    2/1000 complete @ 1 items/s, ETA: 00:00:44
    

    Obviously, the looping order greatly affects how well this works. I don't know about foreach, but with mclapply (from the multicore package, now part of parallel) you'd get good results using mc.preschedule = FALSE, which means that items are allocated to processes one by one, in order, as previous items complete.
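    Here is a minimal sketch of the arithmetic described above: elapsed time via difftime, rate as items per second, and ETA as remaining items divided by rate. The helper names progress_line() and format_hms() are hypothetical and do not come from any package. The demo loop is sequential, so it does not show how per-worker counts behave under parallel execution.

```r
# Sketch: format_hms() and progress_line() are hypothetical helpers showing
# the elapsed-time / rate / ETA arithmetic; the demo loop is sequential.
format_hms <- function(secs) {
  secs <- as.integer(round(secs))
  sprintf("%02d:%02d:%02d", secs %/% 3600L, (secs %% 3600L) %/% 60L, secs %% 60L)
}

progress_line <- function(done, total, start, now = Sys.time()) {
  elapsed <- as.numeric(difftime(now, start, units = "secs"))
  if (elapsed <= 0) {
    return(sprintf("%d/%d complete", done, total))  # too early to estimate
  }
  rate <- done / elapsed                 # items per second so far
  eta  <- (total - done) / rate          # seconds left at the current rate
  sprintf("%d/%d complete @ %.1f items/s, ETA: %s",
          done, total, rate, format_hms(eta))
}

# Demo: a short loop that reports progress with message()
n <- 5
start <- Sys.time()
for (i in seq_len(n)) {
  Sys.sleep(0.05)                        # stand-in for real work
  message(progress_line(i, n, start))
}
```

    In a parallel run, each worker would call message() with its own count. The printed lines are therefore only as ordered as the scheduling, which is the point made above about mc.preschedule.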

  • 2020-11-28 04:14

    You can also get this to work with the progress package.


    # loading parallel and doSNOW package and creating cluster ----------------
    library(parallel)
    library(doSNOW)
    
    numCores<-detectCores()
    cl <- makeCluster(numCores)
    registerDoSNOW(cl)
    
    # progress bar ------------------------------------------------------------
    library(progress)
    
    iterations <- 100                               # used for the foreach loop  
    
    pb <- progress_bar$new(
      format = "letter = :letter [:bar] :elapsed | eta: :eta",
      total = iterations,    # 100 
      width = 60)
    
    progress_letter <- rep(LETTERS[1:10], 10)  # token reported in progress bar
    
    # allowing progress bar to be used in foreach -----------------------------
    progress <- function(n){
      pb$tick(tokens = list(letter = progress_letter[n]))
    } 
    
    opts <- list(progress = progress)
    
    # foreach loop ------------------------------------------------------------
    library(foreach)
    
    foreach(i = 1:iterations, .combine = rbind, .options.snow = opts) %dopar% {
      summary(rnorm(1e6))[3]
    }
    
    stopCluster(cl) 
    
  • 2020-11-28 04:18

    Edit: After an update to the doSNOW package, it has become quite simple to display a nice progress bar when using %dopar%, and it works on Linux, Windows and OS X.

    doSNOW now officially supports progress bars via the .options.snow argument.

    library(doSNOW)
    cl <- makeCluster(2)
    registerDoSNOW(cl)
    iterations <- 100
    pb <- txtProgressBar(max = iterations, style = 3)
    progress <- function(n) setTxtProgressBar(pb, n)
    opts <- list(progress = progress)
    result <- foreach(i = 1:iterations, .combine = rbind, 
                      .options.snow = opts) %dopar%
    {
        s <- summary(rnorm(1e6))[3]
        return(s)
    }
    close(pb)
    stopCluster(cl) 
    

    Yet another way of tracking progress, if you keep the total number of iterations in mind, is to set .verbose = TRUE, which prints to the console which iterations have finished.
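    If doSNOW is not an option, here is a hedged base-R alternative. It uses a different technique from doSNOW's progress hook: split the work into chunks, run each chunk with parLapply() from the parallel package, and advance a txtProgressBar in the master session after each chunk returns. The helper par_lapply_progress() and its n_chunks parameter are hypothetical.

```r
# Sketch, not doSNOW: par_lapply_progress() is a hypothetical helper built only
# on base R's parallel package. Work runs chunk by chunk via parLapply(), and
# the progress bar advances on the master after each chunk returns.
library(parallel)

par_lapply_progress <- function(cl, X, FUN, n_chunks = 10) {
  n <- length(X)
  stopifnot(n > 0)
  chunk_size <- ceiling(n / n_chunks)
  # Consecutive index groups of at most chunk_size elements each
  chunks <- split(seq_len(n), ceiling(seq_len(n) / chunk_size))
  pb <- txtProgressBar(min = 0, max = n, style = 3)
  on.exit(close(pb))
  out <- vector("list", n)
  done <- 0
  for (idx in chunks) {
    out[idx] <- parLapply(cl, X[idx], FUN)  # blocks until this chunk is done
    done <- done + length(idx)
    setTxtProgressBar(pb, done)             # runs in the master session
  }
  out
}

cl <- makeCluster(2)
res <- par_lapply_progress(cl, 1:100, function(i) {
  Sys.sleep(0.01)  # stand-in for real work
  i^2
})
stopCluster(cl)
```

    Smaller chunks give a smoother bar but more round trips to the workers. Within each chunk, parLapply() waits for the slowest worker before the next chunk starts.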

    Previous solution for Linux and OS X

    On Ubuntu 14.04 (64 bit) and OS X (El Capitan), the progress bar is displayed even when using %dopar% if outfile = "" is set in the makeCluster call. It does not seem to work under Windows. From the help on makeCluster:

    outfile: Where to direct the stdout and stderr connection output from the workers. "" indicates no redirection (which may only be useful for workers on the local machine). Defaults to ‘/dev/null’ (‘nul:’ on Windows).

    Example code:

    library(foreach)
    library(doSNOW)
    cl <- makeCluster(4, outfile="") # number of cores. Notice 'outfile'
    registerDoSNOW(cl)
    iterations <- 100
    pb <- txtProgressBar(min = 1, max = iterations, style = 3)
    result <- foreach(i = 1:iterations, .combine = rbind) %dopar% 
    {
          s <- summary(rnorm(1e6))[3]
          setTxtProgressBar(pb, i) 
          return(s)
    }
    close(pb)
    stopCluster(cl) 
    

    The resulting progress bar looks a little odd: a new bar is printed for every step, and because a worker may lag a bit, the bar occasionally goes back and forth.
