Wrapper to FOR loops with progress bar

前端 未结 8 683
走了就别回头了
走了就别回头了 2021-01-31 19:57

I like to use a progress bar while running slow for loops. This could be done easily with several helpers, but I do like the tkProgressBar from tcl

8条回答
  •  粉色の甜心
    2021-01-31 20:31

    I propose hereby two solutions that use the standard for syntax, both are using the great package progress from Gábor Csárdi and Rich FitzJohn

    • 1) we can override temporarily or locally the for function to wrap around base::for and support progress bars.
    • 2) we can define the unused for<-, and wrap around base::for using syntax pb -> for(it in seq) {exp} where pb is progress bar built with progress::progress_bar$new().

    Both solutions behave as standard for calls :

    • The values changed at the previous iteration are available
    • on error the modified variables will have the value they had just before the error

    I packaged my solution and will demo them below then will go through the code


    Usage

    #devtools::install_github("moodymudskipper/pbfor")
    library(pbfor)
    

    Using pb_for()

    By default pb_for() will override the for function for one run only.

    pb_for()
    for (i in 1:10) {
      # DO SOMETHING
      Sys.sleep(0.5)
    }
    

    Using parameters from progress::progress_bar$new() :

    pb_for(format = "Working hard: [:bar] :percent :elapsed", 
           callback = function(x) message("Were'd done!"))
    for (i in 1:10) {
      # DO SOMETHING
      Sys.sleep(0.5)
    }
    

    Using for<-

    The only restriction compared to a standard for call is that the first argument must exist and can't be NULL.

    i <- NA 
    progress_bar$new() -> for (i in 1:10) {
      # DO SOMETHING
      Sys.sleep(0.5)
    }
    

    We can define a custom progress bar, and maybe define it conveniently in an initialisation script or in one's R profile.

    pb <- progress_bar$new(format = "Working hard: [:bar] :percent :elapsed", 
           callback = function(x) ("Were'd done!"))
    pb  -> for (i in 1:10) {
      # DO SOMETHING
      Sys.sleep(0.5)
    }
    

    For nested progress bars we can use the following trick :

    pbi <- progress_bar$new(format = "i: [:bar] :percent\n\n")
    pbj <- progress_bar$new(format = "j: [:bar] :percent  ")
    i <- NA
    j <- NA
    pbi  -> for (i in 1:10) {
      pbj  -> for (j in 1:10) {
        # DO SOMETHING
        Sys.sleep(0.1)
      }
    }
    

    note that due to operator precedence the only way to call for<- and benefit from the syntax of for calls is to use the left to right arrow ´->´.


    how they work

    pb_for()

    pb_for() creates a for function object in its parent environment, then the new for :

    • sets up a progress bar
    • modifies the loop content
    • adds a `*pb*`$tick() at the end of the loop content expression
    • feeds it back to base::`for` in a clean environment
    • assigns on exit all modified or created variables to the parent environment.
    • removes itself if once is TRUE (the default)

    It's generally sensitive to override an operator, but it cleans after itself and won't affect the global environment if used in a function so I think it's safe enough to use.

    for<-

    This approach :

    • doesn't override for
    • allows the use of progress bar templates
    • has an arguably more intuitive api

    It has a few drawbacks however:

    • its first argument must exist, which is the case for all assignment functions (fun<-).
    • it does some memory magic to find the name of its first argument as it's not easily done with assignment functions, this might have a performance cost, and I'm not 100% sure about the robustness
    • we need the package pryr

    What it does :

    • find the name of the first argument, using a helper function
    • clone the progress bar input
    • edit it to account for the number of iterations of the loop (the length of the second argument of for<-

    After this it's similar to what is described for pb_for() in the section above.


    The code

    pb_for()

    pb_for <-
      function(
        # all args of progress::progress_bar$new() except `total` which needs to be
        # infered from the 2nd argument of the `for` call, and `stream` which is
        # deprecated
        format = "[:bar] :percent",
        width = options("width")[[1]] - 2,
        complete = "=",
        incomplete = "-",
        current =">",
        callback = invisible, # doc doesn't give default but this seems to work ok
        clear = TRUE,
        show_after = .2,
        force = FALSE,
        # The only arg not forwarded to progress::progress_bar$new()
        # By default `for` will self detruct after being called
        once = TRUE) {
    
        # create the function that will replace `for`
        f <- function(it, seq, expr){
          # to avoid notes at CMD check
          `*pb*` <- IT <- SEQ <- EXPR <- NULL
    
          # forward all arguments to progress::progress_bar$new() and add
          # a `total` argument computed from `seq` argument
          pb <- progress::progress_bar$new(
            format = format, width = width, complete = complete,
            incomplete = incomplete, current = current,
            callback = callback,
            clear = clear, show_after = show_after, force = force,
            total = length(seq))
    
          # using on.exit allows us to self destruct `for` if relevant even if
          # the call fails.
          # It also allows us to send to the local environment the changed/created
          # variables in their last state, even if the call fails (like standard for)
          on.exit({
            vars <- setdiff(ls(env), c("*pb*"))
            list2env(mget(vars,envir = env), envir = parent.frame())
            if(once) rm(`for`,envir = parent.frame())
          })
    
          # we build a regular `for` loop call with an updated loop code including
          # progress bar.
          # it is executed in a dedicated environment and the progress bar is given
          # a name unlikely to conflict
          env <- new.env(parent = parent.frame())
          env$`*pb*` <-  pb
          eval(substitute(
            env = list(IT = substitute(it), SEQ = substitute(seq), EXPR = substitute(expr)),
            base::`for`(IT, SEQ,{
              EXPR
              `*pb*`$tick()
            })), envir = env)
        }
        # override `for` in the parent frame
        assign("for", value = f,envir = parent.frame())
      }
    

    for<- (and fetch_name())

    `for<-` <-
      function(it, seq, expr, value){
        # to avoid notes at CMD check
        `*pb*` <- IT <- SEQ <- EXPR <- NULL
        # the symbol fed to `it` is unknown, R uses `*tmp*` for assignment functions
        # so we go get it by inspecting the memory addresses
        it_chr <- fetch_name(it)
        it_sym <-as.symbol(it_chr)
    
        #  complete the progress bar with the `total` parameter
        # we need to clone it because progress bars are environments and updated
        # by reference
        pb <- value$clone()
        pb$.__enclos_env__$private$total <- length(seq)
    
        # when the script ends, even with a bug, the values that have been changed
        # are written to the parent frame
        on.exit({
          vars <- setdiff(ls(env), c("*pb*"))
          list2env(mget(vars, env),envir = parent.frame())
        })
    
        # computations are operated in a separate environment so we don't pollute it
        # with it, seq, expr, value, we need the progress bar so we name it `*pb*`
        # unlikely to conflict by accident
        env <- new.env(parent = parent.frame())
        env$`*pb*` <-  pb
        eval(substitute(
          env =  list(IT = it_sym, SEQ = substitute(seq), EXPR = substitute(expr)),
          base::`for`(IT, SEQ,{
            EXPR
            `*pb*`$tick()
          })), envir = env)
    
        # because of the `fun<-` syntax we need to return the modified first argument
        invisible(get(it_chr,envir = env))
      }
    

    helpers:

    fetch_name <- function(x,env = parent.frame(2)) {
      all_addresses       <- sapply(ls(env), address2, env)
      all_addresses       <- all_addresses[names(all_addresses) != "*tmp*"]
      all_addresses_short <- gsub("(^|<)[0x]*(.*?)(>|$)","\\2",all_addresses)
    
      x_address       <- tracemem(x)
      untracemem(x)
      x_address_short <- tolower(gsub("(^|<)[0x]*(.*?)(>|$)","\\2",x_address))
    
      ind    <- match(x_address_short, all_addresses_short)
      x_name <- names(all_addresses)[ind]
      x_name
    }
    
    address2 <- getFromNamespace("address2", "pryr")
    

提交回复
热议问题