Tricks to manage the available memory in an R session

情深已故 2020-11-22 01:23

What tricks do people use to manage the available memory of an interactive R session? I use the functions below [based on postings by Petr Pikal and David Hinds to the r-he

27 answers
  • 2020-11-22 01:55

    Saw this in a Twitter post and think it's an awesome function by Dirk! Following on from JD Long's answer, I would do this for user-friendly reading:

    # improved list of objects
    .ls.objects <- function (pos = 1, pattern, order.by,
                            decreasing=FALSE, head=FALSE, n=5) {
        napply <- function(names, fn) sapply(names, function(x)
                                             fn(get(x, pos = pos)))
        names <- ls(pos = pos, pattern = pattern)
        obj.class <- napply(names, function(x) as.character(class(x))[1])
        obj.mode <- napply(names, mode)
        obj.type <- ifelse(is.na(obj.class), obj.mode, obj.class)
        obj.prettysize <- napply(names, function(x) {
                               format(utils::object.size(x), units = "auto") })
        obj.size <- napply(names, object.size)
        obj.dim <- t(napply(names, function(x)
                            as.numeric(dim(x))[1:2]))
        vec <- is.na(obj.dim)[, 1] & (obj.type != "function")
        obj.dim[vec, 1] <- napply(names, length)[vec]
        out <- data.frame(obj.type, obj.size, obj.prettysize, obj.dim)
        names(out) <- c("Type", "Size", "PrettySize", "Length/Rows", "Columns")
        if (!missing(order.by))
            out <- out[order(out[[order.by]], decreasing=decreasing), ]
        if (head)
            out <- head(out, n)
        out
    }
    
    # shorthand
    lsos <- function(..., n=10) {
        .ls.objects(..., order.by="Size", decreasing=TRUE, head=TRUE, n=n)
    }
    
    lsos()
    

    Which results in something like the following:

                          Type   Size PrettySize Length/Rows Columns
    pca.res                 PCA 790128   771.6 Kb          7      NA
    DF               data.frame 271040   264.7 Kb        669      50
    factor.AgeGender   factanal  12888    12.6 Kb         12      NA
    dates            data.frame   9016     8.8 Kb        669       2
    sd.                 numeric   3808     3.7 Kb         51      NA
    napply             function   2256     2.2 Kb         NA      NA
    lsos               function   1944     1.9 Kb         NA      NA
    load               loadings   1768     1.7 Kb         12       2
    ind.sup             integer    448  448 bytes        102      NA
    x                 character     96   96 bytes          1      NA
    

    NOTE: The main part I added was (again, adapted from JD's answer) :

    obj.prettysize <- napply(names, function(x) {
                               format(utils::object.size(x), units = "auto") })
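
    As a minimal sketch of what the units = "auto" formatting contributes (the vector x below is a made-up example, not from the answer), the raw object.size() value is a byte count that sorts correctly, while the formatted value is a character string meant only for display:

```r
# Hypothetical example object: one million doubles.
x <- numeric(1e6)

# Raw size: an "object_size" value holding a byte count.
raw_size <- utils::object.size(x)

# Human-readable size: a character string with a unit suffix.
pretty_size <- format(raw_size, units = "auto")

as.numeric(raw_size)   # numeric byte count, suitable for ordering
pretty_size            # display string, not suitable for ordering
```

    This is why the function keeps both columns: it orders by "Size" and shows "PrettySize" for reading.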
    
  • 2020-11-22 01:55

    I never save an R workspace. I use import scripts and data scripts and output any especially large data objects that I don't want to recreate often to files. This way I always start with a fresh workspace and don't need to clean out large objects. That is a very nice function though.
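
    One way this workflow might look, as a sketch only: the object name big_result and the use of tempfile() are assumptions made so the example is self-contained, and a real project would write to a stable path.

```r
# Hypothetical expensive-to-build object standing in for a large data set.
big_result <- data.frame(id = seq_len(1e5), value = rnorm(1e5))

# Write it to disk once. tempfile() keeps this sketch self-contained;
# a real project would use a fixed path inside the project directory.
cache_file <- tempfile(fileext = ".rds")
saveRDS(big_result, cache_file)

# Remove it from the workspace and let R reclaim the memory.
rm(big_result)
invisible(gc())

# Later (for example in a fresh session), reload instead of recreating.
big_result <- readRDS(cache_file)
```

    saveRDS() and readRDS() store a single object without its name, so the script that reloads it decides what to call it.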

  • 2020-11-22 01:55

    The ll function in the gdata package can show the memory usage of each object as well.

    gdata::ll(unit='MB')
    
  • 2020-11-22 01:56

    I quite like the improved objects function developed by Dirk. Much of the time though, a more basic output with the object name and size is sufficient for me. Here's a simpler function with a similar objective. Memory use can be ordered alphabetically or by size, can be limited to a certain number of objects, and can be ordered ascending or descending. Also, I often work with data that are 1GB+, so the function changes units accordingly.

    showMemoryUse <- function(sort="size", decreasing=FALSE, limit) {
    
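      # look objects up by name in the caller's environment; get() avoids the
      # eval(parse()) pitfall where an object named "x" resolves to the argument
      callerEnv <- parent.frame()
      objectList <- ls(callerEnv)
    
      oneKB <- 1024
      oneMB <- 1048576
      oneGB <- 1073741824
    
      memoryUse <- sapply(objectList, function(x) as.numeric(object.size(get(x, envir = callerEnv))))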
    
      memListing <- sapply(memoryUse, function(size) {
            if (size >= oneGB) return(paste(round(size/oneGB,2), "GB"))
            else if (size >= oneMB) return(paste(round(size/oneMB,2), "MB"))
            else if (size >= oneKB) return(paste(round(size/oneKB,2), "kB"))
            else return(paste(size, "bytes"))
          })
    
      memListing <- data.frame(objectName=names(memListing),memorySize=memListing,row.names=NULL)
    
      if (sort=="alphabetical") memListing <- memListing[order(memListing$objectName,decreasing=decreasing),] 
      else memListing <- memListing[order(memoryUse,decreasing=decreasing),] #will run if sort not specified or "size"
    
      if(!missing(limit)) memListing <- head(memListing, limit) # head() avoids NA rows when limit exceeds the number of objects
    
      print(memListing, row.names=FALSE)
      return(invisible(memListing))
    }
    

    And here is some example output:

    > showMemoryUse(decreasing=TRUE, limit=5)
          objectName memorySize
           coherData  713.75 MB
     spec.pgram_mine  149.63 kB
           stoch.reg  145.88 kB
          describeBy    82.5 kB
          lmBandpass   68.41 kB
    
  • 2020-11-22 01:57

    I make aggressive use of subset(), selecting only the required variables, when passing data frames to the data= argument of regression functions. It does result in some errors if I forget to add variables to both the formula and the select= vector, but it still saves a lot of time due to decreased copying of objects and reduces the memory footprint significantly. Say I have 4 million records with 110 variables (and I do). Example:

    # library(rms); library(Hmisc) for the cph and rcs functions
    Mayo.PrCr.rbc.mdl <-
      cph(formula = Surv(surv.yr, death) ~ age + Sex + nsmkr + rcs(Mayo, 4) +
                                           rcs(PrCr.rat, 3) + rbc.cat * Sex,
          data = subset(set1HLI, gdlab2 & HIVfinal == "Negative",
                        select = c("surv.yr", "death", "PrCr.rat", "Mayo",
                                   "age", "Sex", "nsmkr", "rbc.cat")))
    

    To set the context for this strategy: gdlab2 is a logical vector marking subjects in the dataset whose values on a bunch of laboratory tests were all normal or almost normal, and HIVfinal is a character vector that summarizes preliminary and confirmatory testing for HIV.
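
    A toy illustration of the size difference, assuming a made-up data frame (wide, with a hypothetical logical column keep) in place of the real 4-million-row data. It only shows that the object handed to the modelling function is smaller; it does not measure copying inside the model fit.

```r
set.seed(1)

# Toy stand-in for a wide data set: 1,000 rows, 20 numeric columns (V1..V20).
wide <- as.data.frame(matrix(rnorm(1000 * 20), nrow = 1000))
wide$keep <- rep(c(TRUE, FALSE), length.out = 1000)  # hypothetical row filter

# Keep only the rows and columns a model formula would need.
narrow <- subset(wide, keep, select = c("V1", "V2", "V3"))

full_bytes   <- as.numeric(object.size(wide))
narrow_bytes <- as.numeric(object.size(narrow))

# Fraction of the original footprint that the subset occupies.
narrow_bytes / full_bytes
```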

  • 2020-11-22 01:58

    I love Dirk's .ls.objects() script but I kept squinting to count characters in the size column. So I did some ugly hacks to make it present with pretty formatting for the size:

    .ls.objects <- function (pos = 1, pattern, order.by,
                            decreasing=FALSE, head=FALSE, n=5) {
        napply <- function(names, fn) sapply(names, function(x)
                                             fn(get(x, pos = pos)))
        names <- ls(pos = pos, pattern = pattern)
        obj.class <- napply(names, function(x) as.character(class(x))[1])
        obj.mode <- napply(names, mode)
        obj.type <- ifelse(is.na(obj.class), obj.mode, obj.class)
        obj.size <- napply(names, object.size)
        obj.prettysize <- sapply(obj.size, function(r) prettyNum(r, big.mark = ",") )
        obj.dim <- t(napply(names, function(x)
                            as.numeric(dim(x))[1:2]))
        vec <- is.na(obj.dim)[, 1] & (obj.type != "function")
        obj.dim[vec, 1] <- napply(names, length)[vec]
        out <- data.frame(obj.type, obj.size,obj.prettysize, obj.dim)
        names(out) <- c("Type", "Size", "PrettySize", "Rows", "Columns")
        if (!missing(order.by))
            out <- out[order(out[[order.by]], decreasing=decreasing), ]
        # always drop the raw byte count and show the formatted column as "Size"
        out <- out[c("Type", "PrettySize", "Rows", "Columns")]
        names(out) <- c("Type", "Size", "Rows", "Columns")
        if (head)
            out <- head(out, n)
        out
    }
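
    For reference, a small sketch of the prettyNum() formatting used above, applied to three of the byte counts from the example output earlier in the thread. The last two lines are an assumption-labelled aside: format() with scientific = FALSE is one possible way to force fixed notation for round values, and it is not part of the original function.

```r
# Byte counts taken from the example output earlier in the thread.
sizes <- c(790128, 271040, 448)

# Insert thousands separators, one value at a time, as the function does.
pretty <- sapply(sizes, function(r) prettyNum(r, big.mark = ","))
pretty

# Aside (not in the original function): forcing fixed notation explicitly.
fixed <- format(1e5, big.mark = ",", scientific = FALSE)
fixed
```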
    