Tricks to manage the available memory in an R session


What tricks do people use to manage the available memory of an interactive R session? I use the functions below [based on postings by Petr Pikal and David Hinds to the r-help list in 2004] to list (and/or order) the largest objects and to occasionally rm() some of them.

27 answers
  • 2020-11-22 01:58

    For both speed and memory purposes, when building a large data frame via some complex series of steps, I'll periodically flush it (the in-progress data set being built) to disk, appending to anything that came before, and then restart it. This way the intermediate steps are only working on smallish data frames (which is good as, e.g., rbind slows down considerably with larger objects). The entire data set can be read back in at the end of the process, when all the intermediate objects have been removed.

    dfinal <- NULL
    first <- TRUE
    tempfile <- "dfinal_temp.csv"
    for (i in bigloop) {
        if (i %% 10000 == 0 && !is.null(dfinal)) {
            cat(i, "; flushing to disk...\n")
            write.table(dfinal, file = tempfile, append = !first,
                        col.names = first, row.names = FALSE)
            first <- FALSE
            dfinal <- NULL   # nuke it
        }

        # ... complex operations here that add data to the 'dfinal' data frame
    }
    cat("Loop done; flushing to disk and re-reading entire data set...\n")
    if (!is.null(dfinal))
        write.table(dfinal, file = tempfile, append = TRUE,
                    col.names = FALSE, row.names = FALSE)
    dfinal <- read.table(tempfile, header = TRUE)
    
  • 2020-11-22 01:59

    Use environments instead of lists to handle collections of objects that occupy a significant amount of working memory.

    The reason: each time an element of a list is modified, the whole list is temporarily duplicated. This becomes an issue if the list's storage requirement is around half the available working memory, because data then has to be swapped out to the slow hard disk. Environments, on the other hand, aren't subject to this behaviour, and they can be treated much like lists.

    Here is an example:

    get.data <- function(x) {
      # get some data based on x
      paste("data from", x)
    }

    collect.data <- function(i, x, env) {
      # get some data
      data <- get.data(x[[i]])
      # store the data in the environment
      element.name <- paste0("V", i)
      env[[element.name]] <- data
      return(NULL)
    }

    better.list <- new.env()
    filenames <- c("file1", "file2", "file3")
    lapply(seq_along(filenames), collect.data, x = filenames, env = better.list)
    
    # read/write access
    print(better.list[["V1"]])
    better.list[["V2"]] <- "testdata"
    # number of list elements
    length(ls(better.list))
    

    In conjunction with structures such as big.matrix or data.table, which allow their contents to be altered in place, very efficient memory usage can be achieved (see the sketch below).
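
    As a hedged illustration of the in-place idea with data.table (the table dt and its columns are invented for this example): := and set() modify columns by reference, so no copy of the table is made.

    library(data.table)

    dt <- data.table(id = 1:5, value = rnorm(5))

    # ':=' adds or modifies a column by reference -- no copy of dt is made
    dt[, doubled := value * 2]

    # set() does the same with less overhead, which helps inside loops
    for (i in seq_len(nrow(dt))) {
        set(dt, i = i, j = "value", value = dt$value[i] + 1)
    }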

  • 2020-11-22 01:59

    This is a newer answer to this excellent old question. From Hadley's Advanced R:

    install.packages("pryr")
    
    library(pryr)
    
    object_size(1:10)
    ## 88 B
    
    object_size(mean)
    ## 832 B
    
    object_size(mtcars)
    ## 6.74 kB
    

    (http://adv-r.had.co.nz/memory.html)
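
    The same chapter also covers pryr's session-level helpers; a small sketch, assuming pryr is installed (exact numbers vary by machine):

    library(pryr)

    # total memory used by all objects in the current session
    mem_used()

    # net memory change caused by an expression:
    # allocating 1e6 doubles takes roughly 8 MB
    mem_change(x <- numeric(1e6))

    # removing the object releases (roughly) that memory again
    mem_change(rm(x))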

  • 2020-11-22 02:01

    This adds nothing to the above, but is written in the simple and heavily commented style that I like. It yields a table with the objects ordered by size, but without some of the detail given in the examples above:

    # Find the objects
    MemoryObjects <- ls()
    # Create an array
    MemoryAssessmentTable <- array(NA, dim = c(length(MemoryObjects), 2))
    # Name the columns
    colnames(MemoryAssessmentTable) <- c("object", "bytes")
    # Define the first column as the object names
    MemoryAssessmentTable[, 1] <- MemoryObjects
    # Define a function to determine size
    MemoryAssessmentFunction <- function(x) { object.size(get(x)) }
    # Apply the function to the objects
    MemoryAssessmentTable[, 2] <- sapply(MemoryAssessmentTable[, 1], MemoryAssessmentFunction)
    # Produce a table with the largest objects first
    noquote(MemoryAssessmentTable[rev(order(as.numeric(MemoryAssessmentTable[, 2]))), ])
    
  • 2020-11-22 02:01

    Based on @Dirk's and @Tony's answers, I have made a slight update. Their result was printing [1] in front of the pretty size values, so I took out the capture.output() call, which solved the problem:

    .ls.objects <- function(pos = 1, pattern, order.by,
                            decreasing = FALSE, head = FALSE, n = 5) {
        napply <- function(names, fn) sapply(names, function(x)
            fn(get(x, pos = pos)))
        names <- ls(pos = pos, pattern = pattern)
        obj.class <- napply(names, function(x) as.character(class(x))[1])
        obj.mode <- napply(names, mode)
        obj.type <- ifelse(is.na(obj.class), obj.mode, obj.class)
        obj.prettysize <- napply(names, function(x) {
            format(utils::object.size(x), units = "auto")
        })
        obj.size <- napply(names, utils::object.size)

        obj.dim <- t(napply(names, function(x)
            as.numeric(dim(x))[1:2]))
        vec <- is.na(obj.dim)[, 1] & (obj.type != "function")
        obj.dim[vec, 1] <- napply(names, length)[vec]
        out <- data.frame(obj.type, obj.size, obj.prettysize, obj.dim)
        names(out) <- c("Type", "Size", "PrettySize", "Rows", "Columns")
        if (!missing(order.by))
            out <- out[order(out[[order.by]], decreasing = decreasing), ]
        if (head)
            out <- head(out, n)

        return(out)
    }

    # shorthand
    lsos <- function(..., n = 10) {
        .ls.objects(..., order.by = "Size", decreasing = TRUE, head = TRUE, n = n)
    }

    lsos()
    
  • 2020-11-22 02:02

    Just to note that the data.table package's tables() seems to be a pretty good replacement for Dirk's custom .ls.objects() function (detailed in earlier answers), although it covers only data.frames/data.tables and not, e.g., matrices, arrays, or lists. A minimal sketch follows.
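
    A minimal sketch of tables() in action, assuming data.table is installed (the example tables here are invented):

    library(data.table)

    dt.big   <- data.table(x = rnorm(1e6), y = rnorm(1e6))
    dt.small <- data.table(id = 1:10, grp = letters[1:10])

    # one summary row per data.table in the session:
    # name, rows, columns, size in MB, column names, and key (if any)
    tables()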
