Extend memory size limit in R

有刺的猬 2020-12-18 13:55

I have an R program that combines 10 files, each 296 MB in size, and I have increased the memory limit to 8 GB (the size of my RAM):

--max-mem-size=8192M
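
For reference, a Windows build of R can also report and raise this limit from inside the session; a minimal check (values are in MB):

memory.limit()               # report the current limit in MB
memory.limit(size = 8192)    # request an 8 GB limit (Windows only)
memory.size(max = TRUE)      # maximum memory obtained from the OS so far, in MB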


        
4 Answers
  • 2020-12-18 14:19

    Memory allocation needs contiguous blocks. The size a file takes on disk may not be a good indication of how large the object is once loaded into R. Can you look at one of these S objects with the function:

    ?object.size
    

    Here is a function I use to see what is taking up the most space in R:

    getsizes <- function() {
        z <- sapply(ls(envir = globalenv()),
                    function(x) object.size(get(x)))
        (tmp <- as.matrix(rev(sort(z))[1:10]))
    }
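
    A minimal usage sketch, assuming the objects S1 … S10 from the question are sitting in the global environment (the name S1 is just an illustration):

    print(object.size(S1), units = "Mb")   # memory footprint of one object
    getsizes()                             # ten largest objects in the global environment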
    
  • 2020-12-18 14:23

    If these files are in a standard format and you want to do this in R, then why bother with read/write csv? Use readLines/writeLines:

    files_in <- file.path("C:/Sim_Omega3_results",c(
        "sim_omega3_1_400.txt",
        "sim_omega3_401_800.txt",
        "sim_omega3_801_1200.txt",
        "sim_omega3_1201_1600.txt",
        "sim_omega3_1601_2000.txt",
        "sim_omega3_2001_2400.txt",
        "sim_omega3_2401_2800.txt",
        "sim_omega3_2801_3200.txt",
        "sim_omega3_3201_3600.txt",
        "sim_omega3_3601_4000.txt"))
    
    
    # Copy the first file (including its header) to the output, then append the
    # remaining files, dropping the first (header) line of each.
    file.copy(files_in[1], out_file_name <- "C:/sim_omega3_1_4000.txt")
    file_out <- file(out_file_name, "at")    # open the output for appending, text mode
    for (file_in in files_in[-1]) {
        x <- readLines(file_in)
        writeLines(x[-1], file_out)          # x[-1] skips the first line of this file
    }
    close(file_out)
    
  • 2020-12-18 14:24

    If you remove(S1, S2, S3, S4, S5, S6, S7, S8, S9, S10) and then call gc() after calculating combine_result, you might free enough memory. I also find that running the script through Rscript seems to allow access to more memory than the GUI does if you are on Windows.
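
    A minimal sketch of that sequence, assuming combine_result has already been built from the objects S1 … S10 named in the question:

    rm(S1, S2, S3, S4, S5, S6, S7, S8, S9, S10)   # drop the individual inputs
    gc()                                          # ask R to reclaim the freed memory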

  • 2020-12-18 14:25

    I suggest incorporating the suggestions in ?read.csv2:

    Memory usage:

     These functions can use a surprising amount of memory when reading
     large files.  There is extensive discussion in the ‘R Data
     Import/Export’ manual, supplementing the notes here.
    
     Less memory will be used if ‘colClasses’ is specified as one of
     the six atomic vector classes.  This can be particularly so when
     reading a column that takes many distinct numeric values, as
     storing each distinct value as a character string can take up to
     14 times as much memory as storing it as an integer.
    
     Using ‘nrows’, even as a mild over-estimate, will help memory
     usage.
    
     Using ‘comment.char = ""’ will be appreciably faster than the
     ‘read.table’ default.
    
     ‘read.table’ is not the right tool for reading large matrices,
     especially those with many columns: it is designed to read _data
     frames_ which may have columns of very different classes.  Use
     ‘scan’ instead for matrices.
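
    A hedged sketch of how those suggestions might look for one of the input files; the separator, column classes and row count below are assumptions, not taken from the question:

    S1 <- read.table("C:/Sim_Omega3_results/sim_omega3_1_400.txt",
                     header = TRUE, sep = ",",                       # assumed layout
                     colClasses = c("integer", rep("numeric", 5)),   # adjust to the real columns
                     nrows = 500000,                                 # a mild over-estimate of the rows
                     comment.char = "")                              # skip comment scanning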
    