Speed up RData load

清酒与你 2020-12-23 21:15

I've checked several related questions, such as this one:

How to load data quickly into R?

I'm quoting a specific part of the top-rated answer

3 Answers
  • 2020-12-23 21:34

    The main reason RData files take a while to load is that the decompression step is single-threaded.

    The fastSave R package allows using parallel tools for saving and restoring R sessions:

    https://github.com/barkasn/fastSave

    However, it only works on UNIX (you should still be able to open the files on other platforms, though).

  • 2020-12-23 21:40

    save compresses by default, so it takes extra time to uncompress the file. Then it takes a bit longer to load the larger file into memory. Your pv example is just copying the compressed data to memory, which isn't very useful to you. ;-)

    UPDATE:

    I tested my theory and it was incorrect (at least on my Windows XP machine with a 3.3 GHz CPU and a 7200 RPM HDD). Loading compressed files is faster (probably because it reduces disk I/O).
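
    The comparison described above can be repeated on your own machine. The following is only a minimal sketch: the vector x is hypothetical stand-in data (it is not the object from the question), and the timings depend heavily on the disk, the CPU, and how compressible the real data is.

    ```r
    # Hypothetical stand-in data; substitute the objects you actually save.
    x <- round(rnorm(2e6), 2)

    compressed   <- tempfile(fileext = ".RData")
    uncompressed <- tempfile(fileext = ".RData")

    save(x, file = compressed)                      # binary save() compresses by default
    save(x, file = uncompressed, compress = FALSE)  # same data, no compression

    # On-disk sizes in bytes.
    sizes <- c(compressed   = file.size(compressed),
               uncompressed = file.size(uncompressed))
    print(sizes)

    # Load each file into a fresh environment so the two copies do not clash.
    e_compressed   <- new.env()
    e_uncompressed <- new.env()
    t_compressed   <- system.time(load(compressed,   envir = e_compressed))
    t_uncompressed <- system.time(load(uncompressed, envir = e_uncompressed))

    # Elapsed wall-clock seconds for each load.
    print(c(compressed   = t_compressed[["elapsed"]],
            uncompressed = t_uncompressed[["elapsed"]]))
    ```

    Because the files were just written, the operating system will probably serve them from its file cache, so this mostly measures CPU cost rather than a cold disk read.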

    The extra time is spent in RestoreToEnv (in saveload.c) and/or R_Unserialize (in serialize.c). So you could make loading faster by changing those files, or maybe by using saveRDS to individually save the objects in myGraph.RData and then somehow using readRDS across multiple R processes to load the data into shared memory...
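
    A rough sketch of the per-object idea, using saveRDS, readRDS, and the parallel package that ships with R. The objects in objs are hypothetical placeholders for the contents of myGraph.RData. Note that this does not use shared memory: each worker reads one file and the result is copied back to the main process, which may cancel out some or all of the gain.

    ```r
    library(parallel)

    # Hypothetical objects standing in for the contents of myGraph.RData.
    objs <- list(a = rnorm(1e5), b = runif(1e5), c = seq_len(1e5))

    # Save each object to its own .rds file.
    out_dir <- tempfile("rds_")
    dir.create(out_dir)
    files <- file.path(out_dir, paste0(names(objs), ".rds"))
    for (i in seq_along(objs)) {
      saveRDS(objs[[i]], files[i])
    }

    # mclapply() forks worker processes; forking is not available on Windows,
    # where mc.cores must be 1 (the files are then read sequentially).
    cores <- if (.Platform$OS.type == "windows") 1L else 2L
    restored <- mclapply(files, readRDS, mc.cores = cores)
    names(restored) <- names(objs)
    ```

    Whether this beats a single load call is something to measure on real data rather than assume.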

  • 2020-12-23 21:41

    For variables that big, I suspect that most of the time is taken up inside the internal C code (http://svn.r-project.org/R/trunk/src/main/saveload.c). You can run some profiling to see if I'm right. (All the R code in the load function does is check that your file is non-empty and hasn't been corrupted.)
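
    One simple way to check this, sketched here with hypothetical data in x, is to time reading the raw bytes of the file against a full load. The gap between the two is roughly the time spent decompressing, unserializing, and assigning into the environment. This uses plain system.time rather than a real profiler, so it is only a coarse measurement.

    ```r
    # Hypothetical stand-in data; use your own .RData file in practice.
    x <- rnorm(2e6)
    f <- tempfile(fileext = ".RData")
    save(x, file = f)

    # Time to pull the raw, still-compressed bytes into memory.
    t_read <- system.time(
      bytes <- readBin(f, what = "raw", n = file.size(f))
    )

    # Time for the full load: read, decompress, unserialize, assign.
    e <- new.env()
    t_load <- system.time(load(f, envir = e))

    # Elapsed wall-clock seconds for each step.
    print(c(raw_read  = t_read[["elapsed"]],
            full_load = t_load[["elapsed"]]))
    ```

    A freshly written file is usually still in the operating system's cache, so the raw read here will look faster than a cold read from disk.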

    As well as reading the variables into memory, they (amongst other things) need to be stored inside an R environment.

    The only obvious way of getting a big speedup in loading variables would be to rewrite the code in a parallel way to allow simultaneous loading of variables. This presumably requires a substantial rewrite of R's internals, so don't hold your breath for such a feature.
