I\'ve checked several related questions such is this
How to load data quickly into R?
I\'m quoting specific part of the most rated answer
The main reason why RData files take a while to load is that the de-compression step is single-threaded.
The fastSave R package allows using parallel tools for saving and restoring R sessions:
https://github.com/barkasn/fastSave
But it only works on UNIX (You should still be able to open the files on other platforms though).
save
compresses by default, so it takes extra time to uncompress the file. Then it takes a bit longer to load the larger file into memory. Your pv
example is just copying the compressed data to memory, which isn't very useful to you. ;-)
UPDATE:
I tested my theory and it was incorrect (at least on my Windows XP machine with 3.3Ghz CPU and 7200RPM HDD). Loading compressed files is faster (probably because it reduces disk I/O).
The extra time is spent in RestoreToEnv
(in saveload.c
) and/or R_Unserialize
(in serialize.c
). So you could make loading faster by changing those files, or maybe by using saveRDS
to individually save the objects in myGraph.RData
then somehow using loadRDS
across multiple R processes to load the data into shared memory...
For variables that big, I suspect that most of the time is taken up inside the internal C code (http://svn.r-project.org/R/trunk/src/main/saveload.c). You can run some profiling to see if I'm right. (All the R code in the load
function does is check that your file is non-empty and hasn't been corrupted.
As well as reading the variables into memory, they (amongst other things) need to be stored inside an R environment.
The only obvious way of getting a big speedup in loading variables would be to rewrite the code in a parallel way to allow simultaneous loading of variables. This presumably requires a substantial rewrite of R's internals, so don't hold your breath for such a feature.