Speed up RData load


清酒与你

I've checked several related questions, such as this one:

How to load data quickly into R?

I'm quoting a specific part of the top-rated answer:


3 Answers

暖寄归人

2020-12-23 21:34

The main reason why RData files take a while to load is that the de-compression step is single-threaded.

The fastSave R package allows using parallel tools for saving and restoring R sessions:

https://github.com/barkasn/fastSave

But it only works on UNIX (you should still be able to open the files on other platforms, though).

感动是毒

2020-12-23 21:40

save compresses by default, so it takes extra time to uncompress the file. Then it takes a bit longer to load the larger file into memory. Your pv example is just copying the compressed data to memory, which isn't very useful to you. ;-)

UPDATE:

I tested my theory and it was incorrect (at least on my Windows XP machine with a 3.3 GHz CPU and a 7200 RPM HDD). Loading compressed files is faster (probably because it reduces disk I/O).
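If you want to repeat that kind of test on your own machine, here is a minimal sketch. The matrix `x` and the helper `time_load` are made up for illustration; substitute your own object. The result depends heavily on disk, CPU and how compressible the data is, so treat the timings as a local measurement only.

```r
# Hypothetical data; the size and contents are arbitrary.
set.seed(1)
x <- matrix(rnorm(1e6), ncol = 100)

f_gz  <- tempfile(fileext = ".RData")
f_raw <- tempfile(fileext = ".RData")

save(x, file = f_gz)                     # save() gzip-compresses by default
save(x, file = f_raw, compress = FALSE)  # same object, no compression

# Load a file into a fresh environment and report the elapsed time,
# so the global workspace is left untouched.
time_load <- function(path) {
  e <- new.env()
  elapsed <- system.time(load(path, envir = e))[["elapsed"]]
  list(elapsed = elapsed, env = e)
}

res_gz  <- time_load(f_gz)
res_raw <- time_load(f_raw)

cat("compressed:  ", file.size(f_gz),  "bytes,", res_gz$elapsed,  "s\n")
cat("uncompressed:", file.size(f_raw), "bytes,", res_raw$elapsed, "s\n")
```

Run it a few times; the first read of each file may hit the disk while later reads come from the OS cache.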

The extra time is spent in RestoreToEnv (in saveload.c) and/or R_Unserialize (in serialize.c). So you could make loading faster by changing those files, or maybe by using saveRDS to save the objects in myGraph.RData individually, then somehow using readRDS across multiple R processes to load the data into shared memory...
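A rough sketch of the "one file per object" idea, using saveRDS, readRDS (the base R counterpart of saveRDS) and the parallel package that ships with R. The list `objs` is a hypothetical stand-in for the contents of myGraph.RData, and the worker count of 2 is arbitrary. This is not true shared memory: each worker sends its object back to the main process, which serializes it again, so it may well not be faster overall.

```r
library(parallel)

# Hypothetical objects standing in for the contents of myGraph.RData.
objs <- list(a = rnorm(1e5), b = runif(1e5), c = seq_len(1e5))

# Save each object to its own .rds file.
out_dir <- tempfile("rds_")
dir.create(out_dir)
paths <- file.path(out_dir, paste0(names(objs), ".rds"))
for (i in seq_along(objs)) saveRDS(objs[[i]], paths[i])

# Read the files in separate worker processes.
cl <- makeCluster(2)
loaded <- parLapply(cl, paths, readRDS)
stopCluster(cl)

# parLapply returns an unnamed list in the same order as `paths`.
names(loaded) <- names(objs)
```

Time this against a plain `lapply(paths, readRDS)` before relying on it; the transfer back from the workers can eat the gain.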

醉酒成梦

2020-12-23 21:41

For variables that big, I suspect that most of the time is taken up inside the internal C code (http://svn.r-project.org/R/trunk/src/main/saveload.c). You can run some profiling to see if I'm right. (All the R code in the load function does is check that your file is non-empty and hasn't been corrupted.)

As well as being read into memory, the variables (amongst other things) need to be stored inside an R environment.
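For the profiling, something along these lines should do, assuming your R build has profiling support. The vector `big` is a made-up stand-in for your data, and it is loaded into an explicit environment `e` rather than the global workspace. Rprof only samples at the R level, so it cannot see inside the C code; the most it can tell you is whether any R-level function other than load accounts for a noticeable share.

```r
# Hypothetical large object; replace with your own .RData file.
f <- tempfile(fileext = ".RData")
big <- rnorm(5e6)
save(big, file = f)

prof_file <- tempfile()
e <- new.env()

Rprof(prof_file, interval = 0.01)  # start sampling every 10 ms
load(f, envir = e)                 # objects are restored into environment e
Rprof(NULL)                        # stop profiling

# summaryRprof() raises an error if no samples were collected
# (for example when the load finished too quickly), so guard against that.
summ <- tryCatch(summaryRprof(prof_file), error = function(err) NULL)
if (is.null(summ)) {
  message("No profiling samples collected; try a larger file.")
} else {
  print(head(summ$by.total))
}
```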

The only obvious way of getting a big speedup in loading variables would be to rewrite the code in a parallel way to allow simultaneous loading of variables. This presumably requires a substantial rewrite of R's internals, so don't hold your breath for such a feature.
