R: selecting subset without copying

Submitted by 允我心安 on 2019-12-29 04:02:31

Question


Is there a way to select a subset from objects (data frames, matrices, vectors) without making a copy of selected data?

I work with quite large data sets, but never change them. For convenience, however, I often select subsets of the data to operate on. Copying a large subset each time is very memory-inefficient, but both normal indexing and subset() (and thus the xapply() family of functions) create copies of the selected data. So I'm looking for functions or data structures that can overcome this issue.

Some possible approaches that may fit my needs and hopefully are implemented in some R packages:

  • copy-on-write mechanism, i.e. data structures that are copied only when you add or rewrite existing elements;
  • immutable data structures that only require recreating the indexing information for the data structure, not its content (like taking a substring of a string by creating only a small object that holds a length and a pointer into the same char array);
  • xapply() analogues that do not create subsets.

Answer 1:


Try package ref. Specifically, its refdata class.

What you might be missing about data.table is that when grouping (the by= parameter), the subsets of data are not copied, so that's fast. [Well, technically they are copied, but into a shared area of memory that is reused for each group, and the copy is done with memcpy, which is much faster than R's for loops in C.]
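As a minimal sketch of the grouping described above (the table DT and its columns g and x are made up for illustration, and the data.table package is assumed to be installed), each group is summarised inside a single [ call with by=, without building a separate subset object per group in user code:

```r
library(data.table)

# Hypothetical example data: 1e6 rows split into 10 groups
DT <- data.table(g = rep(1:10, each = 1e5), x = as.numeric(seq_len(1e6)))

# j is evaluated once per group; x refers to the current group's values
# and .N to the number of rows in that group
res <- DT[, .(mean_x = mean(x), n = .N), by = g]
print(res)
```

The equivalent base R approach, split() followed by lapply(), would materialise every group as its own data frame first.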

:= in data.table is one way to modify a data.table in place. data.table departs from the usual R programming style in that it is not copy-on-write. The user has to call copy() explicitly to copy a (potentially very large) table, even within a function.
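A small sketch of what that means in practice, assuming the data.table package is installed (the table and column names are invented for the example): assigning a data.table to a second name does not copy it, so := through either name changes the same table, and only copy() gives an independent one.

```r
library(data.table)

DT <- data.table(id = 1:3, x = c(10, 20, 30))

# Plain assignment only binds a second name to the same table
DT2 <- DT
DT2[, y := x * 2]                 # adds column y in place
y_visible_in_DT <- "y" %in% names(DT)

# copy() creates an independent table
DT3 <- copy(DT)
DT3[, z := 0]                     # only DT3 gets column z
z_visible_in_DT <- "z" %in% names(DT)

print(c(y = y_visible_in_DT, z = z_visible_in_DT))
```

Here the column added through DT2 also shows up in DT, while the column added to the explicit copy DT3 does not.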

You're right that there isn't a mechanism like refdata built into data.table. I see what you mean and it would be a nice feature. refdata should work on a data.table, though, and you might be fine with data.frame (but be sure to monitor copies with tracemem(DF)).
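To illustrate the tracemem() suggestion, here is a minimal base R sketch (the data frame DF is a made-up example). tracemem() only works if R was built with memory profiling, so the calls are guarded with capabilities("profmem"); when tracing is available, R should print a message at the point where the data frame is duplicated.

```r
DF <- data.frame(a = 1:5, b = letters[1:5])

# tracemem() needs an R build with memory profiling enabled
can_trace <- capabilities("profmem")
if (can_trace) tracemem(DF)

DF2 <- DF            # no copy yet: DF2 is another name for the same object
DF2$a[1] <- 100L     # modifying DF2 forces a copy; tracemem reports it

if (can_trace) untracemem(DF)
```

The original DF is left unchanged, which is exactly the copy-on-modify behaviour that becomes costly with large tables.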

There is also idata.frame (immutable data.frame) in package plyr you could try.
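A rough sketch of how idata.frame might be used, assuming the plyr package is installed; the data frame and its columns are invented for the example, and the pattern follows the one in the idata.frame help page (passing the immutable frame to dlply()):

```r
library(plyr)

# Hypothetical example data
df <- data.frame(g = c("a", "a", "b"), x = c(1, 2, 3))

# Wrap the data frame in an immutable, reference-based version
idf <- idata.frame(df)

# Split by g and count the rows in each piece
counts <- dlply(idf, "g", nrow)
print(unlist(counts))
```

Because the object is immutable, attempts to assign into idf are expected to fail, so it only suits read-only workflows like the one in the question.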



Source: https://stackoverflow.com/questions/9573055/r-selecting-subset-without-copying
