import rpy2.robjects as robjects
dffunc = sc.parallelize([(0,robjects.r.rnorm),(1,robjects.r.runif)])
dffunc.collect()
Outputs
[
I'd like to say that "this is not a bug in rpy2, this is a feature" but I'll realistically have to settle with "this is a limitation".
What is happening is that rpy2 has 2 interface levels. One is a low-level one (closer to R's C API) and available through rpy2.rinterface
and the other one is a high-level interface with more bells and whistles, more "pythonic", and with classes for R objects inheriting from rinterface
level-ones (that last part is important for the part about pickling below). Importing the high-level interface results in initializing (starting) the embedded R with default parameters if necessary. Importing the low-level interface rinterface
does not have this side effect and the initialization of the embedded R must be performed explicitly (function initr
). rpy2 was designed this way because the initialization of the embedded R can have parameters: importing first rpy2.rinterface
, setting the initialization, then importing rpy2.robjects
makes this possible.
In addition to that the serialization (pickling) of R objects wrapped by rpy2 is currently only defined at the rinterface
level (see the documentation). Pickling robjects
-level (high-level) rpy2 objects is using the rinterface
-level code and when unpickling them they will remain at that lower-level (the Python pickle contains the module the class of the object is defined in and will import that module - here rinterface
, which does not imply the initialization of the embedded R). The reason for things being this way are simply that it was "good enough for now": at the time this was implemented I had to simultaneously think of a good way to bridge two somewhat different languages and learn my way through Python C-API and pickling/unpickling Python objects. Given the ease with which one can write something like
import rpy2.robjects
or
import rpy2.rinterface
rpy2.rinterface.initr()
before unpickling, this was never revisited. The uses of rpy2's pickling I know about are using Python's multiprocessing
(and adding something similar to the import statements in the code initializing a child process was a cheap and sufficient fix). May this is the time to look at this again. File a bug report for rpy2 if the case.
edit: this is undoubtedly an issue with rpy2. pickled robjects
-level objects should unpickle back to robjects
-level, not rinterface
-level. I have opened an issue in the rpy2 tracker (and already pushed a rudimentary patch in the default/dev branch).
2nd edit: The patch is part of released rpy2 starting with version 2.7.7 (latest release at the time of writing is 2.7.8).