Loading .RData files into Python

面向向阳花 2020-12-02 20:14

I have a bunch of .RData time-series files and would like to load them directly into Python without first converting the files to some other format (such as .csv). Any ideas?

5 Answers
  • 2020-12-02 20:37

    Jupyter Notebook Users

    If you are using Jupyter Notebook, there are two steps:

    Step 1: Go to http://www.lfd.uci.edu/~gohlke/pythonlibs/#rpy2 and download the wheel for rpy2, the Python interface to the R language (embedded R). Pick the wheel that matches your Python version and platform; in my case I will use rpy2-2.8.6-cp36-cp36m-win_amd64.whl (CPython 3.6, 64-bit Windows).

    Put this file in your notebook's current working directory.

    Step 2: In your Jupyter notebook, run the following commands:

    # Install the rpy2 wheel into the notebook's (Anaconda) environment
    !pip install rpy2-2.8.6-cp36-cp36m-win_amd64.whl
    

    and then

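    # This is important if you will be using rpy2: set R_USER before importing it
    import os
    # Raw string: in a normal string '\r' would become a carriage return
    os.environ['R_USER'] = r'D:\Anaconda3\Lib\site-packages\rpy2'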
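    The hard-coded D:\Anaconda3\... path only matches the author's install. A minimal sketch, assuming you want to keep this answer's choice of pointing R_USER at the rpy2 package directory, can look that directory up with the standard library instead of typing it by hand. `package_dir` is a hypothetical helper name, and `find_spec` locates the package without importing it:

```python
import importlib.util
import os
from typing import Optional


def package_dir(name: str) -> Optional[str]:
    """Return the directory of an installed package without importing it.

    Returns None if the package is not installed or is a plain module.
    """
    spec = importlib.util.find_spec(name)
    if spec is None or not spec.submodule_search_locations:
        return None
    return list(spec.submodule_search_locations)[0]


# Mirror the answer's setting, but only if rpy2 is found on this
# interpreter's path and R_USER has not already been set.
rpy2_dir = package_dir("rpy2")
if rpy2_dir is not None:
    os.environ.setdefault("R_USER", rpy2_dir)
```

    Using `setdefault` leaves any R_USER value you configured elsewhere untouched.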
    

    and then

    import rpy2.robjects as robjects
    from rpy2.robjects import pandas2ri
    pandas2ri.activate()
    

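    This should allow you to use R functions from Python. Now get R's readRDS function as follows (note that readRDS reads single-object .rds files; for .RData files, R's load function is the equivalent):

    readRDS = robjects.r['readRDS']
    df = readRDS('Data1.rds')
    df = pandas2ri.ri2py(df)  # ri2py is the rpy2 2.x API (renamed rpy2py in rpy2 3.x)
    df.head()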
    

    Congratulations! Now you have the DataFrame you wanted.

    However, I advise saving it as a pickle file for later use in Python:

    df.to_pickle('Data1')
    

    Next time you can simply load it with:

    import pandas as pd
    df1 = pd.read_pickle('Data1')
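    The save-then-reload advice above can be wrapped in one function so the slow R conversion only runs the first time. This is a sketch; `load_cached` is a hypothetical helper, and `loader` stands for whatever code produces the DataFrame (for example the readRDS and ri2py calls from this answer):

```python
import os
from typing import Callable

import pandas as pd


def load_cached(pickle_path: str, loader: Callable[[], pd.DataFrame]) -> pd.DataFrame:
    """Return the DataFrame cached at pickle_path, creating the cache on first use."""
    if os.path.exists(pickle_path):
        # Fast path: reuse the pickle written by an earlier call.
        return pd.read_pickle(pickle_path)
    df = loader()  # slow path, e.g. going through rpy2 and R
    df.to_pickle(pickle_path)
    return df


# Hypothetical usage with the rpy2 objects from this answer:
# df = load_cached('Data1.pkl', lambda: pandas2ri.ri2py(readRDS('Data1.rds')))
```

    Delete the pickle file whenever the underlying .rds file changes, since this sketch does not check timestamps.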
    
  • 2020-12-02 20:39

    Well, a couple of years ago I had the same problem as you. I wanted to read .RData files from a library that I was developing. I considered using RPy2, but that would have forced me to release my library under a GPL license, which I did not want to do.

    "pyreadr" didn't even exist then. Also, the datasets I wanted to load were not stored in a standard format such as a data.frame.

    I came to this question and read Spacedman's answer. In particular, I saw the line

    So any other implementation in any other language is hard++.

    as a challenge, and implemented the package rdata in a couple of days as a result. It is a very small, pure Python implementation of an .RData parser and converter, and it has met my needs so far. The steps of parsing the original objects and converting them to appropriate Python objects are separated, so users can apply a different conversion if they want. Moreover, users can add constructors for custom R classes.

    Here is a usage example:

    >>> import rdata
    
    >>> parsed = rdata.parser.parse_file(rdata.TESTDATA_PATH / "test_vector.rda")
    >>> converted = rdata.conversion.convert(parsed)
    >>> converted
    {'test_vector': array([1., 2., 3.])}
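    The split between `parse_file` and `convert` in the example above is what lets users plug in their own conversions. The toy sketch below illustrates that design in plain Python; `ParsedRObject`, `convert` and `DEFAULT_MAP` are hypothetical names, not rdata's actual API, and only R factors (1-based integer codes plus a "levels" attribute, ignoring NA codes) are handled:

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class ParsedRObject:
    """Hypothetical stand-in for the output of a parsing stage."""
    r_class: List[str]   # R class attribute, most specific class first
    value: Any           # underlying data, e.g. integer codes
    attributes: Dict[str, Any] = field(default_factory=dict)


Constructor = Callable[[ParsedRObject], Any]


def convert(obj: ParsedRObject, class_map: Dict[str, Constructor]) -> Any:
    """Convert a parsed object using the first class with a registered constructor."""
    for cls in obj.r_class:
        if cls in class_map:
            return class_map[cls](obj)
    return obj.value  # no registered class: fall back to the raw value


def factor_to_labels(obj: ParsedRObject) -> List[str]:
    # Map 1-based factor codes to their level labels.
    levels = obj.attributes["levels"]
    return [levels[code - 1] for code in obj.value]


DEFAULT_MAP: Dict[str, Constructor] = {"factor": factor_to_labels}

parsed = ParsedRObject(["factor"], [2, 1, 2], {"levels": ["low", "high"]})
print(convert(parsed, DEFAULT_MAP))  # ['high', 'low', 'high']
```

    A custom R class is supported by passing an extended map, e.g. `{**DEFAULT_MAP, "myclass": my_constructor}`, without touching the parser.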
    

    As I said, I developed this package and have used it since without problems, but I did not bother to give it visibility because it was not properly documented. That has recently changed and the documentation is now mostly in good shape, so here it is for anyone interested:

    https://github.com/vnmabus/rdata

  • 2020-12-02 20:43

    There is a third-party library called rpy, and you can use it to load .RData files. You can get it via pip (pip install rpy should do the trick); if that doesn't work, take a look at how to install it. Otherwise, you can simply do:

    from rpy import *
    r.load("file name here")
    

    EDIT:

    It seems I'm a little old school: there's rpy2 now, so you can use that.

  • 2020-12-02 20:59

    As an alternative for those who would prefer not to install R in order to accomplish this task (rpy2 requires it), there is a new package, "pyreadr", which reads RData and Rds files directly into Python without depending on R.

    It is a wrapper around the C library librdata, so it is very fast.

    You can install it easily with pip:

    pip install pyreadr
    

    As an example you would do:

    import pyreadr
    
    result = pyreadr.read_r('/path/to/file.RData') # also works for Rds
    
    # done! let's see what we got
    # result is a dictionary whose keys are the object names and whose values are
    # pandas DataFrames
    print(result.keys()) # let's check what objects we got
    df1 = result["df1"] # extract the pandas data frame for object df1
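    Because `read_r` returns a dictionary keyed by object name, a small helper can make the two common cases explicit: a file holding a single object, and picking one object by name with a readable error when it is missing. This is a hypothetical convenience function, not part of pyreadr, and it works on any mapping:

```python
from typing import Any, Mapping, Optional


def pick_object(result: Mapping[Any, Any], name: Optional[str] = None) -> Any:
    """Return one object from a name -> object mapping such as pyreadr's result.

    With name=None the mapping must hold exactly one object (for example,
    the result of reading an .Rds file); otherwise a KeyError lists the
    available names.
    """
    if name is None:
        if len(result) != 1:
            raise KeyError(f"expected one object, found {list(result)}; pass a name")
        return next(iter(result.values()))
    if name not in result:
        raise KeyError(f"{name!r} not found; available objects: {list(result)}")
    return result[name]


# Hypothetical usage:
# df1 = pick_object(pyreadr.read_r('/path/to/file.RData'), "df1")
```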
    

    The repo is here: https://github.com/ofajardo/pyreadr

    Disclaimer: I am the developer of this package.

  • 2020-12-02 21:04

    People ask this sort of thing on the R-help and R-devel mailing lists, and the usual answer is that the code is the documentation for the .RData file format. So any other implementation in any other language is hard++.
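    To illustrate the point, only the outermost layer of an .RData file is easy to read by hand. A minimal sketch, assuming the file was written by R's save() (gzip-compressed by default, optionally bzip2, xz or uncompressed) and starts with a magic line such as RDX3 followed by a newline, where X means XDR serialization and 3 is the format version. `rdata_header` is a hypothetical helper; everything after the magic line is R's serialization stream, which this sketch does not attempt to decode:

```python
import bz2
import gzip
import lzma
from typing import Tuple

# Leading bytes of the compressors that R's save() can use (gzip is the default).
_COMPRESSED = [
    (b"\x1f\x8b", gzip.open),
    (b"BZh", bz2.open),
    (b"\xfd7zXZ\x00", lzma.open),
]

# Third byte of the magic line: serialization format.
_FORMATS = {b"A": "ascii", b"B": "native binary", b"X": "xdr"}


def rdata_header(path: str) -> Tuple[str, str]:
    """Return (serialization format, version) from an .RData magic line."""
    with open(path, "rb") as raw:
        start = raw.read(6)
    opener = open  # fallback: uncompressed file, e.g. save(compress = FALSE)
    for magic, candidate in _COMPRESSED:
        if start.startswith(magic):
            opener = candidate
            break
    with opener(path, "rb") as f:
        line = f.read(5)  # e.g. b"RDX3\n"
    fmt = _FORMATS.get(line[2:3])
    version = line[3:4]
    if len(line) != 5 or line[:2] != b"RD" or line[4:] != b"\n" or fmt is None or not version.isdigit():
        raise ValueError(f"not a recognised .RData header: {line!r}")
    return fmt, version.decode("ascii")
```

    Decoding the serialized objects that follow this line is where the real work lies.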

    I think the only reasonable way is to install RPy2 and use R's load function from it, converting to appropriate Python objects as you go. An .RData file can contain structured objects as well as plain tables, so watch out.

    Linky: http://rpy.sourceforge.net/rpy2/doc-2.4/html/

    Quicky:

    >>> import rpy2.robjects as robjects
    >>> robjects.r['load'](".RData")
    

    The objects are now loaded into the R workspace.

    >>> robjects.r['y']
    <FloatVector - Python:0x24c6560 / R:0xf1f0e0>
    [0.763684, 0.086314, 0.617097, ..., 0.443631, 0.281865, 0.839317]
    

    That's a simple numeric vector; d is a data frame, and I can subset it to get columns:

    >>> robjects.r['d'][0]
    <IntVector - Python:0x24c9248 / R:0xbbc6c0>
    [       1,        2,        3, ...,        8,        9,       10]
    >>> robjects.r['d'][1]
    <FloatVector - Python:0x24c93b0 / R:0xf1f230>
    [0.975648, 0.597036, 0.254840, ..., 0.891975, 0.824879, 0.870136]
    