Question
I need to read a CSV file with Python (into a pandas DataFrame), work on it in R, and then return to Python. To convert the pandas DataFrame to an R data frame I use rpy2, and that part works fine (code below).
from pandas import read_csv, DataFrame
import pandas.rpy.common as com
import rpy2.robjects as robjects
r = robjects.r
r.library("fitdistrplus")
df = read_csv('./datos.csv')
r_df = com.convert_to_r_dataframe(df)
print(type(r_df))
The output is:
<class 'rpy2.robjects.vectors.FloatVector'>
But then I try to fit a distribution in R:
fit2 = r.fitdist(r_df, "weibull")
But I get this error:
RRuntimeError: Error in (function (data, distr, method = c("mle", "mme", "qme", "mge"), :
data must be a numeric vector of length greater than 1
I have two questions:
1. What am I doing wrong?
2. Is this the most efficient way to pass a Python data frame to R? I ask because I have also seen this import: from rpy2.robjects.packages import importr
This is the data that I read: https://mega.co.nz/#!P8MEDSzQ!iQyxt73a5pRvJNOxWeSEaFlsVS7_A1sZCAXkUFBLJa0
I use IPython 2.1. Thanks!
Answer 1:
You have two issues:
First, you are trying to use a data frame where you really need a vector. (If you tried using an R data.frame for fitdist(), you'd also get an error.)
Second, the pandas<->rpy2 support provided by pandas is buggy, resulting in conversion of your (presumably) numeric pandas data frame to a string/character R data frame:
In [27]: r.sapply(r_df, r["class"])
Out[27]:
<StrVector - Python:0x1097757a0 / R:0x7fa41c6b0b68>
[str, str, str, str]
This is not good! The following code fixes these errors:
from pandas import read_csv
import rpy2.robjects as robjects
r = robjects.r
r.library("fitdistrplus")
# this will read in your csv file as a Series, rather than a DataFrame
series = read_csv('datos.csv', index_col=0, squeeze=True)
# do the conversion directly, so that we get an R Vector, rather than a
# data frame, and we know that it's a numeric type
r_vec = robjects.FloatVector(series)
fit2 = r.fitdist(r_vec, "weibull")
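The two fixes above (pass a vector, and make sure it is numeric) can also be checked on the Python side before anything is handed to R. What follows is only a sketch under assumptions: the helper names to_float_list and fit_weibull are hypothetical and are not part of pandas, rpy2 or fitdistrplus. The rpy2 import is deferred so the validation step can be tried without R installed.

```python
from numbers import Real


def to_float_list(values):
    """Turn a 1-D iterable into a list of Python floats.

    Hypothetical helper. It raises ValueError for the two cases that
    tripped up the question: non-numeric entries (e.g. strings) and
    fewer than two values.
    """
    floats = []
    for value in values:
        # bool counts as a Real in Python, but it is not data we want to fit
        if isinstance(value, bool) or not isinstance(value, Real):
            raise ValueError("non-numeric entry: %r" % (value,))
        floats.append(float(value))
    if len(floats) < 2:
        raise ValueError("need a numeric vector of length greater than 1")
    return floats


def fit_weibull(values):
    """Hypothetical helper: fit a Weibull distribution through rpy2."""
    # Imported lazily so to_float_list() stays usable without rpy2 or R.
    import rpy2.robjects as robjects
    from rpy2.robjects.packages import importr

    fitdistrplus = importr("fitdistrplus")
    # FloatVector builds an R numeric vector, which is what fitdist() expects
    r_vec = robjects.FloatVector(to_float_list(values))
    return fitdistrplus.fitdist(r_vec, "weibull")
```

With the question's data this would be called on a single column, for example fit_weibull(df.iloc[:, 0]), assuming that column holds the values to fit. Note also that recent pandas releases no longer accept squeeze=True in read_csv; calling .squeeze("columns") on the result of read_csv is the documented replacement.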
Answer 2:
I haven't tried your data, but something like this should work.
%load_ext rmagic
from pandas import read_csv
from rpy2.robjects.packages import importr
# Importing numpy2ri and calling activate() switches on the automatic
# conversion of numpy objects into rpy2 objects.
import rpy2.robjects.numpy2ri
rpy2.robjects.numpy2ri.activate()
f = importr('fitdistrplus')
dfp = read_csv('./test.csv')
f1 = f.fitdist(dfp.as_matrix(), "weibull")
print(f1)
Source: https://stackoverflow.com/questions/25800556/rpy2-pandas-dataframe-cant-fit-in-r