Question
I'm looking to work with SPSS files (.sav) using pandas. In the absence of the SPSS program, here's what a typical file looks like when converted to .csv:
On investigation into what the first two rows signify (I don't know SPSS), it seems that the first row contains the Labels, while the second row contains the VarNames.
When I bring the file into pandas thus:
import pandas.rpy.common as com

def savtocsv(filename):
    w = com.robj.r('foreign::read.spss("%s", to.data.frame=TRUE)' % filename)
    w = com.convert_robj(w)
    return w
and then do a head(), the first row (Label) is missing:
How can labels be maintained?
- Ref: Is there a Python module to open SPSS files?
- Python: 2.7.10
- Pandas: 0.17.1
Answer 1:
Labels in a .sav file are stored in the variable.labels attribute of the object returned by the read.spss function.
You can get the variable labels with the following:
import pandas.rpy.common as com

def get_labels(filename):
    w = com.robj.r('attr(foreign::read.spss("%s"), "variable.labels")' % filename)
    w = com.convert_robj(w)
    return w
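Once the labels are in hand, it can help to pair them with the variable names so that a label can be looked up by name. The following is only a rough sketch in plain pandas: it assumes the variable names and labels are available as two equal-length sequences in the same order (the exact type that convert_robj returns may differ), and both the helper name label_lookup and the sample values are made up for illustration.

```python
import pandas as pd

def label_lookup(var_names, labels):
    """Build a Series indexed by variable name, holding each variable's label."""
    var_names = list(var_names)
    labels = list(labels)
    if len(var_names) != len(labels):
        raise ValueError("expected exactly one label per variable name")
    return pd.Series(labels, index=var_names, name="Label")

# Hypothetical variable names and labels, standing in for real SPSS metadata.
lookup = label_lookup(["Q1", "Q2"], ["Age of respondent", "Household size"])
print(lookup["Q1"])
```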
If you want to set the labels as the column names of your dataframe:
import pandas.rpy.common as com

def savtocsv(filename):
    w = com.robj.r('foreign::read.spss("%s", to.data.frame=TRUE)' % filename)
    cols = list(com.robj.r("attr")(w, "variable.labels"))
    w = com.convert_robj(w)
    w.columns = cols
    return w
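Replacing the column names with labels discards the original variable names. If both should survive, as in the two header rows of the .csv export described in the question, one option is a two-level column index. This is a minimal sketch, not part of the original answer: it assumes the labels have already been retrieved as a list in column order, and the helper name attach_labels and the sample data are hypothetical.

```python
import pandas as pd

def attach_labels(df, labels):
    """Return a copy of df whose columns carry both the label and the variable name."""
    labels = list(labels)
    if len(labels) != len(df.columns):
        raise ValueError("expected exactly one label per column")
    out = df.copy()
    # Level 0 holds the SPSS label, level 1 the original variable name.
    out.columns = pd.MultiIndex.from_arrays(
        [labels, list(df.columns)], names=["Label", "VarName"]
    )
    return out

# Made-up data standing in for a frame converted from a .sav file.
df = pd.DataFrame([[1, 3], [2, 4]], columns=["Q1", "Q2"])
labeled = attach_labels(df, ["Age of respondent", "Household size"])
csv_text = labeled.to_csv(index=False)
```

Written out with to_csv(index=False), such a frame should produce one header row per column level, labels first and variable names second, which matches the layout of the converted file in the question.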
Source: https://stackoverflow.com/questions/36287936/how-to-preserve-labels-when-spss-file-sav-imported-into-pandas-via-rpy