Is there a Python module to open SPSS files?

Submitted by 主宰稳场 on 2019-12-17 15:39:40

Question


Is there a module for Python to open IBM SPSS (i.e. .sav) files? It would be great if there's something up-to-date which doesn't require any additional dll files/libraries.


Answer 1:


I have released a Python package, "pyreadstat", that reads SPSS (sav, zsav and por), Stata and SAS files. It is a wrapper around the C library ReadStat, so it is very fast. ReadStat is the library behind the R package haven, which is widely used and very robust.

The package is self-contained. It does not require R (no need to install an additional application), and it does not depend on IBM DLLs or other external libraries.

For example, to read an SPSS sav file you would do:

import pyreadstat

df, meta = pyreadstat.read_sav("/path/to/sav/file.sav")

df is a pandas DataFrame. meta contains metadata such as variable labels and value labels. read_sav reads both sav and zsav (compressed) files. There is also a function read_por for old por (portable) files.
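As a rough sketch of how that metadata might be used: the helper names describe_columns, decode_values and load_sav below are made up for illustration, while column_names, column_labels and variable_value_labels are attributes of the metadata object returned by pyreadstat. The import is done lazily inside load_sav so the two plain helpers can be tried without the package installed.

```python
def describe_columns(column_names, column_labels):
    """Pair each column name with its variable label (None if unlabeled)."""
    return dict(zip(column_names, column_labels))


def decode_values(values, value_labels):
    """Replace stored codes with their value labels; unknown codes pass through."""
    return [value_labels.get(v, v) for v in values]


def load_sav(path):
    """Hypothetical wrapper: read a sav/zsav file and return data plus label lookups."""
    # Imported here so the helpers above work even without pyreadstat installed.
    import pyreadstat

    df, meta = pyreadstat.read_sav(path)
    variable_labels = describe_columns(meta.column_names, meta.column_labels)
    # variable_value_labels maps a column name to a {code: label} dict.
    return df, variable_labels, meta.variable_value_labels
```

For instance, if a column stores 1.0 and 2.0 with the value labels "Male" and "Female", decode_values([1.0, 2.0], {1.0: "Male", 2.0: "Female"}) gives back the labels, and any code without a label is left unchanged.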

You can find it here: https://github.com/Roche/pyreadstat




Answer 2:


Depending on what you want to do--process data using R-related commands from rpy2, or switch to Python--the solution provided by @Spacedman on a related thread might easily be adapted to suit your needs.

Otherwise, Pandas includes a convenient wrapper for rpy2. Here is an example of use with Peat and Barton's weights.sav data set:

>>> import pandas.rpy.common as com
>>> filename = "weights.sav"
>>> w = com.robj.r('foreign::read.spss("%s", to.data.frame=TRUE)' % filename)
>>> w = com.convert_robj(w)
>>> w.head()
     ID  WEIGHT  LENGTH  HEADC  GENDER  EDUCATIO              PARITY
1  L001    3.95    55.5   37.5  Female  tertiary  3 or more siblings
2  L003    4.63    57.0   38.5  Female  tertiary           Singleton
3  L004    4.75    56.0   38.5    Male    year12          2 siblings
4  L005    3.92    56.0   39.0    Male  tertiary         One sibling
5  L006    4.56    55.0   39.5    Male    year10          2 siblings



Answer 3:


As a note for people finding this later (like me): pandas.rpy has been deprecated in newer versions of pandas (>0.16), as noted here. That page includes information on updating code to use the rpy2 interface.




Answer 4:


But the benefit of using the IBM libraries is that they get this rather complex binary file format right. They are free, relieve you of the burden of writing code for this format, and the license permits you to redistribute them. What more could you ask?




Answer 5:


Here are some packages you may be interested in:

  • savReaderWriter on Bitbucket

  • savReaderWriter 3.4.2 in Python Package Index Repo




Answer 6:


I had the same question as @Pyderman about how to update this for pandas (>0.16). This is what I came up with:

from rpy2.robjects import pandas2ri, r
filename = 'weights.sav'
w = r('foreign::read.spss("%s", to.data.frame=TRUE)' % filename)
df = pandas2ri.ri2py(w)
df.head()
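For newer rpy2 releases (3.x), where ri2py was renamed rpy2py, a possible variant is sketched below. It assumes R with the foreign package is installed alongside rpy2 and pandas. The function names r_read_spss_call and read_spss_via_r are hypothetical, and the conversion API has shifted between rpy2 versions, so treat this as a starting point rather than a guaranteed recipe. The small helper also escapes backslashes and quotes, which the plain "%s" substitution above does not handle (for example in Windows paths).

```python
def r_read_spss_call(filename):
    """Build the R expression, escaping characters that would break the R string."""
    escaped = str(filename).replace("\\", "\\\\").replace('"', '\\"')
    return 'foreign::read.spss("%s", to.data.frame=TRUE)' % escaped


def read_spss_via_r(filename):
    """Hypothetical wrapper: read a .sav file through R and return a pandas DataFrame."""
    # Imported lazily; requires rpy2 3.x, pandas, and R with the foreign package.
    import rpy2.robjects as ro
    from rpy2.robjects import pandas2ri
    from rpy2.robjects.conversion import localconverter

    r_frame = ro.r(r_read_spss_call(filename))
    # Apply the pandas conversion rules only inside this block.
    with localconverter(ro.default_converter + pandas2ri.converter):
        return ro.conversion.rpy2py(r_frame)
```

Usage would then look like df = read_spss_via_r("weights.sav") followed by df.head().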



Answer 7:


With pandas >= 0.25.0 you can now simply do:

# you need pandas >= 0.25.0 for this    
import pandas as pd
df = pd.read_spss('your_spss_file.sav')

This requires the pyreadstat library, so you might have to install that first:

pip install pyreadstat

I couldn't find documentation on pd.read_spss() yet, so here is some extra information on its parameters:

Parameters
----------
path : string or Path
    File path

usecols : list-like, optional
    Return a subset of the columns. If None, return all columns.

convert_categoricals : bool, default is True
    Convert categorical columns into pd.Categorical.

Returns
-------
DataFrame
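
A small sketch of how these parameters might be combined. The names normalize_usecols and read_spss_subset are hypothetical helpers, not part of pandas. The string check is there because usecols has to be list-like, so a single column name must be wrapped in a list.

```python
def normalize_usecols(usecols):
    """Hypothetical helper: turn a column selection into the list read_spss expects."""
    if usecols is None:
        return None  # None means "read every column"
    if isinstance(usecols, (str, bytes)):
        # A bare string is not list-like; pass ['ID'] rather than 'ID'.
        raise TypeError("usecols must be list-like, e.g. ['ID', 'WEIGHT']")
    return list(usecols)


def read_spss_subset(path, usecols=None, convert_categoricals=True):
    """Read selected columns of a .sav file with pd.read_spss."""
    # Imported lazily; needs pandas >= 0.25.0 with pyreadstat installed.
    import pandas as pd

    return pd.read_spss(
        path,
        usecols=normalize_usecols(usecols),
        convert_categoricals=convert_categoricals,
    )
```

For example, read_spss_subset('your_spss_file.sav', usecols=['ID', 'WEIGHT'], convert_categoricals=False) would load only those two columns (assuming the file has columns with those names) and keep the stored codes instead of converting them to pd.Categorical.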




Answer 8:


Perhaps you may find this useful: http://code.activestate.com/recipes/577811-python-reader-writer-for-spss-sav-files-linux-mac-/




Answer 9:


You could use a Python interface to R and then import the data using read.spss from library(foreign).



Source: https://stackoverflow.com/questions/14647006/is-there-a-python-module-to-open-spss-files
