Save .dta files in Python

無奈伤痛 2020-12-28 20:24

I'm wondering if anyone knows of a Python package that can save numpy arrays/recarrays in the .dta format of the statistical data analysis software Stata.

3 answers
  • 2020-12-28 20:35

    pandas DataFrame objects now have a "to_stata" method, so you can do, for instance:

    import pandas as pd
    df = pd.read_stata('my_data_in.dta')
    df.to_stata('my_data_out.dta')
    

    DISCLAIMER: the first step is quite slow (in my test, around 1 minute to read a 51 MB .dta file; also see this question), and the second produces a file that can be much larger than the original (in my test, the size went from 51 MB to 111 MB). This answer may look less elegant, but it is probably more efficient.
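
    For the NumPy case in the question, the same method can be reached by wrapping the array in a DataFrame first. This is only a minimal sketch, assuming pandas and NumPy are installed; the sample data, the column names (`id`, `value`) and the output file name are made up for illustration. `write_index=False` keeps the DataFrame index from being written as an extra variable.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical sample data: a small record array with two numeric fields.
rec = np.rec.fromarrays(
    [np.array([1, 2, 3], dtype=np.int32), np.array([0.5, 1.5, 2.5])],
    names=["id", "value"],
)

# Wrap the record array in a DataFrame; field names become column names.
df = pd.DataFrame.from_records(rec)

# Write to a temporary location (hypothetical file name) without the index.
out_path = os.path.join(tempfile.mkdtemp(), "example_out.dta")
df.to_stata(out_path, write_index=False)

# Read the file back to check that the columns survived the round trip.
roundtrip = pd.read_stata(out_path)
```

    Column names have to be valid Stata variable names, so it may be worth renaming fields before writing.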

  • 2020-12-28 20:40

    The only Python library for Stata interoperability I could find provides read-only access to .dta files. However, the R foreign package provides a function write.dta, and RPy provides a Python interface to R. Maybe a combination of these tools can help you.

  • 2020-12-28 20:42

    The scikits.statsmodels package includes a reader for Stata data files, which relies in part on PyDTA as pointed out by @Sven. In particular, genfromdta() will return an ndarray, e.g. from Python 2.7/statsmodels 0.3.1:

    >>> import scikits.statsmodels.api as sm
    >>> arr = sm.iolib.genfromdta('/Applications/Stata12/auto.dta')
    >>> type(arr)
    <type 'numpy.ndarray'>
    

    The savetxt() function can in turn be used to save an array as a text file, which can then be imported into Stata. For example, we can export the above as

    >>> sm.iolib.savetxt('auto.txt', arr, fmt='%2s', delimiter=",")
    

    and read it in Stata without a dictionary file as follows:

    . insheet using auto.txt, clear
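
    The text-file route can also be sketched with plain NumPy. Note that this swaps in `np.savetxt` for the statsmodels helper used above, and the sample array, field names and file name are hypothetical. Writing a header row with `comments=""` leaves the first line as plain comma-separated names, which Stata can then pick up as variable names.

```python
import os
import tempfile

import numpy as np

# Hypothetical sample data: a structured array with two numeric fields.
rec = np.array(
    [(1, 0.5), (2, 1.5), (3, 2.5)],
    dtype=[("id", "i4"), ("value", "f8")],
)

# Hypothetical output location in a temporary directory.
csv_path = os.path.join(tempfile.mkdtemp(), "example.txt")

# One format per field; the header line carries the field names.
# comments="" stops NumPy from prefixing the header with "# ".
np.savetxt(
    csv_path,
    rec,
    fmt=["%d", "%.2f"],
    delimiter=",",
    header=",".join(rec.dtype.names),
    comments="",
)
```

    The resulting file can be read in Stata with the same insheet command, pointed at the new file name.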
    

    I believe a *.dta writer should be added in the near future.
