I'm looking to automate the process of converting many .CSV files into .DTA files via Python. .DTA is the file format used by the Stata statistics package.
(Copy-pasting from my answer to a previous question.)
pandas DataFrame objects now have a to_stata method, so you can, for instance, do:
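import pandas as pd
# Read an existing Stata file (for CSV input, use pd.read_csv('my_data_in.csv') instead)
df = pd.read_stata('my_data_in.dta')
# Write the DataFrame out as a Stata .dta file
df.to_stata('my_data_out.dta')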
DISCLAIMER: the first step is quite slow (in my test, around 1 minute to read a 51 MB .dta file; also see this question), and the second can produce a file much larger than the original (in my test, the size went from 51 MB to 111 MB). Spacedman's answer may look less elegant, but it is probably more efficient.
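Since the question is about converting many CSV files, here is a minimal sketch of a batch loop with pandas, assuming all the .csv files sit in one folder and each should become a .dta file with the same base name. `csv_folder_to_dta` is a hypothetical helper name, not part of pandas; `write_index=False` is a real `to_stata` parameter that stops pandas from adding its row index as an extra Stata variable.

```python
from pathlib import Path

import pandas as pd


def csv_folder_to_dta(folder):
    """Hypothetical helper: convert every .csv in `folder` to a .dta with the same stem.

    Returns the list of .dta paths that were written, in sorted order.
    """
    written = []
    for csv_path in sorted(Path(folder).glob("*.csv")):
        df = pd.read_csv(csv_path)
        dta_path = csv_path.with_suffix(".dta")
        # write_index=False keeps the pandas row index out of the Stata file
        df.to_stata(dta_path, write_index=False)
        written.append(dta_path)
    return written
```

Usage would be something like `csv_folder_to_dta("path/to/csvs")`. Column names that are not valid Stata variable names may be renamed by pandas on export (it warns when it does so), so check the output if your CSV headers contain spaces or punctuation.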
You need rpy2 for Python and also the foreign package installed in R. To install it, start R and type install.packages("foreign"). You can then quit R and go back to Python.
Then this:
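import rpy2.robjects as robjects
# Load R's foreign package, which provides write.dta
robjects.r("require(foreign)")
# Read the CSV into an R data frame called x, then write it out as a Stata .dta file
robjects.r('x=read.csv("test.csv")')
robjects.r('write.dta(x,"test.dta")')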
You can construct the string passed to robjects.r from Python variables if you want, something like:
robjects.r('x=read.csv("%s")' % fileName)
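Applied to many files, that idea might look like the sketch below, assuming the CSVs are all in one folder. `r_string` and `convert_folder` are hypothetical helpers. `r_string` escapes backslashes (as in Windows paths) and double quotes, because a plain `%s` substitution of a path like `C:\data\test.csv` produces an R string with invalid escapes. The rpy2 import sits inside `convert_folder` so the quoting helper can be tried without R installed.

```python
import glob
import os


def r_string(path):
    """Hypothetical helper: quote `path` as an R double-quoted string literal.

    Backslashes are escaped first, then embedded double quotes.
    """
    escaped = path.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def convert_folder(folder):
    """Hypothetical helper: convert every .csv in `folder` to a .dta next to it via R."""
    # Imported lazily so r_string can be used without rpy2/R installed
    import rpy2.robjects as robjects

    robjects.r("require(foreign)")
    written = []
    for csv_path in sorted(glob.glob(os.path.join(folder, "*.csv"))):
        dta_path = os.path.splitext(csv_path)[0] + ".dta"
        # Same two R calls as above, with the file names filled in safely
        robjects.r("x <- read.csv(%s)" % r_string(csv_path))
        robjects.r("write.dta(x, %s)" % r_string(dta_path))
        written.append(dta_path)
    return written
```

Quoting the paths in one place keeps the R commands identical to the hand-written version while making them safe to generate in a loop.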