I have a Python script that cleans up and performs basic statistical calculations on a large panel dataset (2,000,000+ observations).
I find t
This answer extends @Roberto Ferrer's answer, solving a few issues I ran into.
Stata in system path
For Stata to run code, it must be correctly set up in the system path (on Windows at least). For me this was not set up automatically when I installed Stata, and I found the simplest fix was to put in the full path (which for me was C:\Program Files (x86)\Stata12\Stata-64), i.e.:
cmd = [r"C:\Program Files (x86)\Stata12\Stata-64", "do", dofile]
How to quietly run the code in the background
It is possible to get the code to run quietly in the background (i.e. without opening up Stata each time) by adding the /e option, i.e.:
cmd = [r"C:\Program Files (x86)\Stata12\Stata-64", "/e", "do", dofile]
Log file storage location
Finally, if you are running quietly in the background, Stata will want to save log files. It saves them in the working directory of the process that runs the command. This varies depending on where the code is run from: for me, since I was executing Python from Notepad++, it wanted to save the log files in C:\Program Files (x86)\Notepad++, which Stata did not have write access to. You can change this by specifying the working directory when the subprocess is called.
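As a sketch of the log-file point: the helper below assumes that batch-mode Stata names its log after the do-file (pyexample3.do gives pyexample3.log) and writes it into the launched process's working directory, so you can predict where to look and choose a folder you can write to. The function names and the `stata_logs` folder are hypothetical.

```python
import os
import subprocess
import tempfile


def expected_log_path(dofile: str, logdir: str) -> str:
    """Where a batch-mode Stata run is assumed to write its log.

    Assumption: the log is named after the do-file (foo.do -> foo.log)
    and lands in the working directory of the launched process.
    """
    stem = os.path.splitext(os.path.basename(dofile))[0]
    return os.path.join(logdir, stem + ".log")


def run_with_logdir(cmd, logdir: str) -> int:
    """Run cmd with its working directory set to a writable folder."""
    os.makedirs(logdir, exist_ok=True)
    return subprocess.call(cmd, cwd=logdir)


# A per-user temp folder is one location that is normally writable.
logdir = os.path.join(tempfile.gettempdir(), "stata_logs")
print(expected_log_path("/home/roberto/Desktop/pyexample3.do", logdir))
```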
These modifications to Roberto Ferrer's code lead to:
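import subprocess

def dostata(dofile, *params):
    # Full path to the Stata executable; "/e" runs it quietly in batch mode
    cmd = [r"C:\Program Files (x86)\Stata12\Stata-64", "/e", "do", dofile]
    # Pass any extra arguments on to the do-file
    for param in params:
        cmd.append(param)
    # cwd sets the working directory, which is where Stata writes its log files
    return subprocess.call(cmd, cwd=r"C:\location_to_save_log_files")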
If you're running this from the command line, you should be able to call Stata from Python (I don't know how to invoke a shell command from within Python, but it shouldn't be too hard; see "Calling an external command in Python"). To run Stata from the command line (aka batch mode), see http://www.stata.com/support/faqs/unix/batch-mode/
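For the Unix batch mode described in that FAQ, a minimal sketch with subprocess might look like this. It assumes `stata` is on your PATH and that extra arguments after the do-file are passed through to it, as with `stata do`; the function names are hypothetical.

```python
import subprocess
from typing import List


def batch_cmd(dofile: str, *args: str, executable: str = "stata") -> List[str]:
    """Build a Unix batch-mode call: stata -b do <dofile> [args...].

    Assumption: `executable` is on PATH; -b is Stata's batch-mode flag.
    """
    return [executable, "-b", "do", dofile, *args]


def run_batch(dofile: str, *args: str, executable: str = "stata") -> int:
    """Run the do-file in batch mode and return the process exit code."""
    return subprocess.call(batch_cmd(dofile, *args, executable=executable))
```

Passing the command as a list, rather than one shell string, means paths with spaces are handed to Stata as single arguments without extra quoting.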
I think @user229552 points in the right direction. Python's subprocess module can be used. Below is an example that works for me on Linux.
Suppose you have a Python file called pydo.py with the following:
import subprocess
## Do some processing in Python
## Set do-file information
dofile = "/home/roberto/Desktop/pyexample3.do"
cmd = ["stata", "do", dofile, "mpg", "weight", "foreign"]
## Run do-file
subprocess.call(cmd)
and a Stata do-file named pyexample3.do, with the following:
clear all
set more off
local y `1'
local x1 `2'
local x2 `3'
display `"first parameter: `y'"'
display `"second parameter: `x1'"'
display `"third parameter: `x2'"'
sysuse auto
regress `y' `x1' `x2'
exit, STATA clear
Then executing pydo.py in a Terminal window works as expected.
You could also define a Python function and use that:
import subprocess

## Define a Python function to launch a do-file
def dostata(dofile, *params):
    ## Launch a do-file, given the full path to the do-file
    ## and a list of parameters.
    cmd = ["stata", "do", dofile]
    for param in params:
        cmd.append(param)
    return subprocess.call(cmd)

## Do some processing in Python
## Run a do-file
dostata("/home/roberto/Desktop/pyexample3.do", "mpg", "weight", "foreign")
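If you want Stata's results back in Python instead of just the exit code that subprocess.call returns, a variant based on subprocess.run could look like the following. It assumes the console version of Stata prints its results to standard output, as it does in the Terminal session shown next; `stata_cmd` and `run_capture` are hypothetical names.

```python
import subprocess
from typing import List


def stata_cmd(dofile: str, *params: str, executable: str = "stata") -> List[str]:
    """Same argument list as dostata(): stata do <dofile> <params...>."""
    return [executable, "do", dofile, *params]


def run_capture(cmd: List[str]) -> str:
    """Run cmd and return its stdout as text.

    check=True raises subprocess.CalledProcessError on a non-zero exit code.
    """
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

For example, `run_capture(stata_cmd("/home/roberto/Desktop/pyexample3.do", "mpg", "weight", "foreign"))` should return roughly the text below as a Python string. `capture_output` and `text` need Python 3.7 or later.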
The complete call from a Terminal, with results:
roberto@roberto-mint ~/Desktop
$ python pydo.py
___ ____ ____ ____ ____ (R)
/__ / ____/ / ____/
___/ / /___/ / /___/ 12.1 Copyright 1985-2011 StataCorp LP
Statistics/Data Analysis StataCorp
4905 Lakeway Drive
College Station, Texas 77845 USA
800-STATA-PC http://www.stata.com
979-696-4600 stata@stata.com
979-696-4601 (fax)
Notes:
1. Command line editing enabled
. do /home/roberto/Desktop/pyexample3.do mpg weight foreign
. clear all
. set more off
.
. local y `1'
. local x1 `2'
. local x2 `3'
.
. display `"first parameter: `y'"'
first parameter: mpg
. display `"second parameter: `x1'"'
second parameter: weight
. display `"third parameter: `x2'"'
third parameter: foreign
.
. sysuse auto
(1978 Automobile Data)
. regress `y' `x1' `x2'
Source | SS df MS Number of obs = 74
-------------+------------------------------ F( 2, 71) = 69.75
Model | 1619.2877 2 809.643849 Prob > F = 0.0000
Residual | 824.171761 71 11.608053 R-squared = 0.6627
-------------+------------------------------ Adj R-squared = 0.6532
Total | 2443.45946 73 33.4720474 Root MSE = 3.4071
------------------------------------------------------------------------------
mpg | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
weight | -.0065879 .0006371 -10.34 0.000 -.0078583 -.0053175
foreign | -1.650029 1.075994 -1.53 0.130 -3.7955 .4954422
_cons | 41.6797 2.165547 19.25 0.000 37.36172 45.99768
------------------------------------------------------------------------------
.
. exit, STATA clear
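If you capture this output as text, a few header statistics can be pulled out with a regular expression. This is only a sketch: it assumes the `regress` header layout shown above (e.g. "Number of obs = 74"), which may differ between Stata versions, so having the do-file write its results to a file is likely the sturdier option. The names `HEADER_STAT` and `regress_header_stats` are hypothetical.

```python
import re
from typing import Dict

# "Adj R-squared" is listed before "R-squared" so the longer label wins.
HEADER_STAT = re.compile(
    r"(Number of obs|Adj R-squared|R-squared|Root MSE)\s*=\s*([-\d.]+)"
)


def regress_header_stats(text: str) -> Dict[str, float]:
    """Pull a few summary statistics out of captured `regress` output."""
    return {name: float(value) for name, value in HEADER_STAT.findall(text)}
```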
Sources:
http://www.reddmetrics.com/2011/07/15/calling-stata-from-python.html
http://docs.python.org/2/library/subprocess.html
http://www.stata.com/support/faqs/unix/batch-mode/
A different route for using Python and Stata together can be found at
http://ideas.repec.org/c/boc/bocode/s457688.html
http://www.stata.com/statalist/archive/2013-08/msg01304.html