I am trying to format the following awk command for use with Python's subprocess module:
awk -v OFS="\t" '{printf "chr%s\t%s\t%s\n", $1, $2-1, $2}' file1.txt > file2.txt
The simplest method, especially if you wish to keep the output redirection, is to use subprocess with shell=True. Then you only need to escape Python's special characters (here, the backslashes in \t and \n); the line as a whole is interpreted by the default shell.
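A minimal sketch of the shell=True route, assuming a POSIX shell and that awk is on the PATH. Instead of doubling each backslash, it uses a raw string (r"""..."""), which also stops Python from turning \t and \n into real tab and newline characters. The run_with_shell helper, the temporary directory, and the two-line sample input are hypothetical, there only to make the sketch self-contained.

```python
import subprocess
import tempfile
from pathlib import Path

# Raw string: Python leaves every backslash alone, so the shell receives the
# command exactly as you would type it in a terminal, redirection included.
AWK_SHELL_CMD = r"""awk -v OFS="\t" '{printf "chr%s\t%s\t%s\n", $1, $2-1, $2}' file1.txt > file2.txt"""


def run_with_shell(workdir):
    """Hypothetical helper: run the whole command line through the shell in workdir."""
    subprocess.check_call(AWK_SHELL_CMD, shell=True, cwd=workdir)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        # Hypothetical sample input: two whitespace-separated columns per line.
        Path(d, "file1.txt").write_text("1 100\n2 250\n")
        run_with_shell(d)
        print(Path(d, "file2.txt").read_text(), end="")
```

With the sample input, file2.txt ends up containing the tab-separated lines chr1 99 100 and chr2 249 250.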
Alternatively, you can replace the command line with an argv-type sequence and pass that to subprocess instead. In that case, you must provide each argument exactly as the program would see it.
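A sketch of such an argument list, assuming awk is on the PATH. Each element is one argument as awk receives it: the shell quotes disappear, and "\\t" in an ordinary Python string is a backslash followed by t, which awk itself turns into a tab. With no shell there is nothing to perform "> file2.txt", so the output file is opened in Python and passed as stdout. The names AWK_ARGV and run_without_shell, and the sample input, are hypothetical.

```python
import subprocess
import tempfile
from pathlib import Path

# One list element per argument, exactly as awk receives it (no shell quoting).
# "\\t" and "\\n" are a backslash plus a letter; awk converts them itself.
AWK_ARGV = [
    "awk",
    "-v", "OFS=\\t",
    '{printf "chr%s\\t%s\\t%s\\n", $1, $2-1, $2}',
    "file1.txt",
]


def run_without_shell(workdir):
    """Hypothetical helper: run awk directly; stdout= replaces '> file2.txt'."""
    with open(Path(workdir, "file2.txt"), "wb") as out:
        subprocess.check_call(AWK_ARGV, stdout=out, cwd=workdir)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        # Hypothetical sample input: two whitespace-separated columns per line.
        Path(d, "file1.txt").write_text("1 100\n2 250\n")
        run_without_shell(d)
        print(Path(d, "file2.txt").read_text(), end="")
```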
Regarding the specific problems:
- \t and \n became a literal tab and newline, because Python processes escape sequences in a regular string before awk ever sees them (try print awk_command).
- Using shlex.split gains you nothing over shell=True, and it adds unreliability: it cannot guarantee that it parses the string the same way your shell would in every case, and it performs none of the transformations the shell makes (variable expansion, globbing, redirection).
Specifically, it doesn't know or care about the special meaning of the redirection part:
>>> import shlex
>>> awk_command = """awk -v OFS="\\t" '{printf "chr%s\\t%s\\t%s\\n", $1, $2-1, $2}' file1.txt > file2.txt"""
>>> shlex.split(awk_command)
['awk', '-v', 'OFS=\\t', '{printf "chr%s\\t%s\\t%s\\n", $1, $2-1, $2}', 'file1.txt', '>', 'file2.txt']
So, if you wish to use shell=False, construct the argument list yourself.
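Both pitfalls are easy to check from Python. This sketch, using the command text from the question, compares a regular triple-quoted string with a raw one and shows that shlex.split hands ">" back as an ordinary word rather than treating it as redirection. The variable names are illustrative only.

```python
import shlex

# Pitfall 1: in a regular (non-raw) string, Python converts \t and \n itself,
# so awk would receive real tab/newline characters instead of escape sequences.
cooked = """awk -v OFS="\t" '{printf "chr%s\t%s\t%s\n", $1, $2-1, $2}'"""
raw = r"""awk -v OFS="\t" '{printf "chr%s\t%s\t%s\n", $1, $2-1, $2}'"""

# Pitfall 2: shlex.split only splits words; '>' stays an ordinary argument.
tokens = shlex.split(raw + " file1.txt > file2.txt")

if __name__ == "__main__":
    print(repr(cooked))
    print(tokens)
```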
The > in your command is the shell redirection operator. To implement it in Python, use the stdout parameter:
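#!/usr/bin/env python
import shlex
import subprocess

# Raw string: Python keeps the backslashes; awk interprets \t and \n itself.
cmd = r"""awk -v OFS="\t" '{printf "chr%s\t%s\t%s\n", $1, $2-1, $2}'"""
# stdout=output_file plays the role of the shell's "> file2.txt".
with open('file2.txt', 'wb', 0) as output_file:
    subprocess.check_call(shlex.split(cmd) + ["file1.txt"], stdout=output_file)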
To avoid starting a separate process, you could implement this particular awk command in pure Python.
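A minimal pure-Python sketch of the same transformation. It assumes every input line has at least two whitespace-separated fields and that the second field is an integer; awk itself is more lenient (it coerces non-numeric fields and still prints something for short lines), so this is not a byte-for-byte replacement. The function name awk_equivalent and the sample input are hypothetical.

```python
import tempfile
from pathlib import Path


def awk_equivalent(src_path, dst_path):
    """Hypothetical pure-Python version of the awk printf one-liner.

    Assumes >= 2 whitespace-separated fields per line, integer second field.
    """
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            fields = line.split()  # roughly awk's default field splitting
            start = int(fields[1]) - 1  # the $2-1 column
            dst.write("chr%s\t%d\t%s\n" % (fields[0], start, fields[1]))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        # Hypothetical sample input: two whitespace-separated columns per line.
        Path(d, "file1.txt").write_text("1 100\n2 250\n")
        awk_equivalent(Path(d, "file1.txt"), Path(d, "file2.txt"))
        print(Path(d, "file2.txt").read_text(), end="")
```

This trades awk's tolerance of malformed lines for an explicit error (IndexError or ValueError) when the assumptions do not hold.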