问题
I am writing a python program that uses other software. I was able to pass the command using subprocess.popen
. I am facing a new problem: I need to concatenate multiples files as two
files and use them as the input for the external program. The command line looks like this:
extersoftware --fq --f <(cat fileA_1 fileB_1) <(cat fileA_2 fileB_2)
I cannot use shell=True
because there are other commands I need to pass by variables, such as --fq
.(They are not limited to --fq, here is just an example)
One possible solution is to generate middle file. This is what I have tried:
file_1 = ['cat', 'fileA_1', 'fileB_1']
p1 = Popen(file_1, stdout=PIPE)
p2 = Popen(['>', 'output_file'], stdin=p1.stdout, stdout=PIPE)
p1.stdout.close()
output = p2.communicate()
print output
I got error message: OSError: [Errno 2] No such file or directory
Which part did I do wrong?
It would be better if there is no middle file. For this reason, I am looking at named pipe. I do not quiet understand it.
I have looked at multiple questions that have been answered here. To me they are all some how different from my question here. Thanks ahead for all your help.
回答1:
The way bash handles <(..)
is to:
- Create a pipe
- Fork a command that writes to the write end
- Substitute the
<(..)
for /dev/fd/N where N is the input end file descriptor of the pipe (tryecho <(true)
). - Run the command
The command will then open /dev/fd/N
, and the OS will cause that to duplicate the inherited read end of the pipe.
We can do the same thing in Python:
import subprocess
import os
# Open a pipe and run a command that writes to the write end
input_fd, output_fd = os.pipe()
subprocess.Popen(["cat", "foo.txt", "bar.txt"], shell=False, stdout=output_fd)
os.close(output_fd);
# Run a command that uses /dev/fd/* to read from the read end
proc = subprocess.Popen(["wc", "/dev/fd/" + str(input_fd)],
shell=False, stdout = subprocess.PIPE)
# Read that command's output
print proc.communicate()[0]
For example:
$ cat foo.txt
Hello
$ cat bar.txt
World
$ wc <(cat foo.txt bar.txt)
2 2 12 /dev/fd/63
$ python test.py
2 2 12 /dev/fd/4
回答2:
Process substitution returns the device filename that is being used. You will have to assign the pipe to a higher FD (e.g. 20) by passing a function to preexec_fn
that uses os.dup2()
to copy it, and then pass the FD device filename (e.g. /dev/fd/20
) as one of the arguments of the call.
def assignfd(fd, handle):
def assign():
os.dup2(handle, fd)
return assign
...
p2 = Popen(['cat', '/dev/fd/20'], preexec_fn=assignfd(20, p1.stdout.fileno()))
...
回答3:
It's actually possible have it both ways -- using a shell, while passing a list of arguments through unambiguously in a way that doesn't allow them to be shell-parsed.
Use bash
explicitly rather than shell=True
to ensure that you have support for <()
, and use "$@"
to refer to the additional argv array elements, like so:
subprocess.Popen(['bash', '-c',
'extersoftware "$@" --f <(cat fileA_1 fileB_1) <(cat fileA_2 fileB_2)',
"_", # this is a dummy passed in as argv[0] of the interpreter
"--fq", # this is substituted into the shell by the "$@"
])
If you wanted to independently pass in all three arrays -- extra arguments, and the exact filenames to be passed to each cat
instance:
BASH_SCRIPT=r'''
declare -a filelist1=( )
filelist1_len=$1; shift
while (( filelist1_len-- > 0 )); do
filelist1+=( "$1" ); shift
done
filelist2_len=$1; shift
while (( filelist2_len-- > 0 )); do
filelist2+=( "$1" ); shift
done
extersoftware "$@" --f <(cat "${filelist1[@]}") <(cat "${filelist2[@]}")
'''
subprocess.Popen(['bash', '-c', BASH_SCRIPT, '' +
[str(len(filelist1))] + filelist1 +
[str(len(filelist2))] + filelist2 +
["--fq"],
])
You could put more interesting logic in the embedded shell script as well, were you so inclined.
回答4:
In this specific case, we may use:
import subprocess
import os
if __name__ == '__main__':
input_fd1, output_fd1 = os.pipe()
subprocess.Popen(['cat', 'fileA_1', 'fileB_1'],
shell=False, stdout=output_fd1)
os.close(output_fd1)
input_fd2, output_fd2 = os.pipe();
subprocess.Popen(['cat', 'fileA_2', 'fileB_2'],
shell=False, stdout=output_fd2)
os.close(output_fd2)
proc = subprocess.Popen(['extersoftware','--fq', '--f',
'/dev/fd/'+str(input_fd1), '/dev/fd/' + str(input_fd2)], shell=False)
Change log:
Reformatted the code so it should be easier to read now (and hopefully still syntactically correct). It's tested in Python 2.6.6 on Scientific Linux 6.5 and everything looks fine.
Removed unnecessary semicolons.
来源:https://stackoverflow.com/questions/26593229/how-to-execute-cat-filea-fileb-using-python