How to execute '<(cat fileA fileB)' using python?

*爱你&永不变心* posted on 2021-02-15 07:30:48

Question


I am writing a Python program that drives other software. I was able to pass the command using subprocess.Popen. I am facing a new problem: I need to concatenate multiple files into two streams and use them as the input for the external program. The command line looks like this:

extersoftware --fq --f <(cat fileA_1 fileB_1) <(cat fileA_2 fileB_2)

I cannot use shell=True because there are other options I need to pass in through variables, such as --fq. (They are not limited to --fq; this is just an example.)

One possible solution is to generate an intermediate file. This is what I have tried:

file_1 = ['cat', 'fileA_1', 'fileB_1']
p1 = Popen(file_1, stdout=PIPE)
p2 = Popen(['>', 'output_file'], stdin=p1.stdout, stdout=PIPE)

p1.stdout.close()
output = p2.communicate()
print output

I got this error message: OSError: [Errno 2] No such file or directory. Which part did I get wrong?
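
A likely reading of the error: > is shell redirection syntax, not a program, so without a shell Popen tries to execute a file literally named > and fails with "No such file or directory". A minimal sketch of the intermediate-file variant, assuming Python 3 and a hypothetical helper name concat_to_file, passes an open file object as stdout instead:

```python
import subprocess


def concat_to_file(paths, output_path):
    """Run `cat paths...` and write its output to output_path.

    Hypothetical helper for illustration; not part of the original post.
    """
    with open(output_path, "wb") as out:
        # The open file object takes the place of the shell's `> output_file`.
        subprocess.check_call(["cat"] + list(paths), stdout=out)
    return output_path
```

The resulting path could then be passed to the external program as an ordinary filename argument.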

It would be better to avoid the intermediate file. For this reason, I am looking at named pipes, but I do not quite understand them.

I have looked at multiple questions that have been answered here. To me, they all differ somehow from my question. Thanks in advance for all your help.


Answer 1:


The way bash handles <(..) is to:

  1. Create a pipe
  2. Fork a command that writes to the write end
  3. Replace the <(..) with /dev/fd/N, where N is the file descriptor of the pipe's read end (try echo <(true)).
  4. Run the command

The command will then open /dev/fd/N, and the OS will cause that to duplicate the inherited read end of the pipe.

We can do the same thing in Python:

import subprocess
import os

# Open a pipe and run a command that writes to the write end
input_fd, output_fd = os.pipe()
subprocess.Popen(["cat", "foo.txt", "bar.txt"], shell=False, stdout=output_fd)
os.close(output_fd)

# Run a command that uses /dev/fd/* to read from the read end
proc = subprocess.Popen(["wc", "/dev/fd/" + str(input_fd)],
                        shell=False, stdout=subprocess.PIPE)

# Read that command's output
print proc.communicate()[0]

For example:

$ cat foo.txt 
Hello

$ cat bar.txt 
World

$ wc <(cat foo.txt bar.txt)
      2       2      12 /dev/fd/63

$ python test.py
      2       2      12 /dev/fd/4
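
The snippet above is Python 2, where child processes inherit open file descriptors by default. Since Python 3.2, subprocess closes every descriptor above 2 in the child unless it is listed in pass_fds, so the read end has to be passed explicitly. The following is a sketch under those assumptions (Python 3, a POSIX system that provides /dev/fd); the helper name run_with_substitutions is hypothetical and not part of the original answer. It generalizes the same steps to any number of <(cat ...) inputs:

```python
import os
import subprocess


def run_with_substitutions(command, file_groups):
    """Run command with one /dev/fd/N argument appended per group of files.

    Hypothetical helper: each group is fed through its own `cat` process,
    mimicking bash's <(cat ...). Returns the command's stdout as bytes.
    """
    read_fds = []
    writers = []
    try:
        for group in file_groups:
            read_fd, write_fd = os.pipe()
            read_fds.append(read_fd)
            try:
                # The child gets the write end as its stdout.
                writers.append(
                    subprocess.Popen(["cat"] + list(group), stdout=write_fd)
                )
            finally:
                # The parent must drop its copy so the reader can see EOF.
                os.close(write_fd)
        argv = list(command) + ["/dev/fd/%d" % fd for fd in read_fds]
        # pass_fds keeps the read ends open (and inheritable) in the child.
        return subprocess.check_output(argv, pass_fds=tuple(read_fds))
    finally:
        for fd in read_fds:
            os.close(fd)
        for writer in writers:
            writer.wait()
```

Under these assumptions, the command from the question would look something like run_with_substitutions(["extersoftware", "--fq", "--f"], [["fileA_1", "fileB_1"], ["fileA_2", "fileB_2"]]).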



Answer 2:


Process substitution returns the device filename that is being used. You will have to assign the pipe to a higher FD (e.g. 20) by passing a function to preexec_fn that uses os.dup2() to copy it, and then pass the FD device filename (e.g. /dev/fd/20) as one of the arguments of the call.

def assignfd(fd, handle):
  def assign():
    os.dup2(handle, fd)
  return assign

 ...
p2 = Popen(['cat', '/dev/fd/20'], preexec_fn=assignfd(20, p1.stdout.fileno()))
 ...
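
For a fuller picture, here is one way the elided parts might be filled in on Python 3. This is only a sketch: the helper names and the use of cat as the consumer are assumptions, and it relies on a POSIX system with /dev/fd. Two details are specific to Python 3: close_fds=False is needed because subprocess otherwise closes descriptors above 2 after preexec_fn has run, and the subprocess documentation warns that preexec_fn is not safe in programs that use threads.

```python
import os
import subprocess


def assign_fd(target_fd, source_fd):
    """Return a callable that duplicates source_fd onto target_fd."""
    def assign():
        # Runs in the child between fork and exec.
        os.dup2(source_fd, target_fd)
    return assign


def cat_via_fixed_fd(paths, target_fd=20):
    """Concatenate paths and read them back through /dev/fd/<target_fd>.

    Hypothetical helper; `cat` stands in for the real external program.
    """
    p1 = subprocess.Popen(["cat"] + list(paths), stdout=subprocess.PIPE)
    p2 = subprocess.Popen(
        ["cat", "/dev/fd/%d" % target_fd],
        stdout=subprocess.PIPE,
        close_fds=False,  # keep the duplicated descriptor open across exec
        preexec_fn=assign_fd(target_fd, p1.stdout.fileno()),
    )
    p1.stdout.close()  # the parent no longer needs the read end
    output = p2.communicate()[0]
    p1.wait()
    return output
```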



Answer 3:


It's actually possible to have it both ways: using a shell, while passing a list of arguments through unambiguously in a way that doesn't allow them to be shell-parsed.

Use bash explicitly rather than shell=True to ensure that you have support for <(), and use "$@" to refer to the additional argv array elements, like so:

subprocess.Popen(['bash', '-c',
    'extersoftware "$@" --f <(cat fileA_1 fileB_1) <(cat fileA_2 fileB_2)',
    "_",    # this is a dummy passed in as argv[0] of the interpreter
    "--fq", # this is substituted into the shell by the "$@"
])
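
To see that arguments passed this way are not re-parsed by the shell, a small check can help. The sketch below assumes bash is on the PATH; the helper name echo_args_via_bash is made up for illustration. It prints each element of "$@" on its own line, so spaces, $ and * arrive untouched:

```python
import subprocess


def echo_args_via_bash(args):
    """Print each argument on its own line from inside `bash -c`.

    Illustrative helper: "_" fills $0, and args become "$@".
    """
    script = r'printf "%s\n" "$@"'
    return subprocess.check_output(["bash", "-c", script, "_"] + list(args))
```

Calling echo_args_via_bash(["--fq", "a b", "$HOME", "*"]) returns the four arguments exactly as given, one per line, with no word splitting, variable expansion, or globbing applied.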

If you wanted to independently pass in all three arrays -- extra arguments, and the exact filenames to be passed to each cat instance:

BASH_SCRIPT=r'''
declare -a filelist1=( ) filelist2=( )

filelist1_len=$1; shift
while (( filelist1_len-- > 0 )); do
  filelist1+=( "$1" ); shift
done

filelist2_len=$1; shift
while (( filelist2_len-- > 0 )); do
  filelist2+=( "$1" ); shift
done

extersoftware "$@" --f <(cat "${filelist1[@]}") <(cat "${filelist2[@]}")
'''
subprocess.Popen(['bash', '-c', BASH_SCRIPT, '_'] +   # '_' is a dummy $0
    [str(len(filelist1))] + filelist1 +
    [str(len(filelist2))] + filelist2 +
    ["--fq"],
)

You could put more interesting logic in the embedded shell script as well, were you so inclined.




Answer 4:


In this specific case, we may use:

import subprocess
import os

if __name__ == '__main__':
    input_fd1, output_fd1 = os.pipe()
    subprocess.Popen(['cat', 'fileA_1', 'fileB_1'],
                     shell=False, stdout=output_fd1)
    os.close(output_fd1)

    input_fd2, output_fd2 = os.pipe()
    subprocess.Popen(['cat', 'fileA_2', 'fileB_2'],
                     shell=False, stdout=output_fd2)
    os.close(output_fd2)

    proc = subprocess.Popen(['extersoftware', '--fq', '--f',
                             '/dev/fd/' + str(input_fd1),
                             '/dev/fd/' + str(input_fd2)], shell=False)

Change log:

Reformatted the code so it should be easier to read now (and hopefully still syntactically correct). It's tested in Python 2.6.6 on Scientific Linux 6.5 and everything looks fine.

Removed unnecessary semicolons.



Source: https://stackoverflow.com/questions/26593229/how-to-execute-cat-filea-fileb-using-python
