How to pipe many bash commands from python?

问题

Hi I'm trying to call the following command from python:

comm -3 <(awk '{print $1}' File1.txt | sort | uniq) <(awk '{print $1}' File2.txt | sort | uniq) | grep -v "#" | sed "s/\t//g"

How could I do the calling when the inputs for the comm command are also piped?

Is there an easy and straight forward way to do it?

I tried the subprocess module:

subprocess.call("comm -3 <(awk '{print $1}' File1.txt | sort | uniq) <(awk '{print $1}' File2.txt | sort | uniq) | grep -v '#' | sed 's/\t//g'")

Without success, it says: OSError: [Errno 2] No such file or directory

Or do I have to create the different calls individually and then pass them using PIPE as it is described in the subprocess documentation:

p1 = Popen(["dmesg"], stdout=PIPE)
p2 = Popen(["grep", "hda"], stdin=p1.stdout, stdout=PIPE)
p1.stdout.close()  # Allow p1 to receive a SIGPIPE if p2 exits.
output = p2.communicate()[0]

回答1:

Process substitution (<()) is bash-only functionality. Thus, you need a shell, but it can't be just any shell (like /bin/sh, as used by shell=True on non-Windows platforms) -- it needs to be bash.

subprocess.call(['bash', '-c', "comm -3 <(awk '{print $1}' File1.txt | sort | uniq) <(awk '{print $1}' File2.txt | sort | uniq) | grep -v '#' | sed 's/\t//g'"])

By the way, if you're going to be going this route with arbitrary filenames, pass them out-of-band (as below: Passing _ as $0, File1.txt as $1, and File2.txt as $2):

subprocess.call(['bash', '-c',
  '''comm -3 <(awk '{print $1}' "$1" | sort | uniq) '''
  '''        <(awk '{print $1}' "$2" | sort | uniq) '''
  '''        | grep -v '#' | tr -d "\t"''',
  '_', "File1.txt", "File2.txt"])

That said, the best-practices approach is indeed to set up the chain yourself. The below is tested with Python 3.6 (note the need for the pass_fds argument to subprocess.Popen to make the file descriptors referred to via /dev/fd/## links available):

awk_filter='''! /#/ && !seen[$1]++ { print $1 }'''

p1 = subprocess.Popen(['awk', awk_filter],
                      stdin=open('File1.txt', 'r'),
                      stdout=subprocess.PIPE)
p2 = subprocess.Popen(['sort', '-u'],
                      stdin=p1.stdout,
                      stdout=subprocess.PIPE)
p3 = subprocess.Popen(['awk', awk_filter],
                      stdin=open('File2.txt', 'r'),
                      stdout=subprocess.PIPE)
p4 = subprocess.Popen(['sort', '-u'],
                      stdin=p3.stdout,
                      stdout=subprocess.PIPE)
p5 = subprocess.Popen(['comm', '-3',
                       ('/dev/fd/%d' % (p2.stdout.fileno(),)),
                       ('/dev/fd/%d' % (p4.stdout.fileno(),))],
                      pass_fds=(p2.stdout.fileno(), p4.stdout.fileno()),
                      stdout=subprocess.PIPE)
p6 = subprocess.Popen(['tr', '-d', '\t'],
                      stdin=p5.stdout,
                      stdout=subprocess.PIPE)
result = p6.communicate()

This is a lot more code, but (assuming that the filenames are parameterized in the real world) it's also safer code -- you aren't vulnerable to bugs like ShellShock that are triggered by the simple act of starting a shell, and don't need to worry about passing variables out-of-band to avoid injection attacks (except in the context of arguments to commands -- like awk -- that are scripting language interpreters themselves).

That said, another thing to think about is just implementing the whole thing in native Python.

lines_1 = set(line.split()[0] for line in open('File1.txt', 'r') if not '#' in line)
lines_2 = set(line.split()[0] for line in open('File2.txt', 'r') if not '#' in line)
not_common = (lines_1 - lines_2) | (lines_2 - lines_1)
for line in sorted(not_common):
  print line

回答2:

Also checkout plumbum. Makes life easier

http://plumbum.readthedocs.io/en/latest/

Pipelining

This may be wrong, but you can try this:

from plumbum.cmd import grep, comm, awk, sort, uniq, sed 
_c1 = awk['{print $1}', 'File1.txt'] | sort | uniq
_c2 = awk['{print $1}', 'File2.txt'] | sort | uniq
chain = comm['-3', _c1(), _c2() ] | grep['-v', '#'] | sed['s/\t//g']
chain()

Let me know if this goes wrong, Will try to fix it.

Edit: As pointed out, I missed the substitution thing, and I think it would have to be explicitly done by redirecting the above command output to a temporary file and then using that file in the argument to comm.

So the above would now actually become:

from plumbum.cmd import grep, comm, awk, sort, uniq, sed 
_c1 = awk['{print $1}', 'File1.txt'] | sort | uniq
_c2 = awk['{print $1}', 'File2.txt'] | sort | uniq
(_c1 > "/tmp/File1.txt")(), (_c2 > "/tmp/File2.txt")()
chain = comm['-3', "/tmp/File1.txt", "/tmp/File2.txt" ] | grep['-v', '#'] | sed['s/\t//g']
chain()

Also, alternatively you can use the method described by @charles by making use of mkfifo.

来源：https://stackoverflow.com/questions/43812939/how-to-pipe-many-bash-commands-from-python

标签

python

bash

subprocess

pipe