Python subprocess module much slower than commands (deprecated)

孤城傲影 2020-12-29 10:12

So I wrote a script that accesses a bunch of servers using nc on the command line, and originally I was using Python's commands module and calling commands.getoutput() and

2 answers
  • 2020-12-29 10:32

    There seem to be at least two separate issues here.

    First, you are improperly using Popen. Here are the problems I see:

    1. Spawning multiple processes with one Popen.
    2. Passing a single string as args instead of a list of arguments.
    3. Using the shell to pass text to the process rather than the built-in communicate method.
    4. Using the shell rather than spawning the process directly.

    Here is a corrected version of your code:

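    import subprocess
    from subprocess import PIPE
    
    # Pass the command as a list of arguments; no shell is involved
    args = ['nc', '-w', '1', 'server.com', 'port_num']
    p = subprocess.Popen(args, stdin=PIPE, stdout=PIPE)
    # communicate() writes the request to stdin, closes it, and waits for exit
    output = p.communicate(b"get file.ext")
    print(output[0])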
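    Since nc needs a reachable server, here is a local sketch of the same pattern that runs anywhere. The command is a list (no shell), and communicate() writes the request to the child's stdin, closes it, and collects stdout. The child is a hypothetical stand-in for nc (a Python one-liner that upper-cases whatever it reads), and the sketch assumes Python 3.

```python
import subprocess
import sys
from subprocess import PIPE

# Hypothetical stand-in for `nc`: a child process that reads all of stdin
# and writes it back upper-cased. Any program that reads stdin would do.
child_args = [
    sys.executable, "-c",
    "import sys; sys.stdout.write(sys.stdin.read().upper())",
]

# Arguments passed as a list: no shell is started and no quoting is needed.
p = subprocess.Popen(child_args, stdin=PIPE, stdout=PIPE)

# communicate() sends the bytes, closes stdin (so the child sees EOF),
# reads stdout until the child exits, and waits for it.
out, _ = p.communicate(b"get file.ext")
print(out)  # b'GET FILE.EXT'
```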
    

    Second, since you say it finishes faster when run manually than when run through subprocess, the likely problem is that you are not passing the correct string to nc. What is probably happening is that the server is waiting for a terminating string before it ends the connection. If you don't send it, the connection probably stays open until it times out.

    Run nc manually, figure out what the terminating string is, then update the string passed to communicate. With these changes it should run much faster.
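    The terminating-string theory can be simulated locally. In this sketch (Python 3.3+ for the timeout argument), a hypothetical fake server only finishes once it receives the line END, which is an invented terminator; the real one for your server is whatever you find by running nc manually. Without the terminator, the call only returns when the timeout fires.

```python
import subprocess
import sys

# Hypothetical fake server: replies to each request line and exits as soon as
# it sees the (invented) terminator line "END". If input ends without the
# terminator, it lingers, the way a server holds a connection open.
FAKE_SERVER = """
import sys, time
for line in sys.stdin:
    if line.strip() == "END":
        sys.exit(0)
    sys.stdout.write("reply to " + line)
    sys.stdout.flush()
time.sleep(10)
"""

def query(request, timeout):
    """Send request to the fake server; return (output, timed_out)."""
    p = subprocess.Popen([sys.executable, "-c", FAKE_SERVER],
                         stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    try:
        out, _ = p.communicate(request, timeout=timeout)
        return out, False
    except subprocess.TimeoutExpired:
        p.kill()                  # give up on the lingering "connection"
        out, _ = p.communicate()  # reap the child, collect any partial output
        return out, True

# With the terminator the child exits immediately; without it we hit the timeout.
print(query(b"get file.ext\nEND\n", timeout=5))  # (b'reply to get file.ext\n', False)
print(query(b"get file.ext\n", timeout=1))       # times out: second item is True
```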

  • 2020-12-29 10:40

    I would expect subprocess to be slower than commands. Without suggesting that this is the only reason your script runs slowly, take a look at the commands source code. It is fewer than 100 lines, and most of the work is delegated to functions from os, many of which wrap the C POSIX library directly (at least on POSIX systems). Note that commands is Unix-only, so it doesn't have to do any extra work to ensure cross-platform compatibility.
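    For reference, commands.getoutput() in Python 2.7 is essentially a thin wrapper around os.popen. The sketch below paraphrases that logic; it is a re-creation, not the library's source, and getstatusoutput_sketch/getoutput_sketch are hypothetical names. In Python 3, where commands no longer exists, os.popen is itself built on subprocess, so this version does not inherit the speed advantage discussed here.

```python
import os

def getstatusoutput_sketch(cmd):
    """Approximation of Python 2's commands.getstatusoutput (hypothetical name)."""
    # Run the command through the shell, folding stderr into stdout.
    pipe = os.popen('{ ' + cmd + '; } 2>&1', 'r')
    text = pipe.read()
    sts = pipe.close()       # None means exit status 0
    if sts is None:
        sts = 0
    if text[-1:] == '\n':    # strip a single trailing newline
        text = text[:-1]
    return sts, text

def getoutput_sketch(cmd):
    """Approximation of commands.getoutput: output only, status discarded."""
    return getstatusoutput_sketch(cmd)[1]

print(getoutput_sketch('echo "foo" | cat'))  # foo
```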

    Now take a look at subprocess. There are more than 1500 lines, all pure Python, doing all sorts of checks to ensure consistent cross-platform behavior. Based on this, I would expect subprocess to run slower than commands.

    I timed the two modules, and on something quite basic, subprocess was almost twice as slow as commands.

    >>> %timeit commands.getoutput('echo "foo" | cat')
    100 loops, best of 3: 3.02 ms per loop
    >>> %timeit subprocess.check_output('echo "foo" | cat', shell=True)
    100 loops, best of 3: 5.76 ms per loop
    

    Swiss suggests some good improvements that will help your script's performance. But even after applying them, note that subprocess is still slower.

    >>> %timeit commands.getoutput('echo "foo" | cat')
    100 loops, best of 3: 2.97 ms per loop
    >>> %timeit Popen('cat', stdin=PIPE, stdout=PIPE).communicate('foo')[0]
    100 loops, best of 3: 4.15 ms per loop
    

    If you run the above command many times in a row, this overhead adds up and accounts for at least some of the performance difference.
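    To reproduce this kind of comparison yourself, here is a rough harness built on the standard timeit module. Because commands is gone in Python 3, it compares the two subprocess styles from the timings above: a shell pipeline via check_output(..., shell=True) versus spawning cat directly and feeding it through communicate(). It assumes a POSIX system with cat on the PATH; absolute numbers will vary by machine.

```python
import subprocess
import timeit
from subprocess import PIPE

def via_shell():
    # Starts /bin/sh, which in turn runs echo and cat as a pipeline.
    return subprocess.check_output('echo "foo" | cat', shell=True)

def direct():
    # Starts cat alone; the input is sent over its stdin.
    return subprocess.Popen(['cat'], stdin=PIPE, stdout=PIPE).communicate(b'foo\n')[0]

N = 50
# Both approaches should produce identical output before we compare speed.
assert via_shell() == direct() == b'foo\n'
for fn in (via_shell, direct):
    # Best of 3 runs of N calls, reported as milliseconds per call.
    best = min(timeit.repeat(fn, number=N, repeat=3))
    print("%-10s %.2f ms per call" % (fn.__name__, best / N * 1000))
```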

    In any case, I am interpreting your question as being about the relative performance of subprocess and commands, rather than about how to speed up your script. For the latter question, Swiss's answer is better.
