问题
I am trying to use posix_spawn instead of fork/exec to get some performance gain. My current project is written in Python, so I used this Python binding. Also I tried some of its forks, and after that I wrote my own posix_spawn binding in Cython (to get rid of some dependencies), but obtained almost the same results.
There is indeed a significant speed-up when I just need to run processes without capturing stdout/stderr. But when I do need it (for my project it is necessary), the posix_spawn call becomes about as slow as fork/exec call. Moreover, it depends on the amount of allocated memory the same way as fork/exec does. It happens even if the process does not actually produce any output - I checked on /bin/true. I still can't explain such behaviour. For fork/exec (via subprocess module) there is no significant difference whether we read process output or not as long as the output is not too large.
Here is my test script (imports and the code for profiling are omitted)
# test.py
def spawn_no_out(args):
command = args[0]
pid = posix_spawn(command, args)
status, rusage = exits(pid)
def spawn(args):
# Prepare pipes to capture stdout and stderr
stdout_read, stdout_write = os.pipe()
stderr_read, stderr_write = os.pipe()
fa = FileActions()
fa.add_dup2(stdout_write, 1)
fa.add_close(stdout_read)
fa.add_dup2(stderr_write, 2)
fa.add_close(stderr_read)
# Spawn the process
command = args[0]
pid = posix_spawn(command, args, file_actions=fa)
# Read and close file descriptors
os.close(stdout_write)
os.close(stderr_write)
status, rusage = exits(pid)
out = os.fdopen(stdout_read)
err = os.fdopen(stderr_read)
return out, err
def fork(args):
return Popen(args, stdout=PIPE, stderr=PIPE).communicate()
def fork_no_out(args):
return subprocess.call(args)
def run_benchmark(func, args, count):
for _ in xrange(count):
func(args)
print "%s: %ds" % (func.__name__, time.time() - start)
def main():
# Reads from stdout the number of process spawns and size of allocated memory
args = ["/bin/true"]
count = int(sys.argv[1]) if len(sys.argv) > 1 else 1000
mem_size = int(sys.argv[2]) if len(sys.argv) > 2 else 10000000
some_allocated_memory = range(mem_size)
for func in [spawn, spawn_no_out, fork, fork_no_out]:
run_benchmark(func, args, count)
if __name__ == "__main__":
main()
Test output without allocating additional memory:
./test.py 10000 1 spawn: 34s 3754834 function calls (3754776 primitive calls) in 33.517 seconds Ordered by: internal time, cumulative time List reduced from 144 to 5 due to restriction ncalls tottime percall cumtime percall filename:lineno(function) 10000 15.475 0.002 15.475 0.002 {posix.wait4} 10000 5.850 0.001 5.850 0.001 {_cffi__x2c5d2681xf492c09f.posix_spawn} 10000 3.217 0.000 12.750 0.001 /usr/local/lib/python2.7/dist-packages/posix_spawn-0.1-py2.7.egg/posix_spawn/_impl.py:75(posix_spawn) 10000 2.242 0.000 33.280 0.003 ./test.py:23(spawn) 660000 1.777 0.000 3.159 0.000 /usr/local/lib/python2.7/dist-packages/cffi/api.py:212(new) spawn_no_out: 14s 3340013 function calls in 14.631 seconds Ordered by: internal time, cumulative time List reduced from 25 to 5 due to restriction ncalls tottime percall cumtime percall filename:lineno(function) 10000 7.466 0.001 7.466 0.001 {posix.wait4} 10000 2.012 0.000 2.012 0.000 {_cffi__x2c5d2681xf492c09f.posix_spawn} 10000 1.658 0.000 6.994 0.001 /usr/local/lib/python2.7/dist-packages/posix_spawn-0.1-py2.7.egg/posix_spawn/_impl.py:75(posix_spawn) 650000 1.640 0.000 2.919 0.000 /usr/local/lib/python2.7/dist-packages/cffi/api.py:212(new) 650000 0.496 0.000 0.496 0.000 {_cffi_backend.newp} fork: 40s 840094 function calls in 40.745 seconds Ordered by: internal time, cumulative time List reduced from 53 to 5 due to restriction ncalls tottime percall cumtime percall filename:lineno(function) 10000 19.460 0.002 19.460 0.002 {posix.read} 10000 6.505 0.001 6.505 0.001 {posix.fork} 10081 4.667 0.000 4.667 0.000 {built-in method poll} 10000 2.773 0.000 30.190 0.003 /usr/lib/python2.7/subprocess.py:1187(_execute_child) 10000 0.814 0.000 32.996 0.003 /usr/lib/python2.7/subprocess.py:650(__init__) fork_no_out: 38s 330013 function calls in 38.488 seconds Ordered by: internal time, cumulative time List reduced from 36 to 5 due to restriction ncalls tottime percall cumtime percall filename:lineno(function) 10000 18.179 0.002 18.179 0.002 {posix.read} 10000 6.904 0.001 6.904 0.001 {posix.waitpid} 10000 6.613 0.001 6.613 0.001 {posix.fork} 10000 2.633 0.000 28.976 0.003 /usr/lib/python2.7/subprocess.py:1187(_execute_child) 10000 0.880 0.000 30.070 0.003 /usr/lib/python2.7/subprocess.py:650(__init__)
Test output with allocated memory for list of 10000000 integers (had to decrease the number of calls):
./test.py 1000 10000000 spawn: 20s 379834 function calls (379776 primitive calls) in 20.022 seconds Ordered by: internal time, cumulative time List reduced from 144 to 5 due to restriction ncalls tottime percall cumtime percall filename:lineno(function) 1000 10.022 0.010 10.022 0.010 {posix.wait4} 1000 8.705 0.009 8.705 0.009 {_cffi__x2c5d2681xf492c09f.posix_spawn} 1000 0.334 0.000 9.412 0.009 /usr/local/lib/python2.7/dist-packages/posix_spawn-0.1-py2.7.egg/posix_spawn/_impl.py:75(posix_spawn) 1000 0.269 0.000 19.998 0.020 ./test.py:18(spawn) 66000 0.174 0.000 0.318 0.000 /usr/local/lib/python2.7/dist-packages/cffi/api.py:212(new) spawn_no_out: 1s 334013 function calls in 1.480 seconds Ordered by: internal time, cumulative time List reduced from 25 to 5 due to restriction ncalls tottime percall cumtime percall filename:lineno(function) 1000 0.755 0.001 0.755 0.001 {posix.wait4} 1000 0.198 0.000 0.198 0.000 {_cffi__x2c5d2681xf492c09f.posix_spawn} 1000 0.171 0.000 0.708 0.001 /usr/local/lib/python2.7/dist-packages/posix_spawn-0.1-py2.7.egg/posix_spawn/_impl.py:75(posix_spawn) 65000 0.167 0.000 0.298 0.000 /usr/local/lib/python2.7/dist-packages/cffi/api.py:212(new) 65000 0.050 0.000 0.050 0.000 {_cffi_backend.newp} fork: 18s 84067 function calls in 18.554 seconds Ordered by: internal time, cumulative time List reduced from 53 to 5 due to restriction ncalls tottime percall cumtime percall filename:lineno(function) 1000 9.399 0.009 9.399 0.009 {posix.read} 1000 7.815 0.008 7.815 0.008 {posix.fork} 1054 0.414 0.000 0.414 0.000 {built-in method poll} 1000 0.274 0.000 17.626 0.018 /usr/lib/python2.7/subprocess.py:1187(_execute_child) 1000 0.078 0.000 17.871 0.018 /usr/lib/python2.7/subprocess.py:650(__init__) fork_no_out: 18s 33013 function calls in 18.732 seconds Ordered by: internal time, cumulative time List reduced from 36 to 5 due to restriction ncalls tottime percall cumtime percall filename:lineno(function) 1000 9.467 0.009 9.467 0.009 {posix.read} 1000 8.020 0.008 8.020 0.008 {posix.fork} 1000 0.603 0.001 0.603 0.001 {posix.waitpid} 1000 0.280 0.000 17.910 0.018 /usr/lib/python2.7/subprocess.py:1187(_execute_child) 1000 0.072 0.000 18.000 0.018 /usr/lib/python2.7/subprocess.py:650(__init__)
Without profiling the results are the same. As we can see, there is a huge difference (1.4s vs 20s!) in performance for the cases when we call posix_spawn with and without capturing process output. There is no additional heavy calls - posix.wait4 just takes more time. What could I have done wrong here? Does someone have an idea why it happens and how to get better performance for posix_spawn?
P.S. Tested on Linux Mint 17 and CentOS 6.5 - same results.
UPDATE: The same performance degradation happens even if we pass empty FileActions object to posix_spawn, without actually reading stdout/stderr:
def spawn(args):
command = args[0]
pid = posix_spawn(command, args, file_actions=FileActions())
status, rusage = exits(pid)
回答1:
Ok, for the future generations - it appeared that if file_actions is set and certain flags are not set, posix_spawn just uses fork: https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/posix/spawni.c;h=2d3ae941dd19f0348ed95c0b957c68c3c0e9815d;hb=c758a6861537815c759cba2018a3b1abb1943842#l97
回答2:
from the source code of the function, linked in alexgorin answer, it seems to me this flag rules over the other arguments :
POSIX_SPAWN_USEVFORK
try it and let me know
来源:https://stackoverflow.com/questions/33570947/posix-spawn-performance-when-capturing-process-output