Question
I have a queue of 500 processes that I want to run through a Python script, N at a time in parallel.
What my script does so far: it runs N processes in parallel, waits for all of them to terminate, then starts the next N processes.
What I need: as soon as one of the N processes finishes, the next process from the queue should start automatically, without waiting for the rest of the batch to terminate.
Note: I do not know how long each process will take, so I can't schedule a process to run at a particular time.
Following is the code I have so far. I am currently using subprocess.Popen, but I'm not tied to it.
for i in range(0, len(queue), N):
    # take the next N commands off the queue
    batch = []
    for _ in range(min(N, len(queue))):
        batch.append(queue.pop(0))
    ps = []
    for process in batch:
        p = subprocess.Popen([process])
        ps.append(p)
    # wait for the whole batch to finish before starting the next one
    for p in ps:
        p.communicate()
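For example (illustrative numbers, just to show the cost of batching): suppose N = 4 and the runtimes are uneven. My batch loop blocks on the slowest process in every batch:

durations = [4, 1, 1, 1, 4, 1, 1, 1]  # made-up runtimes in seconds, N = 4
batches = [durations[i:i + 4] for i in range(0, len(durations), 4)]
print(sum(max(b) for b in batches))  # max(4,1,1,1) + max(4,1,1,1) = 8 seconds

Starting a replacement as soon as any process finishes would instead let the short jobs backfill around the long ones and finish in roughly 5 seconds.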
Answer 1:
I believe this should work:
import subprocess
import time

def check_for_done(l):
    # poll() returns None while a process is running and its exit code once it is done
    for i, p in enumerate(l):
        if p.poll() is not None:
            return True, i
    return False, False

processes = list()
N = 5
queue = list()  # fill this with the commands you want to run

for process in queue:
    p = subprocess.Popen(process)
    processes.append(p)
    if len(processes) == N:
        # the pool is full: block until any one process finishes
        wait = True
        while wait:
            done, num = check_for_done(processes)
            if done:
                processes.pop(num)
                wait = False
            else:
                time.sleep(0.5)  # set this so the CPU does not go crazy

# wait for the processes that are still running once the queue is drained
for p in processes:
    p.wait()
So you have a list of active processes, and check_for_done loops through it: poll() returns None while a process is still running and its return code once it has finished. So when poll() returns something other than None, the process is done (though the return code alone doesn't tell you whether it was successful). You then remove that process from the list, freeing a slot so the loop can start another one.
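If you want this pattern as a reusable piece, the same polling idea can be wrapped in a small helper. This is just a sketch along the same lines, not part of the original answer; the name run_pool, the 0.5-second interval, and the assumption that each command is an argv list are all illustrative:

import subprocess
import time

def run_pool(commands, max_running=5, interval=0.5):
    # keep at most max_running processes alive; start the next command
    # as soon as a slot frees up
    running = []
    for cmd in commands:
        while len(running) >= max_running:
            # drop finished processes: poll() is None while a process runs
            running = [p for p in running if p.poll() is None]
            if len(running) >= max_running:
                time.sleep(interval)  # avoid spinning the CPU
        running.append(subprocess.Popen(cmd))
    # drain whatever is still running once the queue is empty
    for p in running:
        p.wait()

For example, run_pool([['sleep', '2']] * 10, max_running=5) keeps five sleeps running at any moment until all ten have finished.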
Answer 2:
Assuming python3, you could make use of ThreadPoolExecutor from concurrent.futures, like:
$ cat run.py
import time
from subprocess import Popen, PIPE
from concurrent.futures import ThreadPoolExecutor

def exec_(cmd):
    proc = Popen(cmd, stdout=PIPE, stderr=PIPE)
    stdout, stderr = proc.communicate()
    #print(stdout, stderr)

def main():
    # max_workers=4 to demonstrate it will run a batch of 4 jobs at the same time
    with ThreadPoolExecutor(max_workers=4) as executor:
        cmds = [['sleep', '4'] for _ in range(10)]
        start = time.time()
        results = executor.map(exec_, cmds)
        for _ in results:
            pass  # drain the results so any exception in a worker is raised here
        end = time.time()
        print(f'Took {end - start} seconds')

if __name__ == '__main__':
    main()
This will process 4 tasks at a time, and since there are 10 tasks, it should take only around 4 + 4 + 4 = 12 seconds:
The first 4 seconds for the first 4 tasks
The second 4 seconds for the next 4 tasks
And the final 4 seconds for the last 2 remaining tasks
Output:
$ python run.py
Took 12.005989074707031 seconds
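Applied to the question, the same executor replaces the whole batching loop. A minimal sketch (not from the original answer), assuming queue holds argv-style command lists and reusing exec_ from run.py above:

from concurrent.futures import ThreadPoolExecutor

N = 5  # how many processes to keep running at once
with ThreadPoolExecutor(max_workers=N) as executor:
    # each of the 500 commands is handed to a worker as soon as one frees up,
    # so a finished process is replaced immediately rather than per batch
    list(executor.map(exec_, queue))

Since map hands out the next command as soon as a worker is free, uneven runtimes no longer stall the whole queue.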
Source: https://stackoverflow.com/questions/58031373/run-process-after-process-in-a-queue-using-python