Question
I have a queue of 500 processes that I want to run through a Python script, N at a time in parallel.
What my script does so far: it runs N processes in parallel, waits for all of them to terminate, then starts the next N processes.
What I need: as soon as one of the N processes finishes, the next process from the queue should start automatically, without waiting for the rest of the batch to terminate.
Note: I do not know how long each process will take, so I can't schedule a process to run at a particular time.
Following is the code I have so far. I am currently using subprocess.Popen, but I'm not tied to it.
for i in range(0, len(queue), N):
    # take the next N commands off the queue
    batch = []
    for _ in range(min(N, len(queue))):
        batch.append(queue.pop(0))
    ps = []
    for process in batch:
        p = subprocess.Popen([process])
        ps.append(p)
    # wait for the whole batch to finish before starting the next one
    for p in ps:
        p.communicate()
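For example (illustrative numbers, just to show the cost of batching): suppose N = 4 and the runtimes are uneven. My batch loop blocks on the slowest process in every batch:

durations = [4, 1, 1, 1, 4, 1, 1, 1]  # made-up runtimes in seconds, N = 4
batches = [durations[i:i + 4] for i in range(0, len(durations), 4)]
print(sum(max(b) for b in batches))  # max(4,1,1,1) + max(4,1,1,1) = 8 seconds

Starting a replacement as soon as any process finishes would instead let the short jobs backfill around the long ones and finish in roughly 5 seconds.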
Answer 1:
I believe this should work:
import subprocess
import time

def check_for_done(l):
    # poll() returns None while a process is running and its exit code once it is done
    for i, p in enumerate(l):
        if p.poll() is not None:
            return True, i
    return False, False

processes = list()
N = 5
queue = list()  # fill this with the commands you want to run

for process in queue:
    p = subprocess.Popen(process)
    processes.append(p)
    if len(processes) == N:
        # the pool is full: block until any one process finishes
        wait = True
        while wait:
            done, num = check_for_done(processes)
            if done:
                processes.pop(num)
                wait = False
            else:
                time.sleep(0.5)  # set this so the CPU does not go crazy

# wait for the processes that are still running once the queue is drained
for p in processes:
    p.wait()
So you have a list of active processes, and check_for_done loops through it: poll() returns None while a process is still running and its return code once it has finished. So when poll() returns something other than None, the process is done (though the return code alone doesn't tell you whether it was successful). You then remove that process from the list, freeing a slot so the loop can start another one.
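If you want this pattern as a reusable piece, the same polling idea can be wrapped in a small helper. This is just a sketch along the same lines, not part of the original answer; the name run_pool, the 0.5-second interval, and the assumption that each command is an argv list are all illustrative:

import subprocess
import time

def run_pool(commands, max_running=5, interval=0.5):
    # keep at most max_running processes alive; start the next command
    # as soon as a slot frees up
    running = []
    for cmd in commands:
        while len(running) >= max_running:
            # drop finished processes: poll() is None while a process runs
            running = [p for p in running if p.poll() is None]
            if len(running) >= max_running:
                time.sleep(interval)  # avoid spinning the CPU
        running.append(subprocess.Popen(cmd))
    # drain whatever is still running once the queue is empty
    for p in running:
        p.wait()

For example, run_pool([['sleep', '2']] * 10, max_running=5) keeps five sleeps running at any moment until all ten have finished.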
Answer 2:
Assuming python3, you could make use of ThreadPoolExecutor from concurrent.futures, like:
$ cat run.py
import time
from subprocess import Popen, PIPE
from concurrent.futures import ThreadPoolExecutor

def exec_(cmd):
    proc = Popen(cmd, stdout=PIPE, stderr=PIPE)
    stdout, stderr = proc.communicate()
    #print(stdout, stderr)

def main():
    # max_workers=4 to demonstrate it will run a batch of 4 jobs at the same time
    with ThreadPoolExecutor(max_workers=4) as executor:
        cmds = [['sleep', '4'] for _ in range(10)]
        start = time.time()
        results = executor.map(exec_, cmds)
        for _ in results:
            pass  # drain the results so any exception in a worker is raised here
        end = time.time()
        print(f'Took {end - start} seconds')

if __name__ == '__main__':
    main()
This will process 4 tasks at a time, and since there are 10 tasks, it should take only around 4 + 4 + 4 = 12 seconds:
The first 4 seconds for the first 4 tasks
The second 4 seconds for the next 4 tasks
And the final 4 seconds for the last 2 remaining tasks
Output:
$ python run.py
Took 12.005989074707031 seconds
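Applied to the question, the same executor replaces the whole batching loop. A minimal sketch (not from the original answer), assuming queue holds argv-style command lists and reusing exec_ from run.py above:

from concurrent.futures import ThreadPoolExecutor

N = 5  # how many processes to keep running at once
with ThreadPoolExecutor(max_workers=N) as executor:
    # each of the 500 commands is handed to a worker as soon as one frees up,
    # so a finished process is replaced immediately rather than per batch
    list(executor.map(exec_, queue))

Since map hands out the next command as soon as a worker is free, uneven runtimes no longer stall the whole queue.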
Source: https://stackoverflow.com/questions/58031373/run-process-after-process-in-a-queue-using-python