Python multiprocessing and handling exceptions in workers

自闭症患者 2021-02-05 16:17

I use the Python multiprocessing library for an algorithm in which many workers process certain data and return their results to the parent process. I use multiprocessing.Queu

1 answer
  •  滥情空心
    2021-02-05 16:43

    I changed your code slightly to make it work (see explanation below).

    import multiprocessing as mp
    import random
    
    workers_count = 5
    # Probability of failure, change to simulate failures
    fail_init_p = 0.5
    fail_job_p = 0.4
    
    
    #========= Worker =========
    def do_work(job_state, arg):
        if random.random() < fail_job_p:
            raise Exception("Job failed")
        return "job %d processed %d" % (job_state, arg)
    
    def init(args):
        if random.random() < fail_init_p:
            raise Exception("Worker init failed")
        return args
    
    def worker_function(args, jobs_queue, result_queue):
        # INIT
        # What to do when init() fails?
        try:
            state = init(args)
        except Exception:
            print("!Worker %d init fail" % args)
            result_queue.put('init failed')
            return
        # DO WORK
        # Process data in the jobs queue
        for job in iter(jobs_queue.get, None):
            try:
                # Can throw an exception!
                result = do_work(state, job)
                result_queue.put(result)
            except Exception:
                print("!Job %d failed, skip..." % job)
                result_queue.put('job failed')
    
    
    #========= Parent =========
    # The guard is needed on platforms that start workers with "spawn"
    # (Windows, and macOS since Python 3.8).
    if __name__ == "__main__":
        jobs = mp.Queue()
        results = mp.Queue()
        for i in range(workers_count):
            mp.Process(target=worker_function, args=(i, jobs, results)).start()
    
        # Populate jobs queue
        results_to_expect = 0
        for j in range(30):
            jobs.put(j)
            results_to_expect += 1
    
        # Read results until every job has reported or every worker failed init
        init_failures = 0
        job_failures = 0
        successes = 0
        while job_failures + successes < results_to_expect and init_failures < workers_count:
            result = results.get()
            init_failures += int(result == 'init failed')
            job_failures += int(result == 'job failed')
            successes += int(result != 'init failed' and result != 'job failed')
            # print(init_failures, job_failures, successes)
    
        # Send one stop request (None) per worker
        for ii in range(workers_count):
            jobs.put(None)
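The workers above tell the parent only that a job failed, not why. One possible extension is to send the formatted traceback along with the status. The sketch below is not part of the original answer: the helper `run_job` and the tags `JOB_OK` and `JOB_FAILED` are hypothetical names introduced for illustration, and it assumes the parent is changed to read `(tag, payload)` tuples instead of bare strings.

```python
import traceback

# Hypothetical tags chosen for this sketch; the answer's code puts the
# bare string 'job failed' on the queue instead.
JOB_OK = "job_ok"
JOB_FAILED = "job_failed"


def run_job(func, state, job):
    """Call func(state, job) and return a (tag, payload) tuple.

    On success the payload is the return value; on failure it is the
    formatted traceback text, so the parent can log why the job failed.
    """
    try:
        return (JOB_OK, func(state, job))
    except Exception:
        return (JOB_FAILED, traceback.format_exc())
```

Inside the worker loop this would be used as `result_queue.put(run_job(do_work, state, job))`. The traceback is sent as a plain string, which pickles without trouble when it crosses the process boundary.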
    

    My changes:

    1. Changed jobs to be a plain Queue (instead of a JoinableQueue).
    2. Workers now send back the special result strings "init failed" and "job failed".
    3. The master process watches for these special results until either every job has reported (success or failure) or every worker has failed to initialize.
    4. At the end, put one "stop" request (i.e. a None job) per worker, regardless of how many are still alive. Not all of these will necessarily be pulled from the queue (a worker that failed to initialize never reads it).
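Using plain strings as status markers has one weak spot: a job whose legitimate result happens to be the string "job failed" would be miscounted. A possible alternative, sketched here under assumptions, is to have workers send `(tag, payload)` tuples and let the parent count by tag. The tag constants and the function `collect` are hypothetical names, not part of the answer's code; the stopping conditions are the same as in point 3.

```python
from collections import Counter

# Hypothetical message tags for this sketch; the answer itself uses the
# plain strings 'init failed' and 'job failed'.
INIT_FAILED = "init_failed"
JOB_FAILED = "job_failed"
JOB_OK = "job_ok"


def collect(get_result, total_jobs, workers_count):
    """Read (tag, payload) messages until every job has reported
    or every worker has failed to initialize; return the tag counts."""
    counts = Counter()
    while (counts[JOB_OK] + counts[JOB_FAILED] < total_jobs
           and counts[INIT_FAILED] < workers_count):
        tag, _payload = get_result()  # blocks, like results.get()
        counts[tag] += 1
    return counts
```

With a results queue, the parent would call `collect(results.get, 30, workers_count)`, and the workers would put tuples such as `(JOB_OK, result)` or `(INIT_FAILED, args)` on the queue.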

    By the way, your original code was nice and easy to work with. The random probabilities bit is pretty cool.
