Hang in Python script using SQLAlchemy and multiprocessing

北海茫月 2021-01-05 06:48

Consider the following Python script, which uses SQLAlchemy and the Python multiprocessing module. This is with Python 2.6.6-8+b1 (default) and SQLAlchemy 0.6.3-3 (default).

4 answers
  •  栀梦 (original poster)
    2021-01-05 07:41

    I believe the TypeError comes from multiprocessing's get.

    I've stripped out all the DB code from your script. Take a look at this:

    import multiprocessing
    import sqlalchemy.exc
    
    def do(kwargs):
        i = kwargs['i']
        print i
        raise sqlalchemy.exc.ProgrammingError("", {}, None)
        return i
    
    
    pool = multiprocessing.Pool(processes=5)               # start 5 worker processes
    results = []
    arglist = []
    for i in range(10):
        arglist.append({'i':i})
    r = pool.map_async(do, arglist, callback=results.append) # run do() over arglist asynchronously
    
    # Use get or wait?
    # r.get()
    r.wait()
    
    pool.close()
    pool.join()
    print results
    

    With r.wait() the script finishes as expected, but with r.get() it raises TypeError. As described in Python's docs, use r.wait() after a map_async.
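
    The difference between the two calls can be seen in isolation. The following is a minimal sketch in Python 3 syntax, not the script above: it assumes `multiprocessing.pool.ThreadPool` in place of `Pool` so that no pickling is involved, and it raises a plain `ValueError` for one item. `wait()` only blocks until the work is done, while `get()` re-raises the worker's exception in the caller.

```python
from multiprocessing.pool import ThreadPool  # assumption: threads instead of processes


def do(kwargs):
    i = kwargs['i']
    if i == 3:
        # one item fails; the others succeed
        raise ValueError("worker %d failed" % i)
    return i


pool = ThreadPool(processes=2)
results = []
arglist = [{'i': i} for i in range(5)]
r = pool.map_async(do, arglist, callback=results.append)

r.wait()              # blocks until all items are processed, never raises
ok = r.successful()   # False, because one call raised

try:
    r.get()           # re-raises the first exception seen by the pool
    error = None
except ValueError as exc:
    error = exc

pool.close()
pool.join()
print(ok, results, error)
```

    Because the callback is only invoked on success, `results` stays empty here, so `wait()` alone hides the failure rather than avoiding it.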

    Edit: I have to amend my previous answer. I now believe the TypeError comes from SQLAlchemy. I've amended my script to reproduce the error.

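    Edit 2: It looks like the problem is that multiprocessing.pool does not play well with a worker that raises an exception whose constructor requires a parameter, presumably because the exception has to be pickled to travel back to the parent process and cannot be rebuilt there.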

    I've amended my script to highlight this.

    import multiprocessing
    
    class BadExc(Exception):
        def __init__(self, a):
            '''Non-optional param in the constructor.'''
            self.a = a
    
    class GoodExc(Exception):
        def __init__(self, a=None):
            '''Optional param in the constructor.'''
            self.a = a
    
    def do(kwargs):
        i = kwargs['i']
        print i
        raise BadExc('a')
        # raise GoodExc('a')
        return i
    
    pool = multiprocessing.Pool(processes=5)
    results = []
    arglist = []
    for i in range(10):
        arglist.append({'i':i})
    r = pool.map_async(do, arglist, callback=results.append)
    try:
        # set a timeout in order to be able to catch C-c
        r.get(1e100)
    except KeyboardInterrupt:
        pass
    print results
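
    One way to see why such exceptions are a problem: results and exceptions travel from the worker to the parent process via pickle, and unpickling an exception calls its class again with whatever is stored in `args`. Below is a minimal sketch, written for Python 3 and using hypothetical class names (`TwoArgError`, `PicklableError`) that are not part of the scripts above. It uses an `__init__` that hands a single formatted message to `Exception.__init__`, since that mismatch fails on Python 3 as well. Defining `__reduce__` is one possible fix on the exception side, shown here as an assumption rather than as what SQLAlchemy does.

```python
import pickle


class TwoArgError(Exception):
    """Hypothetical: two required parameters, but only one value ends up in args."""
    def __init__(self, a, b):
        Exception.__init__(self, "%s: %s" % (a, b))
        self.a = a
        self.b = b


class PicklableError(TwoArgError):
    """Hypothetical fix: tell pickle which arguments rebuild the exception."""
    def __reduce__(self):
        return (self.__class__, (self.a, self.b))


def survives_pickle(exc):
    """Return True if exc can make the round trip a pool result has to make."""
    try:
        pickle.loads(pickle.dumps(exc))
    except TypeError:
        # unpickling called the class with the wrong number of arguments
        return False
    return True


print(survives_pickle(TwoArgError("stmt", "oops")))
print(survives_pickle(PicklableError("stmt", "oops")))
```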
    

    In your case, given that your code raises an SQLAlchemy exception, the only solution I can think of is to catch all the exceptions in the do function and re-raise a normal Exception instead. Something like this:

    import multiprocessing
    
    class BadExc(Exception):
        def __init__(self, a):
            '''Non-optional param in the constructor.'''
            self.a = a
    
    def do(kwargs):
        try:
            i = kwargs['i']
            print i
            raise BadExc('a')
            return i
        except Exception as e:
            raise Exception(repr(e))
    
    pool = multiprocessing.Pool(processes=5)
    results = []
    arglist = []
    for i in range(10):
        arglist.append({'i':i})
    r = pool.map_async(do, arglist, callback=results.append)
    try:
        # set a timeout in order to be able to catch C-c
        r.get(1e100)
    except KeyboardInterrupt:
        pass
    print results
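
    The same try-except can be factored out so that it does not have to be repeated in every worker function. This is a sketch under assumptions, in Python 3 syntax: the decorator name `picklable_errors` is hypothetical, and it sends the formatted traceback as the message so that the original error location is not lost. The worker is called directly here instead of through a pool, and the pickle round trip stands in for the trip back to the parent process.

```python
import functools
import pickle
import traceback


def picklable_errors(func):
    """Re-raise anything func raises as a plain Exception holding only a string."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            # only the formatted traceback text crosses the process boundary
            raise Exception(traceback.format_exc())
    return wrapper


class BadExc(Exception):
    def __init__(self, a):
        '''Non-optional param in the constructor.'''
        self.a = a


@picklable_errors
def do(kwargs):
    raise BadExc(kwargs['i'])


try:
    do({'i': 1})
except Exception as exc:
    caught = exc

# simulate the trip from worker to parent
restored = pickle.loads(pickle.dumps(caught))
print(type(restored).__name__)
```

    The decorated function would be passed to `pool.map_async` in place of `do`. The price is that the parent only ever sees a plain `Exception`, so it can no longer catch the original exception type.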
    

    Edit 3: So, it seems to be a bug in Python, but proper exceptions in SQLAlchemy would work around it; hence, I've raised the issue with SQLAlchemy, too.

    As a workaround for the problem, I think the solution at the end of Edit 2 would do (wrapping the worker function in try-except and re-raising).
