Hang in Python script using SQLAlchemy and multiprocessing

北海茫月 2021-01-05 06:48

Consider the following Python script, which uses SQLAlchemy and the Python multiprocessing module. This is with Python 2.6.6-8+b1 (default) and SQLAlchemy 0.6.3-3 (default).

4 answers
  • 2021-01-05 07:22

    (This is in answer to Faheem Mitha's question in a comment about how to use copy_reg to work around the broken exception classes.)

    The __init__() methods of SQLAlchemy's exception classes seem to call their base class's __init__() methods, but with different arguments. This mucks up pickling.

    To customise the pickling of sqlalchemy's exception classes you can use copy_reg to register your own reduce functions for those classes.

    A reduce function takes an argument obj and returns a pair (callable_obj, args) such that a copy of obj can be created by calling callable_obj(*args). For example, the class

    class StatementError(SQLAlchemyError):
        def __init__(self, message, statement, params, orig):
            SQLAlchemyError.__init__(self, message)
            self.statement = statement
            self.params = params
            self.orig = orig
        ...
    

    can be "fixed" by doing

    import copy_reg, sqlalchemy.exc
    
    def reduce_StatementError(e):
        message = e.args[0]
        args = (message, e.statement, e.params, e.orig)
        return (type(e), args)
    
    copy_reg.pickle(sqlalchemy.exc.StatementError, reduce_StatementError)
    

    There are several other classes in sqlalchemy.exc which need to be fixed similarly. But hopefully you get the idea.
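
    The registration above can be sketched for Python 3 as follows, assuming the module is imported as copyreg (the Python 3 name of copy_reg). StandInStatementError is a hypothetical stand-in that mimics the constructor shown earlier; it is not SQLAlchemy's own class, so treat this as an illustration of the mechanism only.

```python
import copyreg
import pickle


class StandInStatementError(Exception):
    """Hypothetical stand-in: passes only `message` to the base class."""

    def __init__(self, message, statement, params, orig):
        Exception.__init__(self, message)
        self.statement = statement
        self.params = params
        self.orig = orig


def reduce_statement_error(e):
    # Return (callable, args) so that callable(*args) rebuilds the exception.
    return (type(e), (e.args[0], e.statement, e.params, e.orig))


# Without this registration, unpickling would call the class with `message` only.
copyreg.pickle(StandInStatementError, reduce_statement_error)

original = StandInStatementError("boom", "SELECT 1", {"id": 1}, None)
restored = pickle.loads(pickle.dumps(original))
print(restored.statement, restored.params)
```

    Note that the registration is keyed on the exact class, so each subclass that needs the fix would have to be registered separately.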


    On second thoughts, rather than fixing each class individually, you can probably just monkey patch the __reduce__() method of the base exception class:

    import sqlalchemy.exc
    
    def rebuild_exc(cls, args, dic):
        e = Exception.__new__(cls)
        e.args = args
        e.__dict__.update(dic)
        return e
    
    def __reduce__(e):
        return (rebuild_exc, (type(e), e.args, e.__dict__))
    
    sqlalchemy.exc.SQLAlchemyError.__reduce__ = __reduce__
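
    To see that subclasses pick up the patched method, here is a small Python 3 sketch. StandInBaseError and StandInStatementError are hypothetical classes standing in for a library's exception hierarchy, and patched_reduce is just an illustrative name for the replacement method.

```python
import pickle


class StandInBaseError(Exception):
    """Hypothetical stand-in for a library's base exception class."""


class StandInStatementError(StandInBaseError):
    def __init__(self, message, statement):
        StandInBaseError.__init__(self, message)
        self.statement = statement


def rebuild_exc(cls, args, dic):
    # Bypass __init__ entirely, then restore args and instance attributes.
    e = Exception.__new__(cls)
    e.args = args
    e.__dict__.update(dic)
    return e


def patched_reduce(e):
    return (rebuild_exc, (type(e), e.args, e.__dict__))


# Subclasses inherit the patched method through normal attribute lookup.
StandInBaseError.__reduce__ = patched_reduce

restored = pickle.loads(pickle.dumps(StandInStatementError("boom", "SELECT 1")))
print(type(restored).__name__, restored.args, restored.statement)
```

    Because rebuild_exc never calls the class's __init__(), it does not matter how many arguments the constructor requires.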
    
  • 2021-01-05 07:27

    I don't know about the cause of the original exception. However, multiprocessing's problems with "bad" exceptions are really down to how pickling works. I think the SQLAlchemy exception class is broken.

    If an exception class has an __init__() method which does not call BaseException.__init__() (directly or indirectly) then self.args probably will not be set properly. BaseException.__reduce__() (which is used by the pickle protocol) assumes that a copy of an exception e can be recreated by just doing

    type(e)(*e.args)
    

    For example

    >>> e = ValueError("bad value")
    >>> e
    ValueError('bad value',)
    >>> type(e)(*e.args)
    ValueError('bad value',)
    

    If this invariant does not hold then pickling/unpickling will fail. So instances of

    class BadExc(Exception):
        def __init__(self, a):
            '''Non-optional param in the constructor.'''
            self.a = a
    

    can be pickled, but the result cannot be unpickled:

    >>> from cPickle import loads, dumps
    >>> class BadExc(Exception):
    ...     def __init__(self, a):
    ...         '''Non-optional param in the constructor.'''
    ...         self.a = a
    ...
    >>> loads(dumps(BadExc(1)))
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: ('__init__() takes exactly 2 arguments (1 given)', <class '__main__.BadExc'>, ())
    

    But instances of

    class GoodExc1(Exception):
        def __init__(self, a):
            '''Non-optional param in the constructor.'''
            Exception.__init__(self, a)
            self.a = a
    

    or

    class GoodExc2(Exception):
        def __init__(self, a):
            '''Non-optional param in the constructor.'''
            self.args = (a,)
            self.a = a
    

    can be successfully pickled/unpickled.
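
    A quick way to test whether a given exception class satisfies the invariant is to round-trip an instance through pickle. The sketch below is written for Python 3; MismatchedExc, MatchedExc and survives_pickle are hypothetical names. As far as I know, Python 3 records the constructor arguments in args even when __init__() is not chained, so the sketch uses a constructor that forwards too few arguments (the SQLAlchemy pattern) rather than BadExc itself.

```python
import pickle


class MismatchedExc(Exception):
    """Hypothetical: forwards fewer arguments than __init__ requires."""

    def __init__(self, message, detail):
        Exception.__init__(self, message)
        self.detail = detail


class MatchedExc(Exception):
    """Hypothetical: forwards every constructor argument to the base class."""

    def __init__(self, message, detail):
        Exception.__init__(self, message, detail)
        self.detail = detail


def survives_pickle(exc):
    """Return True if exc can be pickled and then unpickled."""
    try:
        pickle.loads(pickle.dumps(exc))
    except TypeError:
        return False
    return True


print(survives_pickle(MismatchedExc("boom", 42)))  # False
print(survives_pickle(MatchedExc("boom", 42)))     # True
```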

    So you should ask the developers of SQLAlchemy to fix their exception classes. In the meantime you can probably use copy_reg.pickle() to override BaseException.__reduce__() for the troublesome classes.

  • 2021-01-05 07:36

    The TypeError ("__init__() takes at least 4 arguments (2 given)") isn't related to the SQL you're trying to execute; it has to do with how you're using SQLAlchemy's API.

    The trouble is that you're trying to call execute on the session class rather than on an instance of that class.

    Try this:

    session = Session()
    session.execute("COMMIT; BEGIN; TRUNCATE foo%s; COMMIT;")
    session.commit()
    

    From the docs:

    It is intended that the sessionmaker() function be called within the global scope of an application, and the returned class be made available to the rest of the application as the single class used to instantiate sessions.

    So Session = sessionmaker() returns a new session class and session = Session() returns an instance of that class which you can then call execute on.
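
    The class-versus-instance distinction can be illustrated without a database. In this sketch StandInSession is a hypothetical plain Python class, not SQLAlchemy's session; it only shows why calling a method on the class itself fails with a TypeError about arguments, while calling it on an instance works.

```python
class StandInSession(object):
    """Hypothetical stand-in for the class a session factory returns."""

    def execute(self, statement):
        return "executed: %s" % statement


Session = StandInSession  # plays the role of Session = sessionmaker()

try:
    # Called on the class: no instance is bound, so an argument is missing.
    Session.execute("SELECT 1")
    class_call_error = None
except TypeError as exc:
    class_call_error = exc

session = Session()  # an instance, as in session = Session()
result = session.execute("SELECT 1")
print(result)
```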

  • 2021-01-05 07:41

    I believe the TypeError comes from multiprocessing's get.

    I've stripped out all the DB code from your script. Take a look at this:

    import multiprocessing
    import sqlalchemy.exc
    
    def do(kwargs):
        i = kwargs['i']
        print i
        raise sqlalchemy.exc.ProgrammingError("", {}, None)
        return i
    
    
    pool = multiprocessing.Pool(processes=5)               # start 5 worker processes
    results = []
    arglist = []
    for i in range(10):
        arglist.append({'i':i})
    r = pool.map_async(do, arglist, callback=results.append) # map do() over arglist asynchronously
    
    # Use get or wait?
    # r.get()
    r.wait()
    
    pool.close()
    pool.join()
    print results
    

    Using r.wait returns the expected result, but using r.get raises TypeError. As described in Python's docs, use r.wait after a map_async.
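
    The difference between the two calls is that wait() only blocks until the work is finished, whereas get() also re-raises whatever a worker raised. Here is a small Python 3 sketch of that behaviour. To keep it self-contained it swaps in multiprocessing.pool.ThreadPool, which offers the same map_async/wait/get interface but uses threads, so no pickling is involved; the function names are illustrative.

```python
from multiprocessing.pool import ThreadPool


def do(kwargs):
    # Every task fails, mirroring the script above.
    raise ValueError("worker %d failed" % kwargs["i"])


def run():
    pool = ThreadPool(processes=2)
    try:
        r = pool.map_async(do, [{"i": i} for i in range(4)])
        r.wait()                    # returns quietly even though tasks failed
        succeeded = r.successful()  # safe to call once the result is ready
        try:
            r.get()                 # re-raises an exception from a worker
            error = None
        except ValueError as exc:
            error = exc
    finally:
        pool.close()
        pool.join()
    return succeeded, error


succeeded, error = run()
print(succeeded, type(error).__name__)
```

    So wait() does not make the failure go away; it just does not report it, which is why successful() is worth checking afterwards.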

    Edit: I have to amend my previous answer. I now believe the TypeError comes from SQLAlchemy. I've amended my script to reproduce the error.

    Edit 2: It looks like the problem is that multiprocessing.pool does not play well if any worker raises an Exception whose constructor requires a parameter (see also here).

    I've amended my script to highlight this.

    import multiprocessing
    
    class BadExc(Exception):
        def __init__(self, a):
            '''Non-optional param in the constructor.'''
            self.a = a
    
    class GoodExc(Exception):
        def __init__(self, a=None):
            '''Optional param in the constructor.'''
            self.a = a
    
    def do(kwargs):
        i = kwargs['i']
        print i
        raise BadExc('a')
        # raise GoodExc('a')
        return i
    
    pool = multiprocessing.Pool(processes=5)
    results = []
    arglist = []
    for i in range(10):
        arglist.append({'i':i})
    r = pool.map_async(do, arglist, callback=results.append)
    try:
        # set a timeout in order to be able to catch C-c
        r.get(1e100)
    except KeyboardInterrupt:
        pass
    print results
    

    In your case, given that your code raises an SQLAlchemy exception, the only solution I can think of is to catch all the exceptions in the do function and re-raise a normal Exception instead. Something like this:

    import multiprocessing
    
    class BadExc(Exception):
        def __init__(self, a):
            '''Non-optional param in the constructor.'''
            self.a = a
    
    def do(kwargs):
        try:
            i = kwargs['i']
            print i
            raise BadExc('a')
            return i
        except Exception as e:
            raise Exception(repr(e))
    
    pool = multiprocessing.Pool(processes=5)
    results = []
    arglist = []
    for i in range(10):
        arglist.append({'i':i})
    r = pool.map_async(do, arglist, callback=results.append)
    try:
        # set a timeout in order to be able to catch C-c
        r.get(1e100)
    except KeyboardInterrupt:
        pass
    print results
    

    Edit 3: So it seems to be a bug in Python, but proper exceptions in SQLAlchemy would work around it; hence, I've raised the issue with SQLAlchemy, too.

    As a workaround for the problem, I think the solution at the end of Edit 2 would do (wrapping the body of the worker function in try-except and re-raising a plain Exception).
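
    If several worker functions need the same treatment, the try-except can be factored into a decorator. This is only a sketch for Python 3: reraise_as_plain_exception and MismatchedExc are hypothetical names, and the formatted traceback is kept as text so that some detail of the original error survives.

```python
import functools
import pickle
import traceback


def reraise_as_plain_exception(func):
    """Wrap func so any Exception subclass leaves it as a plain Exception."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            # Keep the formatted traceback as text; a string always pickles.
            raise Exception(traceback.format_exc())
    return wrapper


class MismatchedExc(Exception):
    """Hypothetical exception that cannot be rebuilt from its args."""

    def __init__(self, message, detail):
        Exception.__init__(self, message)
        self.detail = detail


@reraise_as_plain_exception
def do(kwargs):
    raise MismatchedExc("boom", kwargs["i"])


try:
    do({"i": 3})
except Exception as exc:
    # The replacement exception round-trips through pickle without trouble.
    caught = pickle.loads(pickle.dumps(exc))

print(type(caught).__name__)
```

    The cost of this approach is that the caller can no longer catch the original exception type; it only sees the text of the traceback.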
