Consider the following Python script, which uses SQLAlchemy and the Python multiprocessing module. This is with Python 2.6.6-8+b1 (default) and SQLAlchemy 0.6.3-3 (default).
(This is in answer to Faheem Mitha's question in a comment about how to use copy_reg to work around the broken exception classes.)
The __init__() methods of SQLAlchemy's exception classes seem to call their base class's __init__() methods, but with different arguments. This mucks up pickling.

To customise the pickling of SQLAlchemy's exception classes you can use copy_reg to register your own reduce functions for those classes. A reduce function takes an argument obj and returns a pair (callable_obj, args) such that a copy of obj can be created by doing callable_obj(*args). For example
class StatementError(SQLAlchemyError):
    def __init__(self, message, statement, params, orig):
        SQLAlchemyError.__init__(self, message)
        self.statement = statement
        self.params = params
        self.orig = orig
    ...
can be "fixed" by doing
import copy_reg, sqlalchemy.exc

def reduce_StatementError(e):
    message = e.args[0]
    args = (message, e.statement, e.params, e.orig)
    return (type(e), args)

copy_reg.pickle(sqlalchemy.exc.StatementError, reduce_StatementError)
There are several other classes in sqlalchemy.exc which need to be fixed similarly. But hopefully you get the idea.
On second thoughts, rather than fixing each class individually, you can probably just monkey-patch the __reduce__() method of the base exception class:
import sqlalchemy.exc

def rebuild_exc(cls, args, dic):
    e = Exception.__new__(cls)
    e.args = args
    e.__dict__.update(dic)
    return e

def __reduce__(e):
    return (rebuild_exc, (type(e), e.args, e.__dict__))

sqlalchemy.exc.SQLAlchemyError.__reduce__ = __reduce__
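With that patch in place, an exception such as StatementError should survive a pickle round trip. A quick sanity check (the constructor arguments below are just placeholders):

from cPickle import loads, dumps

e = sqlalchemy.exc.StatementError("message", "SELECT 1", {}, None)
e2 = loads(dumps(e))
print e2.statement, e2.params   # extra attributes are restored from e.__dict__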
I don't know about the cause of the original exception. However, multiprocessing's problems with "bad" exceptions are really down to how pickling works. I think the SQLAlchemy exception classes are broken.
If an exception class has an __init__() method which does not call BaseException.__init__() (directly or indirectly) then self.args probably will not be set properly. BaseException.__reduce__() (which is used by the pickle protocol) assumes that a copy of an exception e can be recreated by just doing
type(e)(*e.args)
For example
>>> e = ValueError("bad value")
>>> e
ValueError('bad value',)
>>> type(e)(*e.args)
ValueError('bad value',)
If this invariant does not hold then pickling/unpickling will fail. So instances of
class BadExc(Exception):
    def __init__(self, a):
        '''Non-optional param in the constructor.'''
        self.a = a
can be pickled, but the result cannot be unpickled:
>>> from cPickle import loads, dumps
>>> class BadExc(Exception):
...     def __init__(self, a):
...         '''Non-optional param in the constructor.'''
...         self.a = a
...
>>> loads(dumps(BadExc(1)))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: ('__init__() takes exactly 2 arguments (1 given)', <class '__main__.BadExc'>, ())
But instances of
class GoodExc1(Exception):
    def __init__(self, a):
        '''Non-optional param in the constructor.'''
        Exception.__init__(self, a)
        self.a = a
or
class GoodExc2(Exception):
    def __init__(self, a):
        '''Non-optional param in the constructor.'''
        self.args = (a,)
        self.a = a
can be successfully pickled/unpickled.
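For example, with GoodExc1 and GoodExc2 defined as above, a quick interactive check:

>>> from cPickle import loads, dumps
>>> loads(dumps(GoodExc1(1))).a
1
>>> loads(dumps(GoodExc2(1))).a
1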
So you should ask the developers of SQLAlchemy to fix their exception classes. In the meantime you can probably use copy_reg.pickle() to override BaseException.__reduce__() for the troublesome classes.
The TypeError: ('__init__() takes at least 4 arguments (2 given) error isn't related to the SQL you're trying to execute; it has to do with how you're using SQLAlchemy's API.
The trouble is that you're trying to call execute on the session class rather than on an instance of that class.
Try this:
session = Session()
session.execute("COMMIT; BEGIN; TRUNCATE foo%s; COMMIT;")
session.commit()
From the docs:
It is intended that the sessionmaker() function be called within the global scope of an application, and the returned class be made available to the rest of the application as the single class used to instantiate sessions.
So Session = sessionmaker() returns a new session class, and session = Session() returns an instance of that class, which you can then call execute on.
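Put together, roughly (a minimal sketch; the in-memory SQLite engine is only a stand-in for your own engine):

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

engine = create_engine("sqlite:///:memory:")  # stand-in engine for illustration

Session = sessionmaker(bind=engine)  # a session *class*, created once, globally
session = Session()                  # an *instance*, created per unit of work

print session.execute("SELECT 1").fetchall()  # execute is called on the instance
session.commit()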
I believe the TypeError comes from multiprocessing's get.
I've stripped out all the DB code from your script. Take a look at this:
import multiprocessing
import sqlalchemy.exc

def do(kwargs):
    i = kwargs['i']
    print i
    raise sqlalchemy.exc.ProgrammingError("", {}, None)
    return i

pool = multiprocessing.Pool(processes=5)  # start 5 worker processes
results = []
arglist = []
for i in range(10):
    arglist.append({'i':i})

r = pool.map_async(do, arglist, callback=results.append)  # apply do() to each dict asynchronously
# Use get or wait?
# r.get()
r.wait()
pool.close()
pool.join()

print results
Using r.wait gives the expected result, but using r.get raises TypeError. As described in Python's docs, use r.wait after a map_async.
Edit: I have to amend my previous answer. I now believe the TypeError comes from SQLAlchemy. I've amended my script to reproduce the error.
Edit 2: It looks like the problem is that multiprocessing.pool does not play well if any worker raises an Exception whose constructor requires a parameter (see also here). I've amended my script to highlight this.
import multiprocessing

class BadExc(Exception):
    def __init__(self, a):
        '''Non-optional param in the constructor.'''
        self.a = a

class GoodExc(Exception):
    def __init__(self, a=None):
        '''Optional param in the constructor.'''
        self.a = a

def do(kwargs):
    i = kwargs['i']
    print i
    raise BadExc('a')
    # raise GoodExc('a')
    return i

pool = multiprocessing.Pool(processes=5)
results = []
arglist = []
for i in range(10):
    arglist.append({'i':i})

r = pool.map_async(do, arglist, callback=results.append)

try:
    # set a timeout in order to be able to catch C-c
    r.get(1e100)
except KeyboardInterrupt:
    pass

print results
In your case, given that your code raises an SQLAlchemy exception, the only solution I can think of is to catch all the exceptions in the do function and re-raise a normal Exception instead. Something like this:
import multiprocessing

class BadExc(Exception):
    def __init__(self, a):
        '''Non-optional param in the constructor.'''
        self.a = a

def do(kwargs):
    try:
        i = kwargs['i']
        print i
        raise BadExc('a')
        return i
    except Exception as e:
        raise Exception(repr(e))

pool = multiprocessing.Pool(processes=5)
results = []
arglist = []
for i in range(10):
    arglist.append({'i':i})

r = pool.map_async(do, arglist, callback=results.append)

try:
    # set a timeout in order to be able to catch C-c
    r.get(1e100)
except KeyboardInterrupt:
    pass

print results
Edit 3: so, it seems to be a bug with Python, but proper exceptions in SQLAlchemy would work around it; hence, I've raised the issue with SQLAlchemy, too.
As a workaround for the problem, I think the solution at the end of Edit 2 would do (wrapping the worker function in try/except and re-raising a plain Exception).
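If several worker functions need the same treatment, the try/except can be factored into a decorator; reraise_as_plain_exception below is just a name made up for this sketch (with BadExc as in the script above):

import functools

def reraise_as_plain_exception(func):
    '''Make sure whatever the worker raises crosses the process
    boundary as a plain Exception, which always unpickles.'''
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception as e:
            raise Exception(repr(e))
    return wrapper

@reraise_as_plain_exception
def do(kwargs):
    i = kwargs['i']
    raise BadExc('a')   # reaches the parent process as Exception("BadExc('a')")
    return i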