“ResourceClosedError: The transaction is closed” error with celery beat and sqlalchemy + pyramid app

Posted by 梦想的初衷 on 2019-12-10 10:33:26

Question


I have a pyramid app called mainsite.

The site works in a pretty asynchronous manner mostly through threads being launched from the view to carry out the backend operations.

It connects to mysql with sqlalchemy and uses ZopeTransactionExtension for session management.

So far the application has been running great.

I need to run periodic jobs on it and it needs to use some of the same asynchronous functions that are being launched from the view.

I used apscheduler but ran into issues with that. So I thought of using celery beat as a separate process that treats mainapp as a library and imports the functions to be used.

My celery config looks like this:

from datetime import timedelta
from api.apiconst import RERUN_CHECK_INTERVAL, AUTOMATION_CHECK_INTERVAL, \
    AUTH_DELETE_TIME

BROKER_URL = 'sqla+mysql://em:em@localhost/edgem'
CELERY_RESULT_BACKEND = "database"
CELERY_RESULT_DBURI = 'mysql://em:em@localhost/edgem'

CELERYBEAT_SCHEDULE = {
    'rerun': {
        'task': 'tasks.rerun_scheduler',
        'schedule': timedelta(seconds=RERUN_CHECK_INTERVAL)
    },
    'automate': {
        'task': 'tasks.automation_scheduler',
        'schedule': timedelta(seconds=20)
    },
    'remove-tokens': {
        'task': 'tasks.token_remover_scheduler',
        'schedule': timedelta(seconds=2 * 24 * 3600)
    },
}

CELERY_TIMEZONE = 'UTC'

The tasks.py is

from celery import Celery
celery = Celery('tasks')
celery.config_from_object('celeryconfig')


@celery.task
def rerun_scheduler():
    from mainsite.task import check_update_rerun_tasks
    check_update_rerun_tasks()


@celery.task
def automation_scheduler():
    from mainsite.task import automate
    automate()


@celery.task
def token_remover_scheduler():
    from mainsite.auth_service import delete_old_tokens
    delete_old_tokens()

Keep in mind that all of the above functions return immediately, launching threads if required.

The threads save objects into the db by doing transaction.commit() after session.add(object).

The problem is that the whole thing works like a gem only for about 30 minutes. After that, ResourceClosedError: The transaction is closed errors start happening wherever there is a transaction.commit(). I am not sure what the problem is and I need help troubleshooting.

The reason I do the imports inside the tasks was to get rid of this error. I thought importing each time the task needed to run was a good idea and that I might get a new transaction each time, but it looks like that is not the case.


Answer 1:


In my experience trying to reuse a session configured to be used with Pyramid (with ZopeTransactionExtension etc.) with a Celery worker results in a terrible hard-to-debug mess.

ZopeTransactionExtension binds the SQLAlchemy session to Pyramid's request-response cycle: a transaction is started and then committed or rolled back automatically. You're generally not supposed to use transaction.commit() within your code - if everything is ok, ZTE commits everything; if your code raises an exception, your transaction is rolled back.
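That commit-on-success / rollback-on-exception lifecycle is worth making concrete. Here is a toy, stdlib-only emulation of the behaviour (ToySession and run_in_request are hypothetical names, not the real ZTE machinery):

```python
class ToySession:
    """A fake session that only records whether it was committed or rolled back."""
    def __init__(self):
        self.committed = False
        self.rolled_back = False

    def commit(self):
        self.committed = True

    def rollback(self):
        self.rolled_back = True


def run_in_request(session, view):
    """Mimic the per-request lifecycle: commit if the view succeeds, roll back if it raises."""
    try:
        result = view()
    except Exception:
        session.rollback()
        raise
    session.commit()
    return result


ok = ToySession()
run_in_request(ok, lambda: "rendered")
print(ok.committed)       # True - the view returned normally, so ZTE-style logic commits

bad = ToySession()
try:
    run_in_request(bad, lambda: 1 / 0)
except ZeroDivisionError:
    pass
print(bad.rolled_back)    # True - the view raised, so the transaction is rolled back
```

The point is that your own transaction.commit() calls fight with this machinery: once the automatic lifecycle has closed the transaction, later commits hit a closed transaction.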

With Celery you need to manage SQLAlchemy sessions manually, which ZTE prevents you from doing, so you need to configure your DBSession differently.

Something simple like this would work:

DBSession = None

def set_dbsession(session):
    global DBSession
    if DBSession is not None:
        raise AttributeError("DBSession has been already set to %s!" % DBSession)

    DBSession = session
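Because the setter guards against double initialization, any code path that tries to configure the session twice fails fast. A quick stdlib-only check of that behaviour (the strings here stand in for a real scoped_session):

```python
DBSession = None

def set_dbsession(session):
    """Install the module-level session exactly once; a second call is a configuration bug."""
    global DBSession
    if DBSession is not None:
        raise AttributeError("DBSession has been already set to %s!" % DBSession)
    DBSession = session


set_dbsession("fake-session")          # first call installs the session
try:
    set_dbsession("another-session")   # a second call must fail loudly
    raised = False
except AttributeError:
    raised = True
print(raised)  # True
```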

And then from Pyramid startup code you do

def main(global_config, **settings):
    ...
    set_dbsession(scoped_session(sessionmaker(extension=ZopeTransactionExtension())))

With Celery it's a bit trickier - I ended up creating a custom start script for Celery, in which I configure the session.

In setup.py of the worker egg:

  entry_points="""
  # -*- Entry points: -*-
  [console_scripts]
  custom_celery = worker.celeryd:start_celery
  custom_celerybeat = worker.celeryd:start_celerybeat
  """,
  )

in worker/celeryd.py:

def initialize_async_session(db_string, db_echo):

    import sqlalchemy as sa
    from db import Base, set_dbsession

    session = sa.orm.scoped_session(sa.orm.sessionmaker(autoflush=True, autocommit=True))
    engine = sa.create_engine(db_string, echo=db_echo)
    session.configure(bind=engine)

    set_dbsession(session)
    Base.metadata.bind = engine


def start_celery():
    initialize_async_session(DB_STRING, DB_ECHO)
    import celery.bin.celeryd
    celery.bin.celeryd.main()
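With the session installed by initialize_async_session, each task body then has to open, commit, and clean up its own transaction instead of relying on ZTE. A self-contained sketch of that per-task pattern, using an in-memory SQLite database as a stand-in for MySQL (the token table and delete_old_tokens body are hypothetical):

```python
import sqlalchemy as sa
from sqlalchemy.orm import scoped_session, sessionmaker

# Stand-in engine; the real worker would bind the MySQL URL from its config.
engine = sa.create_engine("sqlite://")
DBSession = scoped_session(sessionmaker(bind=engine))

with engine.begin() as conn:
    conn.execute(sa.text("CREATE TABLE token (id INTEGER PRIMARY KEY, value TEXT)"))
    conn.execute(sa.text("INSERT INTO token (value) VALUES ('stale')"))


def delete_old_tokens():
    """Hypothetical task body: commit on success, roll back on error, always release."""
    session = DBSession()
    try:
        session.execute(sa.text("DELETE FROM token"))
        session.commit()
    except Exception:
        session.rollback()
        raise
    finally:
        # Next task invocation gets a fresh session rather than a stale transaction.
        DBSession.remove()


delete_old_tokens()
with engine.begin() as conn:
    remaining = conn.execute(sa.text("SELECT COUNT(*) FROM token")).scalar()
print(remaining)  # 0
```

The try/commit/rollback/remove dance is exactly what ZTE was doing for you inside Pyramid; in the Celery worker you own that lifecycle yourself.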

The general approach you're using with "threads being launched from the view to carry out the backend operations" feels a bit dangerous to me if you ever plan to deploy the application to a production server. A web server often recycles, kills, or creates new "workers", so generally there are no guarantees that any particular process will survive beyond the current request-response cycle. I never tried doing this myself, though, so maybe you'll be ok :)



Source: https://stackoverflow.com/questions/16338665/resourceclosederror-the-transaction-is-closed-error-with-celery-beat-and-sqla
