Celery Beat: Limit to single task instance at a time

Asked by 不思量自难忘° on 2020-12-31 17:30

I have celery beat and celery (four workers) to do some processing steps in bulk. One of those tasks is roughly along the lines of, "for each X that hasn't had a Y created..."

5 Answers
  • 2020-12-31 18:01

    I took a crack at writing a decorator that uses Postgres advisory locking, similar to what erydo alluded to in his comment.

    It's not very pretty, but seems to work correctly. This is with SQLAlchemy 0.9.7 under Python 2.7.

    from functools import wraps
    from sqlalchemy import select, func
    
    from my_db_module import Session # SQLAlchemy ORM scoped_session
    
    def pg_locked(key):
        """Run the decorated function only if the Postgres advisory lock
        identified by `key` (a 64-bit integer) can be acquired."""
        def decorator(f):
            @wraps(f)
            def wrapped(*args, **kw):
                session = Session()
                acquired = False  # initialized so the finally block is safe if execute() raises
                try:
                    # pg_try_advisory_lock returns a boolean immediately
                    # instead of blocking until the lock becomes free
                    acquired, = session.execute(select([func.pg_try_advisory_lock(key)])).fetchone()
                    if acquired:
                        return f(*args, **kw)
                finally:
                    if acquired:
                        session.execute(select([func.pg_advisory_unlock(key)]))
            return wrapped
        return decorator
    
    @app.task
    @pg_locked(0xdeadbeef)
    def singleton_task():
        # only one instance of this task can run at a time
        pass
    

    (Would welcome any comments on ways to improve this!)

  • 2020-12-31 18:02

    The only way to do this is to implement a locking strategy yourself:

    Read the relevant section of the Celery documentation for reference:

    Like with cron, the tasks may overlap if the first task does not complete before the next. If that is a concern you should use a locking strategy to ensure only one instance can run at a time (see for example Ensuring a task is only executed one at a time).
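
    For illustration, here is a minimal sketch of such a lock using redis-py. The Redis URL, lock name, and expiry below are assumptions, not part of the Celery docs (whose own example uses the Django cache in the same spirit):

    import redis
    from celery import shared_task
    
    # Assumed Redis location; adjust for your deployment.
    redis_client = redis.Redis.from_url('redis://localhost:6379/0')
    
    LOCK_EXPIRE = 60 * 10  # auto-expire the lock in case a worker dies mid-task
    
    @shared_task
    def locked_task():
        # Non-blocking acquire: if another instance holds the lock, skip this run.
        lock = redis_client.lock('locked_task_lock', timeout=LOCK_EXPIRE)
        if not lock.acquire(blocking=False):
            return None  # another instance is already running
        try:
            pass  # the actual work goes here
        finally:
            lock.release()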

  • 2020-12-31 18:02
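
    This decorator inspects the workers' currently active tasks and skips execution when an identical task (same name and arguments) is already running elsewhere:
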
    from functools import wraps
    from celery import shared_task
    
    
    def skip_if_running(f):
        task_name = f'{f.__module__}.{f.__name__}'
    
        @wraps(f)
        def wrapped(self, *args, **kwargs):
            # Ask all workers for their currently executing tasks;
            # inspect().active() can return None if no workers reply in time.
            workers = self.app.control.inspect().active() or {}
    
            for worker, tasks in workers.items():
                for task in tasks:
                    # Skip if the same task with the same arguments is
                    # already running somewhere else (excluding ourselves).
                    if (task_name == task['name'] and
                            tuple(args) == tuple(task['args']) and
                            kwargs == task['kwargs'] and
                            self.request.id != task['id']):
                        print(f'task {task_name} ({args}, {kwargs}) is running on {worker}, skipping')
    
                        return None
    
            return f(self, *args, **kwargs)
    
        return wrapped
    
    
    @shared_task(bind=True)
    @skip_if_running
    def test_single_task(self):
        pass
    
    
    test_single_task.delay()
  • 2020-12-31 18:04

    A distributed locking system is required, because those Celery beat instances are essentially different processes that might run across different hosts.

    Central coordination systems such as ZooKeeper and etcd are suitable for implementing a distributed locking system.

    I recommend etcd, which is lightweight and fast. There are several lock implementations built on etcd, such as:

    python-etcd-lock
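
    As one illustration of the pattern (shown here with ZooKeeper via the kazoo library rather than etcd; the host, lock path, and identifier are assumptions):

    from kazoo.client import KazooClient
    from celery import shared_task
    
    # Assumed ZooKeeper address; adjust for your deployment.
    zk = KazooClient(hosts='127.0.0.1:2181')
    zk.start()
    
    @shared_task
    def singleton_task():
        # The lock is backed by an ephemeral znode: if the holding process
        # crashes, its session expires and ZooKeeper releases the lock.
        lock = zk.Lock('/locks/singleton_task', 'celery-worker')
        if not lock.acquire(blocking=False):
            return None  # another instance holds the lock
        try:
            pass  # the actual work goes here
        finally:
            lock.release()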

  • 2020-12-31 18:20

    I solved the issue using celery-once, which I extended to celery-one.

    Both address your issue. They use Redis to lock a running task; celery-one additionally keeps track of which task holds the lock.

    A very simple usage example for celery beat follows. In the code below, slow_task is scheduled every second, but its completion time is 5 seconds. Plain celery would schedule the task each second even if it is already running; celery-one prevents this.

    from datetime import timedelta
    from time import sleep
    
    from celery import Celery
    from celery_one import QueueOne  # import path assumed for the celery-one package
    
    REDIS_URL = 'redis://localhost:6379/0'  # adjust for your deployment
    
    celery = Celery('test')
    celery.conf.ONE_REDIS_URL = REDIS_URL
    celery.conf.ONE_DEFAULT_TIMEOUT = 60 * 60
    celery.conf.BROKER_URL = REDIS_URL
    celery.conf.CELERY_RESULT_BACKEND = REDIS_URL
    
    celery.conf.CELERYBEAT_SCHEDULE = {
        'add-every-30-seconds': {
            'task': 'tasks.slow_task',
            'schedule': timedelta(seconds=1),
            'args': (1,)
        },
    }
    
    celery.conf.CELERY_TIMEZONE = 'UTC'
    
    
    @celery.task(base=QueueOne, one_options={'fail': False})
    def slow_task(a):
        print("Running")
        sleep(5)
        return "Done " + str(a)
    