How to run a celery worker with Django app scalable by AWS Elastic Beanstalk?

庸人自扰 2020-12-02 12:47

How can I use Django on AWS Elastic Beanstalk so that it also runs Celery tasks, but only on the main node?

3 answers
  • 2020-12-02 13:08

    This is how I extended the answer by @smentek to allow for multiple worker instances and a single beat instance. The same caveat applies: you still have to protect your leader instance (I don't have an automated solution for that yet).

    Please note that environment variable updates made to EB via the EB CLI or the web interface are not reflected by celery beat or the workers until the app server has been restarted. This caught me off guard once.

    A single celery_configuration.sh file writes two config files for supervisord. Note that celery-beat has autostart=false; otherwise you end up with many beats after an instance restart:

    # get django environment variables
    celeryenv=`cat /opt/python/current/env | tr '\n' ',' | sed 's/export //g' | sed 's/$PATH/%(ENV_PATH)s/g' | sed 's/$PYTHONPATH//g' | sed 's/$LD_LIBRARY_PATH//g' | sed 's/%/%%/g'`
    celeryenv=${celeryenv%?}
    
    # create celery beat config script
    celerybeatconf="[program:celeryd-beat]
    ; Set full path to celery program if using virtualenv
    command=/opt/python/run/venv/bin/celery beat -A lexvoco --loglevel=INFO --workdir=/tmp -S django --pidfile /tmp/celerybeat.pid
    
    directory=/opt/python/current/app
    user=nobody
    numprocs=1
    stdout_logfile=/var/log/celery-beat.log
    stderr_logfile=/var/log/celery-beat.log
    autostart=false
    autorestart=true
    startsecs=10
    
    ; Need to wait for currently executing tasks to finish at shutdown.
    ; Increase this if you have very long running tasks.
    stopwaitsecs = 10
    
    ; When resorting to send SIGKILL to the program to terminate it
    ; send SIGKILL to its whole process group instead,
    ; taking care of its children as well.
    killasgroup=true
    
    ; if rabbitmq is supervised, set its priority higher
    ; so it starts first
    priority=998
    
    environment=$celeryenv"
    
    # create celery worker config script
    celeryworkerconf="[program:celeryd-worker]
    ; Set full path to celery program if using virtualenv
    command=/opt/python/run/venv/bin/celery worker -A lexvoco --loglevel=INFO
    
    directory=/opt/python/current/app
    user=nobody
    numprocs=1
    stdout_logfile=/var/log/celery-worker.log
    stderr_logfile=/var/log/celery-worker.log
    autostart=true
    autorestart=true
    startsecs=10
    
    ; Need to wait for currently executing tasks to finish at shutdown.
    ; Increase this if you have very long running tasks.
    stopwaitsecs = 600
    
    ; When resorting to send SIGKILL to the program to terminate it
    ; send SIGKILL to its whole process group instead,
    ; taking care of its children as well.
    killasgroup=true
    
    ; if rabbitmq is supervised, set its priority higher
    ; so it starts first
    priority=999
    
    environment=$celeryenv"
    
    # create files for the scripts
    echo "$celerybeatconf" | tee /opt/python/etc/celerybeat.conf
    echo "$celeryworkerconf" | tee /opt/python/etc/celeryworker.conf
    
    # add configuration script to supervisord conf (if not there already)
    if ! grep -Fxq "[include]" /opt/python/etc/supervisord.conf
      then
      echo "[include]" | tee -a /opt/python/etc/supervisord.conf
      echo "files: celerybeat.conf celeryworker.conf" | tee -a /opt/python/etc/supervisord.conf
    fi
    
    # reread the supervisord config
    /usr/local/bin/supervisorctl -c /opt/python/etc/supervisord.conf reread
    # update supervisord in cache without restarting all services
    /usr/local/bin/supervisorctl -c /opt/python/etc/supervisord.conf update
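
    To make the celeryenv pipeline at the top of the script easier to follow, here is a rough Python sketch of the same text transformation. The function name supervisor_environment and the sample input are hypothetical; the sketch only mirrors the tr/sed steps one by one and is not part of the deployment.

```python
def supervisor_environment(env_text):
    """Mirror the shell pipeline that builds the supervisord environment= value."""
    joined = env_text.replace("\n", ",")              # tr '\n' ','
    joined = joined.replace("export ", "")            # sed 's/export //g'
    joined = joined.replace("$PATH", "%(ENV_PATH)s")  # sed 's/$PATH/%(ENV_PATH)s/g'
    joined = joined.replace("$PYTHONPATH", "")        # sed 's/$PYTHONPATH//g'
    joined = joined.replace("$LD_LIBRARY_PATH", "")   # sed 's/$LD_LIBRARY_PATH//g'
    joined = joined.replace("%", "%%")                # sed 's/%/%%/g'
    return joined[:-1]                                # ${celeryenv%?} drops the last character


if __name__ == "__main__":
    # Hypothetical contents of /opt/python/current/env
    sample = (
        'export DJANGO_SETTINGS_MODULE="django_app.settings"\n'
        'export PATH="/opt/python/run/venv/bin/:$PATH"\n'
    )
    print(supervisor_environment(sample))
```

    Because the % escaping runs last in the pipeline, the %(ENV_PATH)s placeholder is doubled as well; the sketch reproduces that behaviour rather than changing it.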
    

    Then, in container_commands, we restart beat only on the leader:

    container_commands:
      # create the celery configuration file
      01_create_celery_beat_configuration_file:
        command: "cat .ebextensions/files/celery_configuration.sh > /opt/elasticbeanstalk/hooks/appdeploy/post/run_supervised_celeryd.sh && chmod 744 /opt/elasticbeanstalk/hooks/appdeploy/post/run_supervised_celeryd.sh && sed -i 's/\r$//' /opt/elasticbeanstalk/hooks/appdeploy/post/run_supervised_celeryd.sh"
      # restart celery beat if leader
      02_start_celery_beat:
        command: "/usr/local/bin/supervisorctl -c /opt/python/etc/supervisord.conf restart celeryd-beat"
        leader_only: true
      # restart celery worker
      03_start_celery_worker:
        command: "/usr/local/bin/supervisorctl -c /opt/python/etc/supervisord.conf restart celeryd-worker"
    
  • 2020-12-02 13:18

    This is how I set up Celery with Django on Elastic Beanstalk, with scaling working fine.

    Please keep in mind that the 'leader_only' option for container_commands takes effect only on an environment rebuild or an app deployment. If the service runs long enough, Elastic Beanstalk may remove the leader node. To deal with that, you may have to apply instance protection to your leader node. See: http://docs.aws.amazon.com/autoscaling/latest/userguide/as-instance-termination.html#instance-protection-instance
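
    One way to apply instance protection to the leader is through the Auto Scaling API. The following is a minimal sketch, assuming boto3 is installed and AWS credentials are already configured. The helper names, the default region, and the group and instance identifiers are placeholders of mine, not values from this setup.

```python
def protection_request(group_name, instance_id):
    """Build the arguments for the Auto Scaling SetInstanceProtection call."""
    return {
        "AutoScalingGroupName": group_name,
        "InstanceIds": [instance_id],
        "ProtectedFromScaleIn": True,
    }


def protect_leader(group_name, instance_id, region="eu-west-1"):
    """Mark one instance as protected from scale-in (needs boto3 and credentials)."""
    import boto3  # third-party; imported lazily so the helper above stays stdlib-only

    client = boto3.client("autoscaling", region_name=region)
    return client.set_instance_protection(**protection_request(group_name, instance_id))
```

    Calling protect_leader("my-eb-asg", "i-0123456789abcdef0") with your own values would then keep that instance from being removed during scale-in; finding out which instance is the leader is still up to you.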

    Add a bash script that configures the celery worker and beat.

    Add the file root_folder/.ebextensions/files/celery_configuration.txt:

    #!/usr/bin/env bash
    
    # Get django environment variables
    celeryenv=`cat /opt/python/current/env | tr '\n' ',' | sed 's/export //g' | sed 's/$PATH/%(ENV_PATH)s/g' | sed 's/$PYTHONPATH//g' | sed 's/$LD_LIBRARY_PATH//g' | sed 's/%/%%/g'`
    celeryenv=${celeryenv%?}
    
    # Create celery configuration script
    celeryconf="[program:celeryd-worker]
    ; Set full path to celery program if using virtualenv
    command=/opt/python/run/venv/bin/celery worker -A django_app --loglevel=INFO
    
    directory=/opt/python/current/app
    user=nobody
    numprocs=1
    stdout_logfile=/var/log/celery-worker.log
    stderr_logfile=/var/log/celery-worker.log
    autostart=true
    autorestart=true
    startsecs=10
    
    ; Need to wait for currently executing tasks to finish at shutdown.
    ; Increase this if you have very long running tasks.
    stopwaitsecs = 600
    
    ; When resorting to send SIGKILL to the program to terminate it
    ; send SIGKILL to its whole process group instead,
    ; taking care of its children as well.
    killasgroup=true
    
    ; if rabbitmq is supervised, set its priority higher
    ; so it starts first
    priority=998
    
    environment=$celeryenv
    
    [program:celeryd-beat]
    ; Set full path to celery program if using virtualenv
    command=/opt/python/run/venv/bin/celery beat -A django_app --loglevel=INFO --workdir=/tmp -S django --pidfile /tmp/celerybeat.pid
    
    directory=/opt/python/current/app
    user=nobody
    numprocs=1
    stdout_logfile=/var/log/celery-beat.log
    stderr_logfile=/var/log/celery-beat.log
    autostart=true
    autorestart=true
    startsecs=10
    
    ; Need to wait for currently executing tasks to finish at shutdown.
    ; Increase this if you have very long running tasks.
    stopwaitsecs = 600
    
    ; When resorting to send SIGKILL to the program to terminate it
    ; send SIGKILL to its whole process group instead,
    ; taking care of its children as well.
    killasgroup=true
    
    ; if rabbitmq is supervised, set its priority higher
    ; so it starts first
    priority=998
    
    environment=$celeryenv"
    
    # Create the celery supervisord conf script
    echo "$celeryconf" | tee /opt/python/etc/celery.conf
    
    # Add configuration script to supervisord conf (if not there already)
    if ! grep -Fxq "[include]" /opt/python/etc/supervisord.conf
      then
      echo "[include]" | tee -a /opt/python/etc/supervisord.conf
      echo "files: celery.conf" | tee -a /opt/python/etc/supervisord.conf
    fi
    
    # Reread the supervisord config
    supervisorctl -c /opt/python/etc/supervisord.conf reread
    
    # Update supervisord in cache without restarting all services
    supervisorctl -c /opt/python/etc/supervisord.conf update
    
    # Start/Restart celeryd through supervisord
    supervisorctl -c /opt/python/etc/supervisord.conf restart celeryd-beat
    supervisorctl -c /opt/python/etc/supervisord.conf restart celeryd-worker
    

    Make sure the script is executed during deployment, but only on the main node (leader_only: true). Add the file root_folder/.ebextensions/02-python.config:

    container_commands:
      04_celery_tasks:
        command: "cat .ebextensions/files/celery_configuration.txt > /opt/elasticbeanstalk/hooks/appdeploy/post/run_supervised_celeryd.sh && chmod 744 /opt/elasticbeanstalk/hooks/appdeploy/post/run_supervised_celeryd.sh"
        leader_only: true
      05_celery_tasks_run:
        command: "/opt/elasticbeanstalk/hooks/appdeploy/post/run_supervised_celeryd.sh"
        leader_only: true
    
    • Beat can be configured without redeployment by using a separate Django application: https://pypi.python.org/pypi/django_celery_beat.
    • Storing task results is a good idea too: https://pypi.python.org/pypi/django_celery_results

    File requirements.txt:

    celery==4.0.0
    django_celery_beat==1.0.1
    django_celery_results==1.0.1
    pycurl==7.43.0 --global-option="--with-nss"
    

    Configure Celery for the Amazon SQS broker (pick your endpoint from the list at http://docs.aws.amazon.com/general/latest/gr/rande.html) in root_folder/django_app/settings.py:

    ...
    CELERY_RESULT_BACKEND = 'django-db'
    CELERY_BROKER_URL = 'sqs://%s:%s@' % (aws_access_key_id, aws_secret_access_key)
    # Due to a library error, N. Virginia was used temporarily; the region below is set to Ireland ("eu-west-1").
    CELERY_BROKER_TRANSPORT_OPTIONS = {
        "region": "eu-west-1",
        'queue_name_prefix': 'django_app-%s-' % os.environ.get('APP_ENV', 'dev'),
        'visibility_timeout': 360,
        'polling_interval': 1
    }
    ...
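
    AWS secret keys can contain characters such as "/" or "+", which are not safe inside a URL. As a hedged sketch, assuming the keys are interpolated into CELERY_BROKER_URL as in the settings above, they could be percent-encoded first. The helper name sqs_broker_url is hypothetical and the keys in the demo are made up.

```python
from urllib.parse import quote


def sqs_broker_url(access_key_id, secret_access_key):
    """Build an sqs:// broker URL with both credentials percent-encoded."""
    return "sqs://{}:{}@".format(
        quote(access_key_id, safe=""),      # safe="" also encodes "/"
        quote(secret_access_key, safe=""),
    )


if __name__ == "__main__":
    # Made-up credentials, for illustration only
    print(sqs_broker_url("AKIDEXAMPLE", "abc/def+ghi"))
```

    In settings.py this would replace the plain string formatting, for example CELERY_BROKER_URL = sqs_broker_url(aws_access_key_id, aws_secret_access_key).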
    

    Celery configuration for the Django app django_app:

    Add file root_folder/django_app/celery.py:

    from __future__ import absolute_import, unicode_literals
    import os
    from celery import Celery
    
    # set the default Django settings module for the 'celery' program.
    os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'django_app.settings')
    
    app = Celery('django_app')
    
    # Using a string here means the worker doesn't have to serialize
    # the configuration object to child processes.
    # - namespace='CELERY' means all celery-related configuration keys
    #   should have a `CELERY_` prefix.
    app.config_from_object('django.conf:settings', namespace='CELERY')
    
    # Load task modules from all registered Django app configs.
    app.autodiscover_tasks()
    

    Modify file root_folder/django_app/__init__.py:

    from __future__ import absolute_import, unicode_literals
    
    # This will make sure the app is always imported when
    # Django starts so that shared_task will use this app.
    from django_app.celery import app as celery_app
    
    __all__ = ['celery_app']
    

    Check also:

    • How do you run a worker with AWS Elastic Beanstalk? (solution without scalability)
    • Pip Requirements.txt --global-option causing installation errors with other packages. "option not recognized" (solution for problems caused by the obsolete pip on Elastic Beanstalk, which cannot deal with the global options needed to resolve the pycurl dependency properly)
  • 2020-12-02 13:21

    If someone is following smentek's answer and getting the error:

    05_celery_tasks_run: /usr/bin/env bash does not exist.
    

    know that, if you are using Windows, the problem might be that the "celery_configuration.txt" file has Windows line endings (CRLF) when it should have Unix line endings (LF). In Notepad++, open the file and click "Edit > EOL Conversion > Unix (LF)". Save and redeploy, and the error should be gone.
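
    If you would rather not depend on an editor, the same line-ending conversion can be done with a few lines of Python before deploying. This is only a sketch; the function name to_unix_eol is my own and the path in the usage note is the one from smentek's answer.

```python
from pathlib import Path


def to_unix_eol(path):
    """Rewrite a file with LF line endings; return True if anything changed."""
    target = Path(path)
    data = target.read_bytes()
    fixed = data.replace(b"\r\n", b"\n")  # CRLF -> LF, other bytes untouched
    if fixed != data:
        target.write_bytes(fixed)
    return fixed != data
```

    For example, running to_unix_eol(".ebextensions/files/celery_configuration.txt") from the project root converts the script in place and is a no-op when the file already uses LF.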

    Also, a couple of warnings for really-amateur people like me:

    • Be sure to include "django_celery_beat" and "django_celery_results" in your "INSTALLED_APPS" in settings.py file.

    • To check celery errors, connect to your instance with "eb ssh" and then "tail -n 40 /var/log/celery-worker.log" and "tail -n 40 /var/log/celery-beat.log" (where "40" refers to the number of lines you want to read from the file, starting from the end).

    Hope this helps someone, it would've saved me some hours!
