How to run airflow with CeleryExecutor on a custom docker image

Submitted by 依然范特西╮ on 2021-01-29 06:32:13

Question


I am adding airflow to a web application that manually adds a directory containing business logic to the PYTHON_PATH env var and does additional system-level setup that I want to be consistent across all servers in my cluster. I've been successfully running celery for this application with RMQ as the broker and redis as the task results backend for a while, and have prior experience running Airflow with LocalExecutor.

Instead of using Pukel's image, I have an entrypoint for a base backend image that runs a different service based on the SERVICE env var. That looks like this:

if [ $SERVICE == "api" ]; then
    # upgrade to the data model
    flask db upgrade

    # start the web application
    python wsgi.py
fi

if [ $SERVICE == "worker" ]; then
    celery -A tasks.celery.celery worker --loglevel=info --uid=nobody
fi

if [ $SERVICE == "scheduler" ]; then
    celery -A tasks.celery.celery beat --loglevel=info
fi

if [ $SERVICE == "airflow" ]; then
    airflow initdb
    airflow scheduler
    airflow webserver
fi
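For context, one way this SERVICE switch could be driven is by passing the variable per container at run time. A minimal sketch, where the image name backend:latest and the .env path are my own illustration rather than anything from the question:

# Each container runs the same image; SERVICE picks which process the entrypoint starts.
docker run --env-file .env -e SERVICE=api backend:latest
docker run --env-file .env -e SERVICE=worker backend:latest
docker run --env-file .env -e SERVICE=scheduler backend:latest
docker run --env-file .env -e SERVICE=airflow backend:latest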

I have an .env file that I build the containers with that defines my airflow parameters:

AIRFLOW_HOME=/home/backend/airflow
AIRFLOW__CORE__LOAD_EXAMPLES=False
AIRFLOW__CORE__EXECUTOR=CeleryExecutor
AIRFLOW__CORE__SQL_ALCHEMY_CONN=mysql+pymysql://${MYSQL_USER}:${MYSQL_ROOT_PASSWORD}@${MYSQL_HOST}:${MYSQL_PORT}/airflow?charset=utf8mb4
AIRFLOW__CELERY__BROKER_URL=amqp://${RABBITMQ_DEFAULT_USER}:${RABBITMQ_DEFAULT_PASS}@${RABBITMQ_HOST}:5672
AIRFLOW__CELERY__RESULT_BACKEND=redis://${REDIS_HOST}
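One quick way to confirm that Airflow actually picks up these AIRFLOW__SECTION__KEY overrides is to read back the parsed configuration from inside the container. A small sketch, not part of the original question:

# Run inside the container; prints the values Airflow resolved from the env vars above.
python -c "from airflow.configuration import conf; print(conf.get('core', 'executor'))"
python -c "from airflow.configuration import conf; print(conf.get('celery', 'broker_url'))"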

With how my entrypoint is set up currently, it never makes it to the webserver: it runs the scheduler in the foreground without ever invoking the webserver. I can change this to:

airflow initdb
airflow scheduler -D
airflow webserver
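(As a side note on -D: it detaches the scheduler as a daemon, and Airflow 1.10 typically drops a PID file under AIRFLOW_HOME. A quick check from inside the container might look like this sketch, where the PID file name is the assumed default rather than something taken from the question:)

# Assumed default location of the daemonized scheduler's PID file
cat $AIRFLOW_HOME/airflow-scheduler.pid
ps -fp "$(cat $AIRFLOW_HOME/airflow-scheduler.pid)"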

Now the webserver runs, but it isn't aware of the scheduler, which is now running as a daemon.

Airflow does, however, know that I'm using a CeleryExecutor and looks for the dags in the right place:

airflow      | [2020-07-29 21:48:35,006] {default_celery.py:88} WARNING - You have configured a result_backend of redis://redis, it is highly recommended to use an alternative result_backend (i.e. a database).
airflow      | [2020-07-29 21:48:35,010] {__init__.py:50} INFO - Using executor CeleryExecutor
airflow      | [2020-07-29 21:48:35,010] {dagbag.py:396} INFO - Filling up the DagBag from /home/backend/airflow/dags
airflow      | [2020-07-29 21:48:35,113] {default_celery.py:88} WARNING - You have configured a result_backend of redis://redis, it is highly recommended to use an alternative result_backend (i.e. a database).

I can solve this by going inside the container and manually firing up the scheduler.

The trick seems to be running both processes in the foreground within the container, but I'm stuck on how to do that inside the entrypoint. I've checked out Pukel's entrypoint code, but it's not obvious to me what he's doing. I'm sure that with just a slight tweak this will be off to the races... Thanks in advance for the help. Also, if there's any major anti-pattern that I'm at risk of running into here I'd love to get the feedback so that I can implement airflow properly. This is my first time implementing CeleryExecutor, and there's a decent amount involved.


Answer 1:


Try using nohup: https://en.wikipedia.org/wiki/Nohup

nohup airflow scheduler >scheduler.log &

In your case, you would update your entrypoint as follows:

if [ $SERVICE == "airflow" ]; then
    airflow initdb
    # run the scheduler in the background, logging to scheduler.log
    nohup airflow scheduler > scheduler.log &
    # keep the webserver in the foreground so the container stays up
    nohup airflow webserver
fi
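One note on the design (my own observation, not part of the original answer): the backgrounded scheduler writes to scheduler.log while the webserver stays in the foreground and keeps the container alive. A variant that lets the webserver replace the shell, so it becomes PID 1 and receives container stop signals directly, could look like this sketch:

if [ $SERVICE == "airflow" ]; then
    airflow initdb
    # scheduler in the background, capturing stdout and stderr
    nohup airflow scheduler > scheduler.log 2>&1 &
    # replace the shell with the webserver so it stays in the foreground as PID 1
    exec airflow webserver
fi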


Source: https://stackoverflow.com/questions/63163053/how-to-run-airflow-with-celeryexecutor-on-a-custom-docker-image
