Question
I am adding Airflow to a web application that manually adds a directory containing business logic to the PYTHON_PATH env var, and also does additional system-level setup that I want to be consistent across all servers in my cluster. I've been successfully running Celery for this application with RMQ as the broker and Redis as the task results backend for a while, and I have prior experience running Airflow with the LocalExecutor.
Instead of using Puckel's image, I have an entrypoint for a base backend image that runs a different service based on the SERVICE env var. It looks like this:
if [ "$SERVICE" == "api" ]; then
    # upgrade the data model (apply migrations)
    flask db upgrade
    # start the web application
    python wsgi.py
fi
if [ "$SERVICE" == "worker" ]; then
    # Celery worker for the application's task queue
    celery -A tasks.celery.celery worker --loglevel=info --uid=nobody
fi
if [ "$SERVICE" == "scheduler" ]; then
    # Celery beat for periodic tasks
    celery -A tasks.celery.celery beat --loglevel=info
fi
if [ "$SERVICE" == "airflow" ]; then
    airflow initdb
    airflow scheduler
    airflow webserver
fi
I have an .env file that I build the containers with, which defines my Airflow parameters:
AIRFLOW_HOME=/home/backend/airflow
AIRFLOW__CORE__LOAD_EXAMPLES=False
AIRFLOW__CORE__EXECUTOR=CeleryExecutor
AIRFLOW__CORE__SQL_ALCHEMY_CONN=mysql+pymysql://${MYSQL_USER}:${MYSQL_ROOT_PASSWORD}@${MYSQL_HOST}:${MYSQL_PORT}/airflow?charset=utf8mb4
AIRFLOW__CELERY__BROKER_URL=amqp://${RABBITMQ_DEFAULT_USER}:${RABBITMQ_DEFAULT_PASS}@${RABBITMQ_HOST}:5672
AIRFLOW__CELERY__RESULT_BACKEND=redis://${REDIS_HOST}
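Roughly speaking, the same file is handed to the containers at runtime along these lines (the image name here is hypothetical; an env_file entry in docker-compose amounts to the same thing):

# sketch only: run the base image as the airflow service with the shared .env
docker run --env-file .env -e SERVICE=airflow backend-base:latest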
With how my entrypoint is currently set up, it never makes it to the webserver. Instead, it runs the scheduler in the foreground without ever invoking the web server. I can change this to:
airflow initdb
airflow scheduler -D
airflow webserver
Now the webserver runs, but it isn't aware of the scheduler, which is now running as a daemon.
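A quick way to confirm the daemonized scheduler is actually still alive is to look for its process inside the container, e.g.:

ps aux | grep '[a]irflow scheduler'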
Airflow does, however, know that I'm using a CeleryExecutor and looks for the dags in the right place:
airflow | [2020-07-29 21:48:35,006] {default_celery.py:88} WARNING - You have configured a result_backend of redis://redis, it is highly recommended to use an alternative result_backend (i.e. a database).
airflow | [2020-07-29 21:48:35,010] {__init__.py:50} INFO - Using executor CeleryExecutor
airflow | [2020-07-29 21:48:35,010] {dagbag.py:396} INFO - Filling up the DagBag from /home/backend/airflow/dags
airflow | [2020-07-29 21:48:35,113] {default_celery.py:88} WARNING - You have configured a result_backend of redis://redis, it is highly recommended to use an alternative result_backend (i.e. a database).
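Side note: I assume that result_backend warning could be addressed by pointing Celery at a database instead of Redis, e.g. a line like this in the .env (sketch only, reusing the same MySQL variables and assuming a dedicated database exists for Celery results):

AIRFLOW__CELERY__RESULT_BACKEND=db+mysql+pymysql://${MYSQL_USER}:${MYSQL_ROOT_PASSWORD}@${MYSQL_HOST}:${MYSQL_PORT}/celery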
I can solve the scheduler issue by going inside the container and manually firing up the scheduler:
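Concretely, that manual step is something along these lines (the container name is hypothetical; use whatever docker ps reports):

docker exec -it backend_airflow_1 airflow scheduler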
The trick seems to be running both processes in the foreground within the container, but I'm stuck on how to do that inside the entrypoint. I've checked out Puckel's entrypoint code, but it's not obvious to me what he's doing. I'm sure that with just a slight tweak this will be off to the races... Thanks in advance for the help. Also, if there's any major anti-pattern that I'm at risk of running into here, I'd love to get feedback so that I can implement Airflow properly. This is my first time implementing the CeleryExecutor, and there's a decent amount involved.
Answer 1:
Try using nohup: https://en.wikipedia.org/wiki/Nohup
nohup airflow scheduler >scheduler.log &
In your case, you would update your entrypoint as follows:
if [ "$SERVICE" == "airflow" ]; then
    airflow initdb
    nohup airflow scheduler > scheduler.log &
    nohup airflow webserver
fi
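A variation on the same trick, in case it helps: capture the scheduler's output in a log file, start it in the background, and exec the webserver so it replaces the entrypoint shell as the container's main process and receives docker stop signals directly. Sketch only, same assumptions as above:

if [ "$SERVICE" == "airflow" ]; then
    airflow initdb
    # scheduler in the background, output captured to a log file
    airflow scheduler > scheduler.log 2>&1 &
    # exec replaces the shell, so the webserver gets stop signals directly
    exec airflow webserver
fi

The usual caveat about two processes in one container still applies: if the backgrounded scheduler dies, the container keeps running, which is why many setups eventually run the scheduler and webserver as separate containers (or under a supervisor such as supervisord).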
Source: https://stackoverflow.com/questions/63163053/how-to-run-airflow-with-celeryexecutor-on-a-custom-docker-image