Does airflow require mysql?

Posted by 蓝咒 on 2021-01-29 08:02:24

Question


I am trying to upgrade our version of airflow to 1.10.0. When I do, I get an error that complains it cannot connect to mysql:

worker_1     | sqlalchemy.exc.OperationalError: (_mysql_exceptions.OperationalError) (2002, 'Can\'t connect to local MySQL server through socket \'/var/run/mysqld/mysqld.sock\' (2 "No such file or directory")') (Background on this error at: http://sqlalche.me/e/e3q8)

When I try to remove mysql from our systems altogether, I get the following instead:

scheduler_1  | [2018-10-25 17:22:19,399] {{celery_executor.py:113}} ERROR - No module named 'MySQLdb'

Mysql appears in no environment variable we have set, nor does it appear in airflow.cfg. It appears as if this version of airflow requires mysql for some other reason. Is this true?

Update: This is similar to the issue raised here, but I'm more interested in why airflow is calling mysql at all.

I should point out also that we do explicitly set the sqlalchemy connection to a postgres database.

AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgres://airflow:airflow@postgres/airflow

The error is happening when airflow is trying to write the result of a task run (marking something as failure).

Update

This is the dockerfile I use which defines the airflow image. Note no mention of mysql:

# SOURCE: https://github.com/puckel/docker-airflow

FROM python:3.6-jessie

# Never prompts the user for choices on installation/configuration of packages
ENV DEBIAN_FRONTEND noninteractive
ENV TERM linux

# Airflow
ARG AIRFLOW_VERSION=1.10.0
ARG AIRFLOW_HOME=/usr/local/airflow

# Define en_US.
ENV LANGUAGE en_US.UTF-8
ENV LANG en_US.UTF-8
ENV LC_ALL en_US.UTF-8
ENV LC_CTYPE en_US.UTF-8
ENV LC_MESSAGES en_US.UTF-8
ENV PYTHONPATH ${AIRFLOW_HOME}
ENV AIRFLOW_GPL_UNIDECODE yes

COPY ./requirements.txt .

RUN set -ex \
    && buildDeps=' \
        python3-dev \
        libkrb5-dev \
        libsasl2-dev \
        libssl-dev \
        libffi-dev \
        build-essential \
        libblas-dev \
        liblapack-dev \
        libpq-dev \
        git \
    ' \
    && apt-get update -yqq \
    && apt-get upgrade -yqq \
    && apt-get install -yqq --no-install-recommends \
        $buildDeps \
        python3-pip \
        python3-requests \
        apt-utils \
        curl \
        rsync \
        netcat \
        locales \
        vim \
    && sed -i 's/^# en_US.UTF-8 UTF-8$/en_US.UTF-8 UTF-8/g' /etc/locale.gen \
    && locale-gen \
    && update-locale LANG=en_US.UTF-8 LC_ALL=en_US.UTF-8 \
    && useradd -ms /bin/bash -d ${AIRFLOW_HOME} airflow \
    && pip install -U pip setuptools wheel \
    && pip install Cython \
    && pip install pytz \
    && pip install pyOpenSSL \
    && pip install ndg-httpsclient \
    && pip install pyasn1 \
    && pip install apache-airflow[crypto,celery,postgres,hive,jdbc]==$AIRFLOW_VERSION \
    && pip install 'celery[redis]>=4.1.1,<4.2.0' \
    && pip install -r requirements.txt \
    && apt-get purge --auto-remove -yqq $buildDeps \
    && apt-get autoremove -yqq --purge \
    && apt-get clean \
    && rm -rf \
        /var/lib/apt/lists/* \
        /tmp/* \
        /var/tmp/* \
        /usr/share/man \
        /usr/share/doc \
        /usr/share/doc-base

COPY script/entrypoint.sh /entrypoint.sh
COPY celery_healthcheck.sh ${AIRFLOW_HOME}
COPY config/airflow.cfg ${AIRFLOW_HOME}/airflow.cfg
COPY dags ${AIRFLOW_HOME}/dags
COPY operators ${AIRFLOW_HOME}/operators
COPY models ${AIRFLOW_HOME}/models
COPY constants.py ${AIRFLOW_HOME}/constants.py
COPY envconsul ${AIRFLOW_HOME}/envconsul
COPY *.hcl ${AIRFLOW_HOME}/

RUN chown -R airflow: ${AIRFLOW_HOME}

EXPOSE 8080 5555 8793

USER airflow
WORKDIR ${AIRFLOW_HOME}

Answer 1:


Airflow needs some database to work.

By setting AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgres://airflow:airflow@postgres/airflow you tell it to use the corresponding PostgreSQL database as the metadata database, and it will try to use it.

The odd thing is that the error messages complain about a MySQL database. My guess is that you used MySQL with the previous version and initialized the Airflow metadata database with airflow initdb using MySQL. Then you removed MySQL and Airflow started complaining.

I would make sure that the PostgreSQL DB is reachable under the connection specified in AIRFLOW__CORE__SQL_ALCHEMY_CONN and run airflow initdb again. Airflow should then start using the PostgreSQL DB for its metadata.
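One way to check reachability before rerunning airflow initdb is a plain TCP probe against the host and port in the connection URL. The sketch below is only an illustration: the helper names host_and_port and is_reachable are hypothetical, and the fallback port 5432 is an assumption (the usual PostgreSQL default). It confirms that something is listening, not that the credentials or database name are valid.

```python
import socket
from urllib.parse import urlsplit


def host_and_port(conn_url, default_port=5432):
    """Extract (host, port) from a connection URL.

    default_port is an assumption: 5432 is the usual PostgreSQL port.
    """
    parts = urlsplit(conn_url)
    return parts.hostname, parts.port or default_port


def is_reachable(conn_url, timeout=3.0):
    """Return True if a TCP connection to the URL's host and port succeeds."""
    host, port = host_and_port(conn_url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

Calling is_reachable("postgres://airflow:airflow@postgres/airflow") from inside the worker container would show whether the postgres host resolves and accepts connections from there.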

If it does not work and you can live with losing all the metadata a full reset may help:

airflow resetdb
airflow initdb

Also note that Airflow recommends using psycopg2 for Postgres.
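SQLAlchemy connection URLs take the form dialect+driver://user:password@host/database, and when the driver part is omitted SQLAlchemy falls back to the dialect's default driver. The sketch below splits a URL into those two parts so you can see which driver a given setting implies; dialect_and_driver is a hypothetical helper name, not part of Airflow or SQLAlchemy.

```python
def dialect_and_driver(conn_url):
    """Split the scheme of a SQLAlchemy-style URL into (dialect, driver).

    driver is None when the URL does not name one explicitly.
    """
    scheme = conn_url.partition("://")[0]
    dialect, _, driver = scheme.partition("+")
    return dialect, driver or None


print(dialect_and_driver("postgres://airflow:airflow@postgres/airflow"))
print(dialect_and_driver("postgresql+psycopg2://airflow:airflow@postgres/airflow"))
```

The first URL names no driver, while the second names psycopg2 explicitly. By the same convention, a bare mysql:// URL anywhere in the configuration would make SQLAlchemy try to import MySQLdb, which would be consistent with the "No module named 'MySQLdb'" error in the question.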




Answer 2:


Figured it out. Turns out this other env var (AIRFLOW__CELERY__RESULT_BACKEND) was set with a typo. I had it set to AIRFLOW__CELERY__CELERY_RESULT_BACKEND. I'm not clear why that worked in 1.9 and suddenly started throwing this error when updating, but when I fixed the var it now works.
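Airflow reads configuration overrides from environment variables named AIRFLOW__{SECTION}__{KEY}, so a misnamed variable is silently ignored and the built-in default for that option applies instead. A possible explanation for the change in behavior, which this answer does not confirm, is that the option was called celery_result_backend in 1.9 and was renamed to result_backend in 1.10, with a default that points at MySQL. The sketch below lists overrides whose section and key are not in a set of expected options; the function names and the known set are hypothetical and must be filled in from your own airflow.cfg.

```python
import os

PREFIX = "AIRFLOW__"


def parse_airflow_env(environ):
    """Map AIRFLOW__{SECTION}__{KEY} variables to {(section, key): value}."""
    parsed = {}
    for name, value in environ.items():
        if not name.startswith(PREFIX):
            continue
        section, sep, key = name[len(PREFIX):].partition("__")
        if not sep or not section or not key:
            continue  # e.g. AIRFLOW__FOO has no section/key separator
        parsed[(section.lower(), key.lower())] = value
    return parsed


def unknown_options(environ, known):
    """Return overrides whose (section, key) is not in the expected set."""
    return sorted(opt for opt in parse_airflow_env(environ) if opt not in known)


if __name__ == "__main__":
    # Assumption: replace with the options your Airflow version defines.
    known = {("core", "sql_alchemy_conn"), ("celery", "result_backend")}
    for section, key in unknown_options(os.environ, known):
        print("unrecognized override: [%s] %s" % (section, key))
```

With the misnamed variable from this answer in the environment, the script would report [celery] celery_result_backend as unrecognized.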




Answer 3:


It looks like you are using some default connection configuration.
Even if you set variables like sql_alchemy_conn, Airflow will still have the values that were set in the Admin -> Connections menu.
Here is how mine looked after a fresh install:

After a correct airflow initdb, setting the correct values in the airflow_db connection through the UI fixed all the "mysql" errors I had.



Source: https://stackoverflow.com/questions/52994980/does-airflow-require-mysql