Question
I'm using Airflow 1.9, which has a bug that I describe in two of my earlier Stack Overflow posts and that is also reported and discussed on Airflow's GitHub.
In short, there are several places in Airflow's code where it needs the IP address of the server. It gets it by calling:
socket.getfqdn()
The problem is that on Amazon EC2 instances (Amazon Linux 1) this call doesn't return the IP address; it returns the hostname, like this:
IP-1-2-3-4
whereas Airflow needs the IP address, like this:
1.2.3.4
To get the IP address, I found that I can use this call instead:
socket.gethostbyname(socket.gethostname())
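For a quick side-by-side check, both lookups can be run from one small script. This is only a sketch: the helper name `describe_host` is made up for illustration, and what each call returns depends entirely on the machine's hostname and resolver configuration.

```python
import socket


def describe_host():
    """Collect what each lookup reports for the local machine (hypothetical helper)."""
    hostname = socket.gethostname()
    try:
        # Resolve the local hostname to an IPv4 address.
        ip_address = socket.gethostbyname(hostname)
    except OSError:
        # The hostname may not be resolvable on this machine.
        ip_address = None
    return {
        "getfqdn": socket.getfqdn(),
        "gethostname": hostname,
        "gethostbyname": ip_address,
    }


if __name__ == "__main__":
    for name, value in describe_host().items():
        print(f"{name}: {value}")
```

Running it on each Airflow server shows whether the two calls disagree there before any code is changed.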
I tested this call in a Python shell and it returns the correct value. I then searched the Airflow package for every occurrence of socket.getfqdn() by grepping for "fqdn", and this is what I got back:
[airflow@ip-1-2-3-4 site-packages]$ cd airflow/
[airflow@ip-1-2-3-4 airflow]$ grep -r "fqdn" .
./security/utils.py: fqdn = host
./security/utils.py: if not fqdn or fqdn == '0.0.0.0':
./security/utils.py: fqdn = get_localhost_name()
./security/utils.py: return '%s/%s@%s' % (components[0], fqdn.lower(), components[2])
./security/utils.py: return socket.getfqdn()
./security/utils.py:def get_fqdn(hostname_or_ip=None):
./security/utils.py: fqdn = socket.gethostbyaddr(hostname_or_ip)[0]
./security/utils.py: fqdn = get_localhost_name()
./security/utils.py: fqdn = hostname_or_ip
./security/utils.py: if fqdn == 'localhost':
./security/utils.py: fqdn = get_localhost_name()
./security/utils.py: return fqdn
Binary file ./security/__pycache__/utils.cpython-36.pyc matches
Binary file ./security/__pycache__/kerberos.cpython-36.pyc matches
./security/kerberos.py: principal = configuration.get('kerberos', 'principal').replace("_HOST", socket.getfqdn())
./security/kerberos.py: principal = "%s/%s" % (configuration.get('kerberos', 'principal'), socket.getfqdn())
Binary file ./contrib/auth/backends/__pycache__/kerberos_auth.cpython-36.pyc matches
./contrib/auth/backends/kerberos_auth.py: service_principal = "%s/%s" % (configuration.get('kerberos', 'principal'), utils.get_fqdn())
./www/views.py: 'airflow/circles.html', hostname=socket.getfqdn()), 404
./www/views.py: hostname=socket.getfqdn(),
Binary file ./www/__pycache__/app.cpython-36.pyc matches
Binary file ./www/__pycache__/views.cpython-36.pyc matches
./www/app.py: 'hostname': socket.getfqdn(),
Binary file ./__pycache__/jobs.cpython-36.pyc matches
Binary file ./__pycache__/models.cpython-36.pyc matches
./bin/cli.py: hostname = socket.getfqdn()
Binary file ./bin/__pycache__/cli.cpython-36.pyc matches
./config_templates/default_airflow.cfg:# gets augmented with fqdn
./jobs.py: self.hostname = socket.getfqdn()
./jobs.py: fqdn = socket.getfqdn()
./jobs.py: same_hostname = fqdn == ti.hostname
./jobs.py: "{fqdn}".format(**locals()))
Binary file ./api/auth/backend/__pycache__/kerberos_auth.cpython-36.pyc matches
./api/auth/backend/kerberos_auth.py:from socket import getfqdn
./api/auth/backend/kerberos_auth.py: hostname = getfqdn()
./models.py: self.hostname = socket.getfqdn()
./models.py: self.hostname = socket.getfqdn()
I'm unsure whether I should simply replace every occurrence of socket.getfqdn() with socket.gethostbyname(socket.gethostname()). For one thing, this would be cumbersome to maintain, since I would no longer be running the unmodified Airflow package I installed from pip. I tried upgrading to Airflow 1.10, but it was very buggy and I couldn't get it up and running. So for now I'm stuck with Airflow 1.9, but I need to work around this bug because it is causing my tasks to fail sporadically.
Answer 1:
Just replace all occurrences of the faulty function call with the one that works. Here are the steps I ran. If you are running an Airflow cluster, do this on every Airflow server (masters and workers).
[ec2-user@ip-1-2-3-4 ~]$ cd /usr/local/lib/python3.6/site-packages/airflow
[ec2-user@ip-1-2-3-4 airflow]$ grep -r "socket.getfqdn()" .
./security/utils.py: return socket.getfqdn()
./security/kerberos.py: principal = configuration.get('kerberos', 'principal').replace("_HOST", socket.getfqdn())
./security/kerberos.py: principal = "%s/%s" % (configuration.get('kerberos', 'principal'), socket.getfqdn())
./www/views.py: 'airflow/circles.html', hostname=socket.getfqdn()), 404
./www/views.py: hostname=socket.getfqdn(),
./www/app.py: 'hostname': socket.getfqdn(),
./bin/cli.py: hostname = socket.getfqdn()
./jobs.py: self.hostname = socket.getfqdn()
./jobs.py: fqdn = socket.getfqdn()
./models.py: self.hostname = socket.getfqdn()
./models.py: self.hostname = socket.getfqdn()
[ec2-user@ip-1-2-3-4 airflow]$ sudo find . -type f -exec sed -i 's/socket.getfqdn()/socket.gethostbyname(socket.gethostname())/g' {} +
[ec2-user@ip-1-2-3-4 airflow]$ grep -r "socket.getfqdn()" .
[ec2-user@ip-1-2-3-4 airflow]$ grep -r "socket.gethostbyname(socket.gethostname())" .
./security/utils.py: return socket.gethostbyname(socket.gethostname())
./security/kerberos.py: principal = configuration.get('kerberos', 'principal').replace("_HOST", socket.gethostbyname(socket.gethostname()))
./security/kerberos.py: principal = "%s/%s" % (configuration.get('kerberos', 'principal'), socket.gethostbyname(socket.gethostname()))
./www/views.py: 'airflow/circles.html', hostname=socket.gethostbyname(socket.gethostname())), 404
./www/views.py: hostname=socket.gethostbyname(socket.gethostname()),
./www/app.py: 'hostname': socket.gethostbyname(socket.gethostname()),
./bin/cli.py: hostname = socket.gethostbyname(socket.gethostname())
./jobs.py: self.hostname = socket.gethostbyname(socket.gethostname())
./jobs.py: fqdn = socket.gethostbyname(socket.gethostname())
./models.py: self.hostname = socket.gethostbyname(socket.gethostname())
./models.py: self.hostname = socket.gethostbyname(socket.gethostname())
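The find/sed command above passes every file under the package to sed, including the compiled .pyc files. As a hedged alternative, the same literal replacement can be sketched in Python so that only .py sources are touched and a dry run lists the affected files first. The function name `patch_tree` and its `dry_run` parameter are made up for this example; they are not part of Airflow.

```python
from pathlib import Path

OLD_CALL = "socket.getfqdn()"
NEW_CALL = "socket.gethostbyname(socket.gethostname())"


def patch_tree(root, dry_run=True):
    """Replace OLD_CALL with NEW_CALL in every .py file under root.

    Returns the list of files that contain OLD_CALL. With dry_run=True
    nothing is written, so the matches can be reviewed first.
    """
    matched = []
    for path in sorted(Path(root).rglob("*.py")):
        text = path.read_text(encoding="utf-8")
        if OLD_CALL not in text:
            continue
        matched.append(path)
        if not dry_run:
            # Plain string replacement, no regular expressions involved.
            path.write_text(text.replace(OLD_CALL, NEW_CALL), encoding="utf-8")
    return matched
```

Calling `patch_tree(".")` from inside the airflow package directory lists the files that would change, and `patch_tree(".", dry_run=False)` applies the replacement (with sufficient write permissions). Like the sed command, this only matches the literal text socket.getfqdn(), so the bare getfqdn() call in ./api/auth/backend/kerberos_auth.py, which the question's grep output shows is imported with "from socket import getfqdn", is left unchanged.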
After making that update, restart the Airflow webserver, scheduler, and worker processes and you should be all set.
Note that the path I cd into above is for Python 3.6. If you are on a different version, such as 3.7, adjust the path accordingly, for example /usr/local/lib/python3.7/site-packages/airflow; cd into /usr/local/lib and check which python folder is there. I don't think Airflow is installed there, but Python packages are sometimes also located under /usr/local/lib64/python3.6/site-packages, where the only difference in the path is lib64 instead of lib.
Also, keep in mind that this is fixed in Airflow 1.10, so you should not need to make these changes in the latest version of Airflow.
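Rather than guessing between the lib and lib64 paths, Python can be asked where it would import the package from. This is a minimal sketch using the standard library's importlib; the helper name `package_dir` is made up for illustration, and it must be run with the same Python interpreter that Airflow runs under.

```python
import importlib.util
from pathlib import Path


def package_dir(name):
    """Return the directory a package would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None or not spec.submodule_search_locations:
        # Not installed for this interpreter, or a plain module rather than a package.
        return None
    return Path(list(spec.submodule_search_locations)[0])


if __name__ == "__main__":
    print(package_dir("airflow"))
```

If it prints None, Airflow is not installed for that interpreter and another Python on the server is probably the one in use.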
Source: https://stackoverflow.com/questions/51898933/airflow-ec2-instance-socket-getfqdn-bug