Cannot access airflow web server via AWS load balancer HTTPS because airflow redirects me to HTTP

泪湿孤枕 提交于 2019-12-22 05:31:53

问题


I have an airflow web server configured at EC2, it listens at port 8080.

I have an AWS ALB(application load balancer) in front of the EC2, listen at https 80 (facing internet) and instance target port is facing http 8080.

I cannot surf https://< airflow link > from browser because the airflow web server redirects me to http : //< airflow link >/admin, which the ALB does not listen at.

If I surf https://< airflow link > /admin/airflow/login?next=%2Fadmin%2F from browser, then I see the login page because this link does not redirect me.

My question is how to change airflow so that when surfing https://< airflow link > , airflow web server will redirect me to https:..., not http://..... so that AWS ALB can process the request.

I tried to change base_url of airflow.cfg from http://localhost:8080 to https://localhost:8080, according to the below answer, but I do not see any difference with my change....

Anyway, how to access https://< airflow link > from ALB?


回答1:


User user389955 own solution is probably the best approach, but for anyone looking for a quick fix (or want a better idea on what is going on), this seems to be the culprit.

In the following file (python distro may differ):

/usr/local/lib/python3.5/dist-packages/gunicorn/config.py

The following section prevents forwarded for headers from anything other than local

class ForwardedAllowIPS(Setting):
    name = "forwarded_allow_ips"
    section = "Server Mechanics"
    cli = ["--forwarded-allow-ips"]
    meta = "STRING"
    validator = validate_string_to_list
    default = os.environ.get("FORWARDED_ALLOW_IPS", "127.0.0.1")
    desc = """\
        Front-end's IPs from which allowed to handle set secure headers.
        (comma separate).

        Set to ``*`` to disable checking of Front-end IPs (useful for setups
        where you don't know in advance the IP address of Front-end, but
        you still trust the environment).

        By default, the value of the ``FORWARDED_ALLOW_IPS`` environment
        variable. If it is not defined, the default is ``"127.0.0.1"``.
        """

Changing from 127.0.0.1 to specific IP's or * if IP's unknown will do the trick.

At this point, I haven't found a way to set this parameter from within airflow config itself. If I find a way, will update my answer.




回答2:


Since they're using Gunicorn - you can configure the forwarded_allow_ips value as an evironment variable instead of having to use an intermediary proxy like Nginx.

In my case I just set FORWARDED_ALLOW_IPS = * and it's working perfectly fine.

In ECS you can set this in the webserver task configuration if you're using one docker image for all the Airflow tasks (webserver, scheduler, worker, etc.).




回答3:


Finally I found a solution myself.

I introduced a nginx reverse proxy between ALB and airflow web server: ie. https request ->ALB:443 ->nginx proxy: 80 ->web server:8080. I make the nginx proxy tell the airflow web server that the original request is https not http by adding a http header "X-Forwarded-Proto https".

The nginx server is co-located with the web server. and I set the config of it as /etc/nginx/sites-enabled/vhost1.conf (see below). Besides, I deletes the /etc/nginx/sites-enabled/default config file.

server {
    listen 80;
    server_name <domain>;
    index index.html index.htm;
    location / {
      proxy_pass_header Authorization;
      proxy_pass http://localhost:8080;
      proxy_set_header Host $host;
      proxy_set_header X-Real-IP $remote_addr;
      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
      proxy_set_header X-Forwarded-Proto https;
      proxy_http_version 1.1;
      proxy_redirect off;
      proxy_set_header Connection "";
      proxy_buffering off;
      client_max_body_size 0;
      proxy_read_timeout 36000s;
    }
}



回答4:


I think you have everything working correctly. The redirect you are seeing is expected as the webserver is set to redirect from / to /admin. If you are using curl, you can pass the flag -L / --location to follow redirects and it should bring you to the list of DAGs.

Another good endpoint to test on is https://<airflow domain name>/health (with no trailing slash, or you'll get a 404!). It should return "The server is healthy!".

Be sure you have https:// in the base_url under the webserver section of your airflow config.



来源:https://stackoverflow.com/questions/48413093/cannot-access-airflow-web-server-via-aws-load-balancer-https-because-airflow-red

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!