Google Composer - How do I install Microsoft SQL Server ODBC drivers on environments?


Question


I am new to GCP and Airflow and am trying to run my Python 3 pipelines via a simple pyodbc connection. I believe I have found what I need to install on the machines (Microsoft documentation: https://docs.microsoft.com/en-us/sql/connect/odbc/linux-mac/installing-the-microsoft-odbc-driver-for-sql-server?view=sql-server-2017), but I am not sure where to go in GCP to run these commands. I have gone down several deep holes looking for answers, but don't know how to solve the problem.

Here is the error I keep seeing when I upload the DAG:

[screenshot: Airflow error]

Here is the pyodbc connection:

pyodbc.connect('DRIVER={Microsoft SQL Server};SERVER=servername;DATABASE=dbname;UID=username;PWD=password')

When I open the gcloud shell from the environment and run the Microsoft install commands, it just aborts; when I downloaded the SDK and connected to the project locally, it also aborts or doesn't recognize the commands from Microsoft. Can anyone give some simple instructions on where to start and what I am doing wrong?


Answer 1:


Consider that Cloud Composer is a Google-managed implementation of Apache Airflow, so expect it to behave a bit differently.

With this in mind, for custom Python dependencies and binary dependencies that are not available in the Cloud Composer worker image, you can use the KubernetesPodOperator option.

Essentially, this lets you build a custom container image with all your requirements, push it to a container image repository (Docker Hub, GCR), and then pull it into your Composer environment, so all of your dependencies are met.

This scales better, as there is no need for you to interact with the machines directly (the approach described in your original question), and it is easier to just build a container image with whatever you need in it.

Speaking specifically of pyodbc in the context of dependency installation on Composer, there is a feature request to address this issue that also outlines a workaround (basically what is described in this answer). You might want to check it out.
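
A minimal sketch of what that looks like in a DAG, assuming an Airflow 1.10 / Composer 1 environment (the import path differs in Airflow 2) and a hypothetical image gcr.io/your-project/pyodbc-mssql:latest that already contains the msodbcsql17 driver and your pipeline code:

from airflow import DAG
from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator
from datetime import datetime

with DAG("mssql_pipeline", start_date=datetime(2020, 6, 1), schedule_interval=None) as dag:
    run_pipeline = KubernetesPodOperator(
        task_id="run_pipeline",
        name="run-pipeline",
        namespace="default",
        # hypothetical image with the ODBC driver and pipeline code baked in
        image="gcr.io/your-project/pyodbc-mssql:latest",
        cmds=["python", "app.py"],
        image_pull_policy="Always",
    )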




Answer 2:


Cloud Composer currently primarily supports installing PyPI packages written in pure Python. Installing system packages is not fully supported at this time, but there are some workarounds (such as setting LD_LIBRARY_PATH and uploading shared libraries, etc.). You're getting aborts because you installed the Python part of the package, but not the system dependencies the Python package depends on.

As you read, changes to Airflow workers in Composer are ephemeral (or at least, should be treated as such), but one way of working around this is to install packages using a BashOperator before the task that needs the library runs. It's not pretty, but it ensures that dependencies are installed on the worker before the Python code that needs them is called.
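
A minimal sketch of that workaround, with illustrative task and function names; whether a given dependency can actually be installed this way depends on what the worker permits (import paths are for Airflow 1.10 / Composer 1):

from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.operators.python_operator import PythonOperator
from datetime import datetime


def query_mssql():
    # import inside the callable so it only resolves after the install task has run
    import pyodbc
    # ... open the pyodbc connection and run the pipeline here ...


with DAG("mssql_with_preinstall", start_date=datetime(2020, 6, 1), schedule_interval=None) as dag:
    install_deps = BashOperator(
        task_id="install_driver_deps",
        bash_command="pip install --user pyodbc",
    )
    run_query = PythonOperator(
        task_id="run_query",
        python_callable=query_mssql,
    )
    install_deps >> run_query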




Answer 3:


I was facing the same problem. The first solution that worked for me was building a Docker image that installs the drivers and then runs the code. Initially I tried to find a way of installing the drivers on the cluster, but after many failures I read in the documentation that the Airflow image in Composer is curated by Google and no changes affecting the image are allowed. So here is my Dockerfile:

FROM python:3.7-slim-buster
# alternative base images kept from the original post
#FROM gcr.io/data-development-254912/gcp_bi_baseimage
#FROM gcp_bi_baseimage
LABEL maintainer=" "
ENV APP_HOME /app
WORKDIR $APP_HOME
COPY / ./
# install build tools, add the Microsoft APT repository and install the msodbcsql17 ODBC driver
RUN apt-get update \
    && apt-get install --yes --no-install-recommends \
        apt-utils \
        apt-transport-https \
        curl \
        gnupg \
        unixodbc-dev \
        gcc \
        g++ \
        nano \
    && curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add - \
    && curl https://packages.microsoft.com/config/debian/10/prod.list > /etc/apt/sources.list.d/mssql-release.list \
    && apt-get update \
    && ACCEPT_EULA=Y apt-get install --yes --no-install-recommends msodbcsql17 \
    && apt-get install --yes libgssapi-krb5-2 \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/* \
    && rm -rf /tmp/*
RUN pip install -r requirements.txt
CMD ["python", "app.py"]

requirements.txt:

pyodbc==4.0.28
google-cloud-bigquery==1.24.0    
google-cloud-storage==1.26.0

You should be good from this point on.
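
As a sanity check, here is a minimal sketch of what app.py could do inside that image; note that msodbcsql17 registers the driver under the name "ODBC Driver 17 for SQL Server", so the connection string differs from the one in the question. Server, database and credentials are placeholders:

import pyodbc

# connect through the msodbcsql17 driver installed in the image above
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=servername;DATABASE=dbname;UID=username;PWD=password"
)
cursor = conn.cursor()
cursor.execute("SELECT 1")
print(cursor.fetchone())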

Since then I have managed to set up an Airflow named connection to our SQL Server and am using mssql_operator or mssql_hook. I worked with a cloud engineer to get the networking set up just right. What I found is that the named connection is much easier to use, yet the KubernetesPodOperator is still much more reliable.
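
For reference, a minimal sketch of the named-connection route, assuming a connection called mssql_default has been created under Admin -> Connections and that the Airflow mssql extra is available (import path is for Airflow 1.10 / Composer 1):

from airflow import DAG
from airflow.operators.mssql_operator import MsSqlOperator
from datetime import datetime

with DAG("mssql_named_connection", start_date=datetime(2020, 6, 1), schedule_interval=None) as dag:
    check = MsSqlOperator(
        task_id="check_connection",
        mssql_conn_id="mssql_default",  # the named connection configured in the Airflow UI
        sql="SELECT 1",
    )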



Source: https://stackoverflow.com/questions/60346440/google-composer-how-do-i-install-microsoft-sql-server-odbc-drivers-on-environme
