Store and access a password using Apache Airflow

忘了有多久 2020-12-25 11:43

We are using Airflow as a scheduler. I want to invoke a simple bash operator in a DAG. The bash script needs a password as an argument to do further processing.

How can I store the password securely in Airflow and pass it to the script, rather than hard-coding it?

6 Answers
  • 2020-12-25 11:57

    In this case I would use a PythonOperator, from which you can get a hook on your database connection using hook = PostgresHook(postgres_conn_id=postgres_conn_id). You can then call get_connection on this hook, which will give you a Connection object from which you can get the host, login and password for your database connection.

    Finally, use for example subprocess.call(['your_script.sh', connection_string]), passing the connection details as an argument.

    This method is a bit convoluted but it does allow you to keep the encryption for database connections in Airflow. Also, you should be able to pull this strategy into a separate Operator class inheriting the base behaviour from PythonOperator but adding the logic for getting the hook and calling the bash script.
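
    A minimal sketch of this strategy, assuming Airflow 1.10-era imports; the connection id my_postgres, the script path, and the surrounding dag object are placeholders, not from the original answer:

        import subprocess

        from airflow.hooks.postgres_hook import PostgresHook
        from airflow.operators.python_operator import PythonOperator

        def run_script_with_creds(**kwargs):
            # Fetch the connection configured (and Fernet-encrypted) in Airflow.
            hook = PostgresHook(postgres_conn_id='my_postgres')
            conn = hook.get_connection('my_postgres')
            # Hand the password to the bash script as an argument.
            subprocess.check_call(['/path/to/your_script.sh', conn.password])

        task = PythonOperator(
            task_id='run_script_with_creds',
            python_callable=run_script_with_creds,
            dag=dag)  # assumes a DAG object named dag is defined elsewhere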

  • 2020-12-25 11:58

    You can store the password in Airflow Variables: https://airflow.incubator.apache.org/ui.html#variable-view

    1. Create a variable with a key and value in the UI, for example mypass:XXX
    2. Import Variable: from airflow.models import Variable
    3. Fetch it: MyPass = Variable.get("mypass")
    4. Pass MyPass to your bash script:

        from airflow.models import Variable
        from airflow.operators.bash_operator import BashOperator

        MyPass = Variable.get("mypass")

        command = """
                  echo "{{ params.my_param }}"
                  """

        task = BashOperator(
                task_id='templated',
                bash_command=command,
                params={'my_param': MyPass},
                dag=dag)
    
  • 2020-12-25 12:04

    This is what I've used.

        from airflow import DAG, settings
        from airflow.models import Connection
        from airflow.operators.python_operator import PythonOperator

        def add_slack_token(ds, **kwargs):
            """Add a slack token as an Airflow connection."""
            session = settings.Session()

            # SLACK_LEGACY_TOKEN is defined elsewhere (e.g. read from the environment).
            new_conn = Connection(conn_id='slack_token')
            new_conn.set_password(SLACK_LEGACY_TOKEN)

            if not (session.query(Connection).filter(Connection.conn_id ==
                                                     new_conn.conn_id).first()):
                session.add(new_conn)
                session.commit()
            else:
                msg = '\n\tA connection with `conn_id`={conn_id} already exists\n'
                msg = msg.format(conn_id=new_conn.conn_id)
                print(msg)

        dag = DAG(
            'add_connections',
            default_args=default_args,
            schedule_interval="@once")

        t2 = PythonOperator(
            dag=dag,
            task_id='add_slack_token',
            python_callable=add_slack_token,
            provide_context=True,
        )
    
  • 2020-12-25 12:06

    Use the GUI in the Admin -> Connections tab.

    If you want to persist the connection in Airflow programmatically, the snippet below works.

    In the example below, myservice represents some external credential cache.

    With this approach you can store connections that you manage externally inside Airflow, without having to poll the external service from within every DAG/task. Instead you rely on Airflow's connection mechanism, and you don't lose out on the Operators that Airflow exposes either (should your organisation allow this).

    The trick is to use airflow.utils.db.merge_conn to persist your created connection object.

        from airflow.models import Connection
        from airflow.utils.db import merge_conn

        # myservice is the external credential cache mentioned above.
        creds = {"user": myservice.get_user(), "pwd": myservice.get_pwd()}

        c = Connection(conn_id='your_airflow_connection_id_here',
                       login=creds["user"],
                       host=None)
        c.set_password(creds["pwd"])
        merge_conn(c)

    merge_conn is built in and used by Airflow itself to initialise empty connections. However, it will not auto-update an existing connection; for that you will have to use your own helper function:

        import logging

        from airflow.models import Connection
        from airflow.utils.db import provide_session

        @provide_session
        def store_conn(conn, session=None):
            if session.query(Connection).filter(Connection.conn_id == conn.conn_id).first():
                logging.info("Connection object already exists, attempting to remove it...")
                session.delete(session.query(Connection).filter(Connection.conn_id == conn.conn_id).first())

            session.add(conn)
            session.commit()
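
    For example, with the Connection c built in the first snippet (a sketch, reusing the names above):

        # Safe to re-run: any existing connection with the same conn_id is replaced.
        store_conn(c)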
    
  • 2020-12-25 12:10

        from airflow.hooks.base_hook import BaseHook

        conn = BaseHook.get_connection('bigquery_connection')
        print(conn.get_extra())
    

    This conn.get_extra() call will give you the JSON of the settings stored in the connection.
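
    Since the extras come back as a JSON string, you can parse them in the usual way (a sketch; the key name shown is a hypothetical example):

        import json

        # get_extra() returns a JSON string; fall back to "{}" when it is empty.
        extras = json.loads(conn.get_extra() or "{}")
        key_path = extras.get("extra__google_cloud_platform__key_path")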

  • 2020-12-25 12:14

    You can store the password in an Airflow connection; it will be encrypted as long as you have set up your Fernet key.

    Here is how you can create a connection:

        from airflow import settings
        from airflow.models import Connection

        def create_conn(username, password, host=None):
            new_conn = Connection(conn_id=f'{username}_connection',
                                  login=username,
                                  host=host if host else None)
            new_conn.set_password(password)
            # Persist it so the encrypted password lands in the metadata database.
            session = settings.Session()
            session.add(new_conn)
            session.commit()
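
    For example (a sketch; the credential values are placeholders):

        create_conn("svc_user", "s3cret", host="db.example.com")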
    

    This password is then encrypted in the metadata database you set up.

    To access this password:

        from airflow.hooks.base_hook import BaseHook

        connection = BaseHook.get_connection("username_connection")
        password = connection.password  # This is a getter that returns the unencrypted password.
    

    EDIT:

    There is an easier way to create a connection via the UI: go to Admin -> Connections, create a new connection there, and fill in the login and password fields; the password will be stored encrypted as well.