We are using airflow as a scheduler. I want to invoke a simple bash operator in a DAG. The bash script needs password as an argument to do further processing.
How c
In this case I would use a PythonOperator from which you are able to get a Hook
on your database connection using
hook = PostgresHook(postgres_conn_id=postgres_conn_id)
. You can then call get_connection
on this hook which will give you a Connection object from which you can get the host, login and password for your database connection.
Finally, use for example subprocess.call(your_script.sh, connection_string)
passing the connection details as a parameter.
This method is a bit convoluted but it does allow you to keep the encryption for database connections in Airflow. Also, you should be able to pull this strategy into a separate Operator class inheriting the base behaviour from PythonOperator but adding the logic for getting the hook and calling the bash script.
You can store the password in airflow variables, https://airflow.incubator.apache.org/ui.html#variable-view
from airflow.models import Variable
command = """
echo "{{ params.my_param }}"
"""
task = BashOperator(
task_id='templated',
bash_command=command,
params={'my_param': MyPass},
dag=dag)
This is what I've used.
def add_slack_token(ds, **kwargs):
""""Add a slack token"""
session = settings.Session()
new_conn = Connection(conn_id='slack_token')
new_conn.set_password(SLACK_LEGACY_TOKEN)
if not (session.query(Connection).filter(Connection.conn_id ==
new_conn.conn_id).first()):
session.add(new_conn)
session.commit()
else:
msg = '\n\tA connection with `conn_id`={conn_id} already exists\n'
msg = msg.format(conn_id=new_conn.conn_id)
print(msg)
dag = DAG(
'add_connections',
default_args=default_args,
schedule_interval="@once")
t2 = PythonOperator(
dag=dag,
task_id='add_slack_token',
python_callable=add_slack_token,
provide_context=True,
)
Use the GUI in the admin/connections tab.
The answer that truly works, with persisting the connection in Airflow programatically, works as in the snippet below.
In the below example myservice
represents some external credential cache.
When using the approach below, you can store your connections that you manage externally inside of airflow. Without having to poll the service from within every dag/task. Instead you can rely on airflow's connection mechanism and you don't have to lose out on the Operators that Airflow exposes either (should your organisation allow this).
The trick is using airflow.utils.db.merge_conn
to handle the setting of your created connection object.
from airflow.utils.db import provide_session, merge_conn
creds = {"user": myservice.get_user(), "pwd": myservice.get_pwd()
c = Connection(conn_id=f'your_airflow_connection_id_here',
login=creds["user"],
host=None)
c.set_password(creds["pwd"])
merge_conn(c)
merge_conn is build-in and used by airflow itself to initialise empty connections. However it will not auto-update. for that you will have to use your own helper function.
from airflow.utils.db import provide_session
@provide_session
def store_conn(conn, session=None):
from airflow.models import Connection
if session.query(Connection).filter(Connection.conn_id == conn.conn_id).first():
logging.info("Connection object already exists, attempting to remove it...")
session.delete(session.query(Connection).filter(Connection.conn_id == conn.conn_id).first())
session.add(conn)
session.commit()
from airflow.hooks.base_hook import BaseHook
conn = BaseHook.get_connection('bigquery_connection')
print(conn.get_extra())
These conn.get_extra()
will give you JSON of the settings stored in the connection.
You can store the password in a Hook - this will be encrypted so long as you have setup your fernet key.
Here is how you can create a connection.
from airflow.models import Connection
def create_conn(username, password, host=None):
new_conn = Connection(conn_id=f'{username}_connection',
login=username,
host=host if host else None)
new_conn.set_password(password)
Then, this password is encryted in the db you setup.
To access this password:
from airflow.hooks.base_hook import BaseHook
connection = BaseHook.get_connection("username_connection")
password = connection.password # This is a getter that returns the unencrypted password.
EDIT:
There is an easier way to create a connection via the UI:
Then: