Airflow using template files for PythonOperator

遥遥无期 2021-01-31 04:49

The method of getting a BashOperator or SqlOperator to pick up an external file for its template is somewhat clearly documented, but looking at the PythonOperator, it is not clear how to get it to pick up an external template file.

4 answers
  • 2021-01-31 05:18

    As of Airflow 1.8, the way the PythonOperator replaces its template_ext field in __init__ doesn't work. Tasks only check template_ext on the __class__. To create a PythonOperator that picks up SQL template files you only need to do the following:

    class SQLTemplatedPythonOperator(PythonOperator):
        template_ext = ('.sql',)
    

    And then to access the SQL from your task when it runs:

    SQLTemplatedPythonOperator(
        templates_dict={'query': 'my_template.sql'},
        params={'my_var': 'my_value'},
        python_callable=my_func,
        provide_context=True,
    )
    
    def my_func(**context):
        query = context['templates_dict']['query']  # the rendered contents of my_template.sql
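
    For completeness, a minimal sketch of how these pieces might be wired into a DAG. The dag definition, task_id, the paths, and the assumption that my_template.sql references {{ params.my_var }} are illustrative, not part of the original answer:

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.python_operator import PythonOperator

    class SQLTemplatedPythonOperator(PythonOperator):
        template_ext = ('.sql',)

    def my_func(**context):
        # templates_dict values are rendered before the callable runs, so this
        # is the SQL text with {{ params.my_var }} already substituted
        query = context['templates_dict']['query']
        print(query)

    dag = DAG(
        dag_id='sql_templated_example',
        start_date=datetime(2021, 1, 1),
        schedule_interval='@once',
        template_searchpath='/path/to/sql',  # directory containing my_template.sql
    )

    SQLTemplatedPythonOperator(
        task_id='render_sql',
        templates_dict={'query': 'my_template.sql'},
        params={'my_var': 'my_value'},
        python_callable=my_func,
        provide_context=True,
        dag=dag,
    )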
    
  • 2021-01-31 05:32

    Recently I came across the same issue and finally solved it. @Ardan's solution is correct, but I just want to restate it as a more complete answer, with some details on how Airflow works for newcomers.

    Of course you first need something like this:

    from airflow.operators.python_operator import PythonOperator
    
    class SQLTemplatedPythonOperator(PythonOperator):
    
        # somehow ('.sql',) doesn't work but tuple of two works...
        template_ext = ('.sql','.abcdefg')
    

    Assuming you have a SQL template file like the one below:

    # stored at path: $AIRFLOW_HOME/sql/some.sql
    select {{ params.some_params }} from my_table;
    

    First, make sure you add your folder to the search path in your DAG parameters.

    Do not put template_searchpath in your default args and then pass those args to the DAG; it doesn't work.

    dag = DAG(
        dag_id= "some_name",
        default_args=args,
        schedule_interval="@once",
        template_searchpath='/Users/your_name/some_path/airflow_home/sql'
    )
    

    Then your operator call will be

    SQLTemplatedPythonOperator(
            templates_dict={'query': 'some.sql'},
            op_kwargs={"args_directly_passed_to_your_function": "some_value"},
            task_id='dummy',
            params={"some_params":"some_value"},
            python_callable=your_func,
            provide_context=True,
            dag=dag,
        )
    

    Your function will be:

    def your_func(args_directly_passed_to_your_function=None, **context):
        query = context['templates_dict']['query']  # the rendered SQL
        do_some_thing(query)
    

    Some explanations:

    1. Airflow uses values from the context to render your template. To manually add values to the context, you can use the params field as above; they are then available to the template as params.some_params.

    2. The PythonOperator no longer takes the template file extension from the template_ext field, as @Ardan mentioned. The source code is here. It only takes the extension from self.__class__.template_ext.

    3. Airflow loops through the templates_dict field, and if value.endswith(file_extension) == True, it renders the template.
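
    To make point 3 concrete, here is a simplified, standalone sketch of that behaviour. This is not Airflow's actual source; the Jinja loading and rendering details are assumptions made to illustrate the endswith check:

    from jinja2 import Environment, FileSystemLoader

    def render_templates_dict(templates_dict, template_ext, searchpath, context):
        # Values whose name ends with a registered extension are read from
        # disk and rendered; other values are rendered as inline strings.
        env = Environment(loader=FileSystemLoader(searchpath))
        rendered = {}
        for key, value in templates_dict.items():
            if value.endswith(template_ext):
                template = env.get_template(value)   # load the file from the searchpath
            else:
                template = env.from_string(value)    # treat the value itself as a template
            rendered[key] = template.render(**context)
        return rendered

    # Example: renders some.sql with params made available to the template
    print(render_templates_dict(
        {'query': 'some.sql'},
        ('.sql',),
        '/Users/your_name/some_path/airflow_home/sql',
        {'params': {'some_params': 'some_value'}},
    ))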

  • 2021-01-31 05:36

    I don't think this is really possible. But the following workaround might be helpful:

    import pprint

    def templated_function(ds, **kwargs):
        kwargs['ds'] = ds                                        # put ds into 'context'
        task = kwargs['task']                                    # get a handle on the task
        templ = open(kwargs['templates_dict']['file1']).read()   # read the template file
        sql = task.render_template('', templ, kwargs)            # render it
        pp = pprint.PrettyPrinter(indent=4)
        pp.pprint(sql)
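
    For what it's worth, here is a hedged sketch of how this workaround might be wired into a DAG; the dag name, task_id, and the template path are assumptions, not part of the original answer:

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.python_operator import PythonOperator

    dag = DAG('render_workaround', start_date=datetime(2021, 1, 1), schedule_interval='@once')

    PythonOperator(
        task_id='templated_function',
        python_callable=templated_function,
        # pass the *path* of the template file; the callable opens and renders it itself
        templates_dict={'file1': '/path/to/my_template.sql'},
        provide_context=True,  # Airflow 1.x: makes task, templates_dict, ds, etc. available in kwargs
        dag=dag,
    )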
    

    Would love a better solution, though!

  • 2021-01-31 05:40

    I was unable to get a templated script file to work from Python (I'm new to Python), but here is an example with the BashOperator; maybe it can give you some hints.

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator
    
    default_args = {
        'owner': 'airflow',
        'depends_on_past': False,
        #'start_date': airflow.utils.dates.days_ago(2),
        'email': ['airflow@airflow.com']}
    
    dag = DAG('sr5', description='Simple tutorial DAG',
              schedule_interval='0 12 * * *',
              start_date=datetime(2017, 3, 20),
          catchup=False,  # so that on scheduler restart, it doesn't try to catch up on all the missed runs
              template_searchpath=['/Users/my_name/Desktop/utils/airflow/resources'])
    
    t1 = BashOperator(
        task_id='t1',
        depends_on_past=False,
        params={
            'ds1': 'hie'},
        bash_command="01.sh",
        dag=dag)
    

    The 01.sh script looks as follows:

    #!/bin/sh
    
    echo {{ ds }}
    echo {{ params.ds1 }}
    

    This gives the following output on a test execution:

    [2017-05-12 08:31:52,981] {bash_operator.py:91} INFO - Output:
    [2017-05-12 08:31:52,984] {bash_operator.py:95} INFO - 2017-05-05
    [2017-05-12 08:31:52,984] {bash_operator.py:95} INFO - hie
