Airflow S3KeySensor - How to make it continue running

余生分开走 2021-01-04 00:02

With the help of this Stack Overflow post, I just made a program (the one shown in the post) where, when a file is placed inside an S3 bucket, a task in one of my running DAGs i

2 Answers
  • 2021-01-04 00:29

    Airflow has no concept that maps to an always-running DAG. You could schedule a DAG to run very frequently, such as every 1 to 5 minutes, if that suits your use case.

    The main point is that the S3KeySensor keeps checking until it detects the first file that matches the key's wildcard path (or until it times out), and then the rest of the DAG run proceeds. When a second, third, or fourth file lands, the sensor has already completed for that DAG run, and it won't be scheduled again until the next DAG run. (The looping idea you described is roughly equivalent to what the scheduler does when it creates DAG runs, except that it doesn't go on forever.)
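    To make the once-per-run behavior concrete, here is a toy model in plain Python. It is not Airflow code: run_sensor_once and the key names are made up for illustration, and each inner list stands for what one poke would see in the bucket.

```python
from fnmatch import fnmatch


def run_sensor_once(snapshots, pattern):
    """Toy model of one sensor task within a single DAG run.

    snapshots: one list of object keys per poke (hypothetical data).
    Returns the first key matching the wildcard pattern, or None if
    every poke comes up empty (standing in for a timeout).
    """
    for keys in snapshots:
        matches = [key for key in keys if fnmatch(key, pattern)]
        if matches:
            # The sensor succeeds here and is not poked again in this run.
            return matches[0]
    return None


# Three pokes: empty bucket, first file present, then a second file arrives.
pokes = [[], ["incoming/a.csv"], ["incoming/a.csv", "incoming/b.csv"]]
first_hit = run_sensor_once(pokes, "incoming/*.csv")
```

    The second file, incoming/b.csv, only shows up after the sensor has already succeeded, so nothing in this run reacts to it.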

    An external trigger definitely sounds like the best approach for your use case, whether that trigger comes via the Airflow CLI's trigger_dag command ($ airflow trigger_dag ..., or $ airflow dags trigger ... in Airflow 2.x):

    https://github.com/apache/incubator-airflow/blob/972086aeba4616843005b25210ba3b2596963d57/airflow/bin/cli.py#L206-L222

    Or via the REST API:

    https://github.com/apache/incubator-airflow/blob/5de22d7fa0d8bc6b9267ea13579b5ac5f62c8bb5/airflow/www/api/experimental/endpoints.py#L41-L89

    Both turn around and call the trigger_dag function in the common (experimental) API:

    https://github.com/apache/incubator-airflow/blob/089c996fbd9ecb0014dbefedff232e8699ce6283/airflow/api/common/experimental/trigger_dag.py#L28-L67

    You could, for instance, set up an AWS Lambda function, invoked when a file lands in S3, that makes the trigger DAG call.
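    A minimal sketch of such a Lambda handler, assuming the standard S3 event notification layout (Records[].s3.bucket.name and Records[].s3.object.key). The names extract_s3_objects, make_handler, and trigger are hypothetical; trigger stands in for whatever call actually starts the DAG run.

```python
import urllib.parse


def extract_s3_objects(event):
    """Return (bucket, key) pairs from an S3 event notification."""
    pairs = []
    for record in event.get("Records", []):
        s3_info = record["s3"]
        bucket = s3_info["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 notifications.
        key = urllib.parse.unquote_plus(s3_info["object"]["key"])
        pairs.append((bucket, key))
    return pairs


def make_handler(trigger):
    """Build a Lambda-style handler that calls trigger(bucket, key) per object.

    trigger is a placeholder for the code that starts the DAG run.
    """
    def handler(event, context):
        objects = extract_s3_objects(event)
        for bucket, key in objects:
            trigger(bucket, key)
        return {"triggered": len(objects)}
    return handler


# Local dry run with a hand-written event and a trigger that only records calls.
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "my-bucket"},
                "object": {"key": "incoming/my+file.csv"}}}
    ]
}
calls = []
handler = make_handler(lambda bucket, key: calls.append((bucket, key)))
result = handler(sample_event, None)
```

    Because each uploaded file produces its own event, every file gets its own DAG run, which is what the sensor alone cannot provide.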

  • 2021-01-04 00:29

    Another way is to have an S3 event trigger an AWS Lambda function, which then invokes the DAG through the Airflow API:

    S3 event -> AWS Lambda -> Airflow API

    Set up an S3 notification to trigger the Lambda function:

    https://docs.aws.amazon.com/lambda/latest/dg/with-s3.html

    Airflow REST API reference:

    https://airflow.apache.org/docs/apache-airflow/stable/rest-api-ref.html
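    As a rough sketch of the last hop, the Lambda function could call the stable REST API's dagRuns endpoint. This assumes Airflow 2.x with the basic-auth API backend enabled; the host, DAG id, and credentials below are placeholders, and other Airflow versions or auth setups will need a different URL or headers.

```python
import base64
import json
import urllib.request


def build_trigger_request(base_url, dag_id, conf, user, password):
    """Build a POST to /api/v1/dags/{dag_id}/dagRuns (assumed Airflow 2.x)."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/dags/{dag_id}/dagRuns",
        data=json.dumps({"conf": conf}).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


def trigger_dag_run(request, timeout=10):
    """Send the request and return the decoded JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Placeholder values; nothing is sent until trigger_dag_run(request) is called.
request = build_trigger_request(
    base_url="https://airflow.example.com",
    dag_id="s3_file_dag",
    conf={"bucket": "my-bucket", "key": "incoming/a.csv"},
    user="lambda_user",
    password="change-me",
)
```

    Inside the DAG, tasks can read the bucket and key from dag_run.conf, so the run knows which file triggered it.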
