With the help of this Stack Overflow post I just made a program (the one shown in the post) where, when a file is placed inside an S3 bucket, a task in one of my running DAGs is triggered.
Within Airflow, there isn't a concept that maps to an always-running DAG. You could have a DAG run very frequently, for example every 1 to 5 minutes, if that suits your use case.
The main thing here is that the S3KeySensor checks until it detects that the first file exists in the key's wildcard path (or it times out), and then the rest of that DAG run executes. But when a second, third, or fourth file lands, the sensor will have already completed for that DAG run; it won't run again until the next DAG run is scheduled. (The looping idea you described is roughly what the scheduler already does when it creates DAG runs, except it isn't confined to a single run.)
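To make that concrete, here is a minimal sketch of the pattern (the bucket, key pattern, schedule, and import path are assumptions and depend on your Airflow/provider version): the sensor pokes until the first matching file appears or it times out, the downstream task runs once, and nothing else happens until the scheduler creates the next DAG run.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.providers.amazon.aws.sensors.s3 import S3KeySensor

with DAG(
    dag_id="s3_file_watch",                 # assumed name
    start_date=datetime(2021, 1, 1),
    schedule_interval="*/5 * * * *",        # run frequently, e.g. every 5 minutes
    catchup=False,
    max_active_runs=1,
) as dag:
    # Pokes until the first object matching the wildcard exists, then succeeds.
    wait_for_file = S3KeySensor(
        task_id="wait_for_file",
        bucket_name="my-bucket",             # assumed bucket
        bucket_key="incoming/*.csv",         # assumed key pattern
        wildcard_match=True,
        poke_interval=60,                    # check once a minute
        timeout=60 * 30,                     # give up after 30 minutes
    )

    # Whatever processing you want to do once a file has landed.
    process_file = BashOperator(
        task_id="process_file",
        bash_command="echo 'file detected, processing...'",
    )

    wait_for_file >> process_file
```

With `max_active_runs=1` the frequent schedule won't stack up overlapping runs while the sensor is still waiting, but each run still only reacts to the first file it sees.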
An external trigger definitely sounds like the best approach for your use case, whether that trigger comes via the Airflow CLI's trigger_dag command ($ airflow trigger_dag ...):
https://github.com/apache/incubator-airflow/blob/972086aeba4616843005b25210ba3b2596963d57/airflow/bin/cli.py#L206-L222
Or via the REST API:
https://github.com/apache/incubator-airflow/blob/5de22d7fa0d8bc6b9267ea13579b5ac5f62c8bb5/airflow/www/api/experimental/endpoints.py#L41-L89
Both turn around and call the trigger_dag function in the common (experimental) API:
https://github.com/apache/incubator-airflow/blob/089c996fbd9ecb0014dbefedff232e8699ce6283/airflow/api/common/experimental/trigger_dag.py#L28-L67
You could, for instance, set up an AWS Lambda function, invoked when a file lands on S3, that makes the trigger_dag call.
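As a hedged sketch of what that Lambda might look like (the Airflow host, DAG id, and endpoint path are assumptions, authentication is omitted, and the experimental endpoint expects conf as a JSON-encoded string):

```python
import json
import urllib.request

AIRFLOW_URL = "http://my-airflow-webserver:8080"  # assumption: your webserver address
DAG_ID = "s3_file_watch"                           # assumption: the DAG to trigger


def lambda_handler(event, context):
    # Pull the bucket/key of the newly created object out of the S3 event
    # and pass them to the DAG run as conf.
    record = event["Records"][0]
    payload = {
        # the experimental endpoint expects "conf" as a JSON-encoded string
        "conf": json.dumps({
            "bucket": record["s3"]["bucket"]["name"],
            "key": record["s3"]["object"]["key"],
        })
    }

    req = urllib.request.Request(
        url=f"{AIRFLOW_URL}/api/experimental/dags/{DAG_ID}/dag_runs",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return {"statusCode": resp.status, "body": resp.read().decode()}
```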
Another way is to use an S3 event to trigger an AWS Lambda function, which then invokes the DAG via the Airflow API:

S3 event -> AWS Lambda -> Airflow API
Set up an S3 event notification to trigger the Lambda:
https://docs.aws.amazon.com/lambda/latest/dg/with-s3.html
Airflow REST API:
https://airflow.apache.org/docs/apache-airflow/stable/rest-api-ref.html
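For reference, a minimal sketch of the stable-API call the Lambda would make, assuming the basic-auth API backend is enabled on the webserver; the host, DAG id, and credentials here are made up:

```python
import requests

# POST /api/v1/dags/{dag_id}/dagRuns creates a new DAG run with the given conf.
response = requests.post(
    "http://my-airflow-webserver:8080/api/v1/dags/s3_file_watch/dagRuns",
    auth=("api_user", "api_password"),   # assumption: basic auth is configured
    json={"conf": {"bucket": "my-bucket", "key": "incoming/data.csv"}},
)
response.raise_for_status()
print(response.json())  # details of the newly created DAG run
```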