With the help of this Stack Overflow post I just made a program (the one shown in the post) where, when a file is placed inside an S3 bucket, a task in one of my running DAGs is triggered.
Within Airflow, there isn't a concept that maps to an always-running DAG. You could have a DAG run very frequently, for example every 1 to 5 minutes, if that suits your use case.
The main thing here is that the S3KeySensor checks until it detects that the first file exists in the key's wildcard path (or it times out), and then the rest of that DAG run executes. But when a second, third, or fourth file lands, the sensor will have already completed for that DAG run; it won't run again until the next DAG run is scheduled. (The looping idea you described is roughly what the scheduler already does when it creates DAG runs, except it isn't confined to a single run.)
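To make that concrete, here is a minimal sketch of the pattern (the bucket, key pattern, schedule, and import path are assumptions and depend on your Airflow/provider version): the sensor pokes until the first matching file appears or it times out, the downstream task runs once, and nothing else happens until the scheduler creates the next DAG run.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.providers.amazon.aws.sensors.s3 import S3KeySensor

with DAG(
    dag_id="s3_file_watch",                 # assumed name
    start_date=datetime(2021, 1, 1),
    schedule_interval="*/5 * * * *",        # run frequently, e.g. every 5 minutes
    catchup=False,
    max_active_runs=1,
) as dag:
    # Pokes until the first object matching the wildcard exists, then succeeds.
    wait_for_file = S3KeySensor(
        task_id="wait_for_file",
        bucket_name="my-bucket",             # assumed bucket
        bucket_key="incoming/*.csv",         # assumed key pattern
        wildcard_match=True,
        poke_interval=60,                    # check once a minute
        timeout=60 * 30,                     # give up after 30 minutes
    )

    # Whatever processing you want to do once a file has landed.
    process_file = BashOperator(
        task_id="process_file",
        bash_command="echo 'file detected, processing...'",
    )

    wait_for_file >> process_file
```

With `max_active_runs=1` the frequent schedule won't stack up overlapping runs while the sensor is still waiting, but each run still only reacts to the first file it sees.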
An external trigger definitely sounds like the best approach for your use case, whether that trigger comes via the Airflow CLI's trigger_dag command ($ airflow trigger_dag ...):
https://github.com/apache/incubator-airflow/blob/972086aeba4616843005b25210ba3b2596963d57/airflow/bin/cli.py#L206-L222
Or via the REST API:
https://github.com/apache/incubator-airflow/blob/5de22d7fa0d8bc6b9267ea13579b5ac5f62c8bb5/airflow/www/api/experimental/endpoints.py#L41-L89
Both turn around and call the trigger_dag function in the common (experimental) API:
https://github.com/apache/incubator-airflow/blob/089c996fbd9ecb0014dbefedff232e8699ce6283/airflow/api/common/experimental/trigger_dag.py#L28-L67
You could, for instance, set up an AWS Lambda function, invoked when a file lands on S3, that makes the trigger_dag call.
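As a hedged sketch of what that Lambda might look like (the Airflow host, DAG id, and endpoint path are assumptions, authentication is omitted, and the experimental endpoint expects conf as a JSON-encoded string):

```python
import json
import urllib.request

AIRFLOW_URL = "http://my-airflow-webserver:8080"  # assumption: your webserver address
DAG_ID = "s3_file_watch"                           # assumption: the DAG to trigger


def lambda_handler(event, context):
    # Pull the bucket/key of the newly created object out of the S3 event
    # and pass them to the DAG run as conf.
    record = event["Records"][0]
    payload = {
        # the experimental endpoint expects "conf" as a JSON-encoded string
        "conf": json.dumps({
            "bucket": record["s3"]["bucket"]["name"],
            "key": record["s3"]["object"]["key"],
        })
    }

    req = urllib.request.Request(
        url=f"{AIRFLOW_URL}/api/experimental/dags/{DAG_ID}/dag_runs",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return {"statusCode": resp.status, "body": resp.read().decode()}
```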
Another way is to use an S3 event to trigger an AWS Lambda function, which then invokes the DAG via the Airflow API:

S3 event -> AWS Lambda -> Airflow API
Set up an S3 event notification to trigger the Lambda:
https://docs.aws.amazon.com/lambda/latest/dg/with-s3.html
Airflow REST API:
https://airflow.apache.org/docs/apache-airflow/stable/rest-api-ref.html
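For reference, a minimal sketch of the stable-API call the Lambda would make, assuming the basic-auth API backend is enabled on the webserver; the host, DAG id, and credentials here are made up:

```python
import requests

# POST /api/v1/dags/{dag_id}/dagRuns creates a new DAG run with the given conf.
response = requests.post(
    "http://my-airflow-webserver:8080/api/v1/dags/s3_file_watch/dagRuns",
    auth=("api_user", "api_password"),   # assumption: basic auth is configured
    json={"conf": {"bucket": "my-bucket", "key": "incoming/data.csv"}},
)
response.raise_for_status()
print(response.json())  # details of the newly created DAG run
```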