Run airflow DAG for each file

青春壹個敷衍的年華 submitted on 2021-02-10 04:09:29

Question


So I have this quite nice DAG in Airflow which basically runs several analysis steps (implemented as Airflow plugins) on binary files. The DAG is triggered by an FTP sensor which just checks whether there is a new file on the FTP server and then starts the whole workflow.

So currently the workflow is like this: DAG is triggered as defined -> sensor waits for new file on FTP -> analysis steps are executed -> end of workflow.

What I'd like to have is something like this: DAG is triggered -> sensor waits for new file on FTP -> for every file on the FTP server the analysis steps are executed individually -> each workflow ends individually.

How do I get the analysis workflow to be executed for each file on the FTP server, with just one sensor waiting for a new file when there is no file on the server? I don't want to, e.g., start a DAG every second or so, because then I would have many sensors just waiting for a new file.


Answer 1:


Use two DAGs to separate the sensing step from the analysis steps.

DAG 1:

sensor waits for new file on FTP -> once a new file lands, use TriggerDagRunOperator to trigger DAG 1 itself (so that a fresh sensor starts waiting for the next file) -> use TriggerDagRunOperator to trigger DAG 2

DAG 2:

do the analysis steps for the file



Source: https://stackoverflow.com/questions/54992541/run-airflow-dag-for-each-file
