Airflow + celery or dask. For what, when?

被撕碎了的回忆 2021-02-04 01:54

I read in the official Airflow documentation the following:

What does this mean exactly? What do the authors mean by scaling out? That is, when

2 answers
  •  故里飘歌
    2021-02-04 02:49

    In Airflow terminology, an "Executor" is the component responsible for running your tasks. The LocalExecutor does this by spawning processes on the machine Airflow runs on and letting each process execute a task.
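To make the local model concrete, here is a minimal sketch of the idea using only the Python standard library. It is not Airflow's implementation; `run_locally`, the `parallelism` parameter, and the example tasks are names made up for illustration. Each task is a command run in its own child process, with a cap on how many run at once.

```python
import subprocess
import sys
import time
from collections import deque


def run_locally(commands, parallelism=2):
    """Run each command in its own child process, at most `parallelism` at once.

    `commands` maps a task id to an argv list. Returns {task_id: exit_code}.
    This is an illustrative stand-in, not Airflow code.
    """
    pending = deque(commands.items())
    running = {}  # task id -> subprocess.Popen
    exit_codes = {}

    while pending or running:
        # Fill free slots with new child processes.
        while pending and len(running) < parallelism:
            task_id, argv = pending.popleft()
            running[task_id] = subprocess.Popen(argv)

        # Reap children that have finished.
        for task_id, proc in list(running.items()):
            code = proc.poll()
            if code is not None:
                exit_codes[task_id] = code
                del running[task_id]

        time.sleep(0.05)  # avoid a hot polling loop

    return exit_codes


# Hypothetical tasks: each one is just a tiny Python one-liner.
tasks = {
    "extract": [sys.executable, "-c", "print('extracting')"],
    "load": [sys.executable, "-c", "import sys; sys.exit(3)"],
}
print(run_locally(tasks, parallelism=2))
```

However high `parallelism` is set, every child process still competes for the CPU and memory of the same machine.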

    Naturally, your capacity is then limited by the resources available on that one machine. The CeleryExecutor instead distributes the load across several machines: the executor publishes a request to execute a task to a queue, and one of several worker nodes picks up the request and executes it. You can then add worker nodes to the cluster to increase overall capacity, which is what "scaling out" refers to.
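The queue-and-workers pattern can be sketched with the standard library alone. This is only an analogy, assuming an in-process `queue.Queue` in place of the real message broker and threads in place of worker machines; `run_on_workers` and the worker names are hypothetical and none of this is Celery's API.

```python
import queue
import threading


def run_on_workers(task_ids, num_workers):
    """Publish task ids to a shared queue and let workers pull from it.

    Returns {task_id: name_of_worker_that_ran_it}.
    """
    broker = queue.Queue()  # stand-in for the message broker
    done = {}
    lock = threading.Lock()

    def worker(name):
        while True:
            task_id = broker.get()
            if task_id is None:  # sentinel: no more work for this worker
                break
            with lock:
                done[task_id] = name  # "executing" the task

    workers = [
        threading.Thread(target=worker, args=(f"worker-{i}",))
        for i in range(num_workers)
    ]
    for w in workers:
        w.start()

    # The "executor" side: publish one request per task.
    for task_id in task_ids:
        broker.put(task_id)
    # One sentinel per worker so each of them shuts down.
    for _ in workers:
        broker.put(None)
    for w in workers:
        w.join()
    return done


print(run_on_workers(["extract", "transform", "load"], num_workers=2))
```

The publisher never decides which worker runs a task; whichever worker is free takes the next request. Raising `num_workers` corresponds to adding machines to the cluster.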

    Finally, there's a KubernetesExecutor (link), which was still in the works when this was written and has shipped since Airflow 1.10. It runs tasks on a Kubernetes cluster. Not only does this give your tasks complete isolation, since they run in containers, you can also leverage existing Kubernetes capabilities to, for instance, autoscale your cluster so that you always have an appropriate amount of resources available.
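As a rough illustration of "one container per task", the sketch below builds a plain Kubernetes Pod manifest as a Python dict. It is not what the KubernetesExecutor actually generates, which is considerably richer; `pod_for_task`, the image name, and the resource figures are assumptions chosen for the example. The manifest fields themselves (`apiVersion`, `kind`, `restartPolicy`, `resources.requests`) are standard Pod fields, and `airflow tasks run` is the Airflow 2 CLI command for running a single task.

```python
import json
import re


def pod_for_task(dag_id, task_id, execution_date, image="apache/airflow"):
    """Build a minimal Pod manifest that runs a single Airflow task.

    Hypothetical helper for illustration; the image is a placeholder.
    """
    # Pod names must be lowercase alphanumerics and '-'.
    name = re.sub(r"[^a-z0-9-]+", "-", f"{dag_id}-{task_id}".lower()).strip("-")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",  # the pod runs one task, then exits
            "containers": [
                {
                    "name": "task",
                    "image": image,
                    "command": ["airflow", "tasks", "run",
                                dag_id, task_id, execution_date],
                    # Assumed figures; requests tell the scheduler how much
                    # room the pod needs on a node.
                    "resources": {
                        "requests": {"cpu": "500m", "memory": "512Mi"}
                    },
                }
            ],
        },
    }


print(json.dumps(pod_for_task("my_dag", "load_data", "2021-02-04"), indent=2))
```

Because every task declares what it needs, a cluster autoscaler can add nodes when pods cannot be scheduled and remove them when they sit idle.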
