How to safely restart Airflow and kill a long-running task?


Question


I have Airflow running in Kubernetes using the CeleryExecutor. Airflow submits and monitors Spark jobs using the DatabricksOperator.

My streaming Spark jobs have a very long runtime (they run forever unless they fail or are cancelled). When Airflow worker pods are killed while a streaming job is running, the following happens:

  1. The associated task becomes a zombie (running state, but no process with a heartbeat)
  2. The task is marked as failed when Airflow reaps zombies
  3. The Spark streaming job continues to run

How can I force the worker to kill my Spark job before it shuts down?

I've tried killing the Celery worker with a TERM signal, but apparently that causes Celery to stop accepting new tasks and wait for current tasks to finish (docs).


Answer 1:


You need to be clearer about the issue. If you are saying that the Spark cluster finishes the jobs as expected and on_kill is not being called, that is expected behavior. As per the docs, the on_kill function is for cleaning up after the task gets killed.

def on_kill(self) -> None:
    """
    Override this method to cleanup subprocesses when a task instance
    gets killed. Any use of the threading, subprocess or multiprocessing
    module within an operator needs to be cleaned up or it will leave
    ghost processes behind.
    """

In your case, when you manually kill the job, it does what it is supposed to do.

Now, if you want cleanup to run even after the job completes successfully, override the post_execute function. As per the docs, post_execute is:

def post_execute(self, context: Any, result: Any = None):
    """
    This hook is triggered right after self.execute() is called.
    It is passed the execution context and any results returned by the
    operator.
    """


Source: https://stackoverflow.com/questions/63141944/how-to-safely-restart-airflow-and-kill-a-long-running-task
