Cloud Composer (Airflow) jobs stuck

Submitted by 泪湿孤枕 on 2019-12-22 12:22:15

Question


My Cloud Composer managed Airflow environment has been stuck for hours, ever since I canceled a Task Instance that was taking too long (let's call it Task A).

I've cleared all the DAG Runs and Task Instances, but a few jobs are still running and one job is in the Shutdown state (I suppose it is the job for Task A) (snapshot of my Jobs).

Also, the scheduler does not seem to be running, since recently deleted DAGs keep appearing in the dashboard.

Is there a way to kill the jobs or reset the scheduler? Any idea for getting Composer unstuck is welcome.


Answer 1:


You can restart the scheduler as follows:

From your cloud shell:

1. Determine your environment’s Kubernetes cluster (reported in the gkeCluster field of the output):

gcloud composer environments describe ENVIRONMENT_NAME \
    --location LOCATION 

2. Get credentials and connect to the Kubernetes cluster, where GKE_CLUSTER and GKE_LOCATION are the cluster name and zone found in step 1:

gcloud container clusters get-credentials ${GKE_CLUSTER} --zone ${GKE_LOCATION}

3. Run the following command to restart the scheduler:

kubectl get deployment airflow-scheduler -o yaml | kubectl replace --force -f -

Steps 1 and 2 are detailed in the Google Cloud documentation. Step 3 replaces the “airflow-scheduler” deployment with itself, which restarts the service.
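The three steps can be chained into one script. What follows is only a rough sketch, not an official procedure: the function names (cluster_name, cluster_zone, restart_scheduler) are made up for illustration, and it assumes that config.gkeCluster has the form projects/PROJECT/zones/ZONE/clusters/CLUSTER, which may not hold for every Composer version.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, and the gkeCluster path
# format (projects/P/zones/Z/clusters/C) is an assumption.

# Print the cluster name, i.e. the last path segment.
cluster_name() {
  printf '%s\n' "${1##*/}"
}

# Print the zone, i.e. the segment that follows "/zones/".
cluster_zone() {
  local rest="${1#*/zones/}"    # drop everything up to and including "/zones/"
  printf '%s\n' "${rest%%/*}"   # keep the text before the next "/"
}

# Steps 1 to 3 from the answer, chained together.
# Arguments: ENVIRONMENT_NAME LOCATION
restart_scheduler() {
  local env_name="$1" location="$2" gke_path
  # Step 1: read only the gkeCluster field from the environment description.
  gke_path="$(gcloud composer environments describe "$env_name" \
      --location "$location" \
      --format='value(config.gkeCluster)')" || return 1
  # Step 2: fetch credentials for that cluster.
  gcloud container clusters get-credentials "$(cluster_name "$gke_path")" \
      --zone "$(cluster_zone "$gke_path")" || return 1
  # Step 3: replace the scheduler deployment with itself to restart it.
  kubectl get deployment airflow-scheduler -o yaml | kubectl replace --force -f -
}
```

After sourcing the file, a call would look like restart_scheduler ENVIRONMENT_NAME LOCATION. The two parsing helpers are kept separate so they can be checked without touching a real environment.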

If restarting the scheduler doesn’t help, you may need to recreate your Composer environment, and troubleshoot your DAGs if this happens every time.




Answer 2:


Which version of Composer are you running? Jobs getting stuck is a known issue in the beta versions. Composer 1.0.0 and 1.1.0 should not see any stuck jobs (except for tasks in a SubDag, which is a known Airflow bug), so consider migrating to the latest Composer version.
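One way to check the version from the command line is to read the environment's image version. This is a hedged sketch: the function names are hypothetical, and it assumes the image version string looks like composer-1.1.0-airflow-1.9.0.

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical, and the image version
# format (composer-X.Y.Z-airflow-A.B.C) is an assumption.

# Print the Composer part of an image version string.
composer_version() {
  local v="${1#composer-}"          # strip the leading "composer-"
  printf '%s\n' "${v%%-airflow-*}"  # strip the "-airflow-..." suffix
}

# Look up the image version of an environment and print its Composer version.
# Arguments: ENVIRONMENT_NAME LOCATION
show_composer_version() {
  local image
  image="$(gcloud composer environments describe "$1" --location "$2" \
      --format='value(config.softwareConfig.imageVersion)')" || return 1
  composer_version "$image"
}
```

After sourcing the file, show_composer_version ENVIRONMENT_NAME LOCATION prints only the Composer portion, which can then be compared against the versions mentioned above.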



Source: https://stackoverflow.com/questions/51859609/cloud-composer-airflow-jobs-stuck
