Question
My Cloud Composer (managed Airflow) environment has been stuck for hours since I canceled a Task Instance that was taking too long (let's call it Task A).
I've cleared all the DAG Runs and task instances, but a few jobs are still running and one job is in the Shutdown state (I suppose it is the job for Task A) (snapshot of my Jobs).
Besides, the scheduler does not seem to be running, since recently deleted DAGs keep appearing in the dashboard.
Is there a way to kill the jobs or reset the scheduler? Any idea for getting the Composer environment unstuck is welcome.
Answer 1:
You can restart the scheduler from Cloud Shell as follows:
1. Determine your environment's Kubernetes cluster:
gcloud composer environments describe ENVIRONMENT_NAME \
--location LOCATION
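2. Get credentials and connect to the Kubernetes cluster (GKE_CLUSTER and GKE_LOCATION are the cluster name and zone taken from the gkeCluster field in the output of step 1):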
gcloud container clusters get-credentials ${GKE_CLUSTER} --zone ${GKE_LOCATION}
3. Run the following command to restart the scheduler:
kubectl get deployment airflow-scheduler -o yaml | kubectl replace --force -f -
Steps 1 and 2 are detailed here. Step 3 basically replaces the “airflow-scheduler” deployment with itself, thus restarting the service.
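As a rough sketch, the three steps can be chained in one script. It assumes the cluster path reported by step 1 has the zonal form projects/PROJECT/zones/ZONE/clusters/NAME and that the airflow-scheduler deployment is in the namespace kubectl currently points at. The function names parse_gke_cluster and restart_scheduler are made up for illustration, and ENVIRONMENT_NAME and LOCATION remain placeholders.

```shell
#!/usr/bin/env bash

# Split a GKE cluster resource path into the zone and cluster name.
# Assumption: the path looks like projects/<project>/zones/<zone>/clusters/<name>.
parse_gke_cluster() {
  local path="$1"
  GKE_LOCATION="$(printf '%s\n' "$path" | cut -d/ -f4)"
  GKE_CLUSTER="$(printf '%s\n' "$path" | cut -d/ -f6)"
}

# Hypothetical wrapper around steps 1-3; needs gcloud and kubectl on PATH.
restart_scheduler() {
  local env_name="$1" location="$2" cluster_path
  # Step 1: read the cluster path from the environment description.
  cluster_path="$(gcloud composer environments describe "$env_name" \
    --location "$location" --format='value(config.gkeCluster)')"
  parse_gke_cluster "$cluster_path"
  # Step 2: fetch credentials for that cluster.
  gcloud container clusters get-credentials "$GKE_CLUSTER" --zone "$GKE_LOCATION"
  # Step 3: replace the scheduler deployment with itself to restart it.
  kubectl get deployment airflow-scheduler -o yaml | kubectl replace --force -f -
}
```

After sourcing the script, it would be invoked as: restart_scheduler ENVIRONMENT_NAME LOCATION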
If restarting the scheduler doesn't help, you may need to recreate your Composer environment, and troubleshoot your DAGs if this happens every time.
Answer 2:
Which version of Composer are you running? Jobs getting stuck is a known issue in the beta versions. Composer 1.0.0 and 1.1.0 should not see any stuck jobs (except for tasks in a SubDag, which is a known Airflow bug). Consider migrating to the latest Composer version.
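One possible way to check the version from Cloud Shell is sketched below. It assumes the environment reports an image version string of the form composer-X.Y.Z-airflow-A.B.C in its softwareConfig.imageVersion field; the helper name composer_version is made up for illustration, and ENVIRONMENT_NAME and LOCATION are placeholders.

```shell
#!/usr/bin/env bash

# Extract the Composer version from an image version string.
# Assumption: the string looks like composer-<version>-airflow-<version>.
composer_version() {
  printf '%s\n' "$1" | sed -E 's/^composer-([0-9.]+)-airflow-.*$/\1/'
}

# Example usage (requires gcloud; the names are placeholders):
# image="$(gcloud composer environments describe ENVIRONMENT_NAME \
#   --location LOCATION --format='value(config.softwareConfig.imageVersion)')"
# composer_version "$image"
```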
Source: https://stackoverflow.com/questions/51859609/cloud-composer-airflow-jobs-stuck