Google Dataproc - disconnect with executors often

谁都会走 · submitted on 2019-12-08 14:13:30

If the job itself isn't failing, and you're not seeing other errors propagated from actual task failures (at least as far as I can tell from what's posted in the question), then most likely you're just seeing a harmless but notoriously spammy issue in core Spark: dynamic allocation relinquishes under-used executors during a job and re-allocates them as needed. The executor-lost messages were originally never suppressed, but we've tested to make sure they have no ill effect on the actual job.
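For reference, these are the standard Spark properties that govern this scale-down behavior (a sketch of typical spark-defaults.conf entries; the specific values here are illustrative, not recommendations):

    # Enable/disable dynamic allocation.
    spark.dynamicAllocation.enabled              true
    # Executors idle longer than this are relinquished, which is
    # what produces the "executor lost" log noise described above.
    spark.dynamicAllocation.executorIdleTimeout  60s
    # Bounds on how far Spark may scale the executor count up or down.
    spark.dynamicAllocation.minExecutors         1
    spark.dynamicAllocation.maxExecutors         100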

Here's a Google Groups thread highlighting some of the behavioral details of Spark on YARN.

To check whether it's indeed dynamic allocation causing the messages, try running:

spark-shell --conf spark.dynamicAllocation.enabled=false \
    --conf spark.executor.instances=99999

Or if you're submitting jobs through gcloud beta dataproc jobs, then:

gcloud beta dataproc jobs submit spark \
    --properties spark.dynamicAllocation.enabled=false,spark.executor.instances=99999
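For completeness, a full invocation would also name the cluster and the job to run; something like the following, where my-cluster, com.example.MyJob, and the GCS jar path are hypothetical placeholders (this obviously needs a real Dataproc cluster to run):

    gcloud beta dataproc jobs submit spark \
        --cluster my-cluster \
        --properties spark.dynamicAllocation.enabled=false,spark.executor.instances=99999 \
        --class com.example.MyJob \
        --jars gs://my-bucket/my-job.jar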

If you're really seeing network hiccups, or other Dataproc errors that disassociate the master and workers when it's not an application-side OOM or the like, you can email the Dataproc team directly at dataproc-feedback@google.com. Being in beta is no excuse for latent broken behavior, though of course we hope to weed out tricky edge-case bugs that we may not yet have discovered during the beta period.
