Spark tasks stuck at RUNNING


I'm trying to run a Spark ML pipeline (load some data from JDBC, run some transformers, train a model) on my YARN cluster but each time I run it, a couple - sometimes one,


1 Answer

感动是毒

2020-12-30 17:59

TLDR: Make sure your code is threadsafe and race condition-free before you blame Spark.

Figured it out. For posterity: I was using a thread-unsafe data structure (a mutable HashMap). Since tasks running in the same executor share a JVM, concurrent tasks were touching the same map, and the resulting data races were locking up the separate threads/tasks.

The upshot: when you have spark.executor.cores > 1 (and you probably should), make sure your code is threadsafe.
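As a rough illustration of the fix (not the code from the original pipeline, which isn't shown), here is a minimal Java sketch without any Spark dependency. It assumes the shared map lived in a static field, so every task thread in the executor JVM saw the same instance. The class name `SharedMapDemo`, the method `countConcurrently`, and the keys are hypothetical. The plain threads stand in for concurrent tasks, and `ConcurrentHashMap` replaces the unsafe map:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SharedMapDemo {
    // Assumption: one map per JVM, visible to every task thread (e.g. a static
    // field or a member of a Scala object). A plain HashMap here would be unsafe.
    private static final Map<String, Integer> COUNTS = new ConcurrentHashMap<>();

    // Runs `threads` workers (stand-ins for concurrent tasks in one executor)
    // that each apply `updatesPerThread` increments, then returns the sum of all counts.
    static int countConcurrently(int threads, int updatesPerThread) {
        COUNTS.clear();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < updatesPerThread; i++) {
                    // merge() is atomic on ConcurrentHashMap, so no increment is lost
                    COUNTS.merge("key" + (i % 16), 1, Integer::sum);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        int total = 0;
        for (int count : COUNTS.values()) {
            total += count;
        }
        return total;
    }

    public static void main(String[] args) {
        // With a thread-safe map the total always equals threads * updatesPerThread.
        System.out.println(countConcurrently(4, 100_000));
    }
}
```

Swapping the field for a plain `java.util.HashMap` makes the same program lose updates, and concurrent resizing can corrupt the map's internal state. Where shared mutable state isn't actually needed, keeping the map local to each task avoids the problem altogether.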
