Spark tasks stuck at RUNNING

半阙折子戏 2020-12-30 17:41

I'm trying to run a Spark ML pipeline (load some data from JDBC, run some transformers, train a model) on my YARN cluster, but each time I run it, a couple - sometimes one,

1 Answer
  • 2020-12-30 17:59

    TLDR: Make sure your code is threadsafe and race condition-free before you blame Spark.

    Figured it out. For posterity: I was using a thread-unsafe data structure (a mutable HashMap). Since tasks on the same executor run as concurrent threads within a single JVM, this was resulting in data races that were locking up the separate threads/tasks.

    The upshot: when you have spark.executor.cores > 1 (and you probably should), make sure your code is threadsafe.
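    The original code isn't shown in the question, so as a sketch of the pattern the answer recommends, here is a minimal JVM example (plain Java rather than Spark; the class and key names are illustrative): several threads, standing in for concurrent tasks in one executor, update a shared map. With a plain `HashMap` this races and can lose updates or hang; `ConcurrentHashMap` with `LongAdder` values makes the same access pattern safe.

    ```java
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.atomic.LongAdder;

    public class SafeCounter {
        // Thread-safe replacement for a shared mutable HashMap:
        // ConcurrentHashMap handles concurrent structural changes,
        // LongAdder handles concurrent increments without locks.
        private final Map<String, LongAdder> counts = new ConcurrentHashMap<>();

        public void increment(String key) {
            // computeIfAbsent is atomic per key, so two threads never
            // clobber each other's newly created counter.
            counts.computeIfAbsent(key, k -> new LongAdder()).increment();
        }

        public long get(String key) {
            LongAdder a = counts.get(key);
            return a == null ? 0 : a.sum();
        }

        public static void main(String[] args) throws InterruptedException {
            SafeCounter c = new SafeCounter();
            Thread[] workers = new Thread[4]; // stand-ins for concurrent tasks
            for (int i = 0; i < workers.length; i++) {
                workers[i] = new Thread(() -> {
                    for (int j = 0; j < 10_000; j++) c.increment("events");
                });
                workers[i].start();
            }
            for (Thread t : workers) t.join();
            System.out.println(c.get("events")); // 40000 every run; a plain HashMap here can lose updates or loop forever
        }
    }
    ```

    The same idea applies in Scala Spark code: replace a shared `scala.collection.mutable.HashMap` with a `ConcurrentHashMap` (or avoid shared mutable state entirely and let Spark aggregate via reductions or accumulators).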
