Error ExecutorLostFailure when running a task in Spark

日久生厌 2020-12-28 22:46

Hi, I am a beginner in Spark. I am trying to run a job on S

When I try to run it on this folder, it throws ExecutorLostFailure every time.

4 Answers
  • 2020-12-28 23:00

    It is hard to say what the problem is without the log of the failed executor (the driver's log is not enough), but most likely it is a memory problem. Try increasing the number of partitions significantly (if your current number is 32, try 200).
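To see why more partitions help when the data is evenly spread, here is a minimal pure-Python sketch (no Spark needed) that counts records per partition under hash partitioning. It assumes non-negative integer keys, so the partition is simply key % num_partitions, which is a simplified model of how a hash partitioner places keys. The helper name partition_sizes and the 100,000-record data set are hypothetical.

```python
from collections import Counter


def partition_sizes(keys, num_partitions):
    """Count how many records land in each partition.

    Simplified hash-partitioning model: for non-negative integer keys the
    partition index is just key % num_partitions.
    """
    counts = Counter(key % num_partitions for key in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


# Hypothetical, evenly distributed data: 100,000 records with distinct keys.
keys = range(100_000)

for n in (32, 200):
    sizes = partition_sizes(keys, n)
    print(n, "partitions -> largest partition holds", max(sizes), "records")
# 32 partitions -> largest partition holds 3125 records
# 200 partitions -> largest partition holds 500 records
```

With evenly distributed keys, going from 32 to 200 partitions cuts the largest partition from 3,125 to 500 records, so each task has far less data to hold in memory at once.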

  • 2020-12-28 23:13

    This error occurs because a task failed four times (the default value of spark.task.maxFailures), at which point Spark gives up on the job. Try increasing the parallelism in your cluster using the following parameter:

    --conf "spark.default.parallelism=100" 
    

    Set the parallelism value to 2 to 3 times the number of cores available on your cluster. If that doesn't work, try increasing the parallelism exponentially, i.e., if your current parallelism doesn't work, multiply it by two, and so on. I have also observed that it helps if your level of parallelism is a prime number, especially if you are using groupByKey.
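The rule of thumb in this answer (start at 2 to 3 times the core count, double on failure, prefer a prime) can be written down as a small helper. This is a hypothetical sketch: suggest_parallelism and escalate are illustrative names, not Spark APIs, and the prime preference only encodes the answerer's own observation. The 32-core cluster is an assumption; the printed --conf line is the same spark.default.parallelism setting shown above.

```python
def is_prime(n):
    """Trial-division primality test; fine for the small values used here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def next_prime(n):
    """Smallest prime >= n."""
    while not is_prime(n):
        n += 1
    return n


def suggest_parallelism(total_cores, factor=3, prefer_prime=True):
    """First value to try: factor x cores, optionally bumped to a prime."""
    p = total_cores * factor
    return next_prime(p) if prefer_prime else p


def escalate(parallelism, prefer_prime=True):
    """Next value to try if the current one still fails: double it."""
    p = parallelism * 2
    return next_prime(p) if prefer_prime else p


# Hypothetical cluster with 32 cores in total.
first = suggest_parallelism(32)
print(first)            # 97
print(escalate(first))  # 197
print(f'--conf "spark.default.parallelism={first}"')
```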

  • 2020-12-28 23:13

    I was having this issue, and the problem for me was a very high incidence of one key in a reduceByKey task. This was (I think) causing a massive list to collect on one of the executors, which would then throw OOM errors.

    The solution for me was to simply filter out very frequent keys before doing the reduceByKey, but I appreciate that this may or may not be possible depending on your application. I didn't need all my data anyway.
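A local sketch of the filter-then-reduce idea, in plain Python so it runs without a cluster. reduce_by_key is a hypothetical stand-in that mimics RDD.reduceByKey semantics (values for the same key are merged pairwise with func), and drop_hot_keys removes every record whose key appears more than max_count times; the threshold of 100 is an arbitrary assumption. In PySpark the equivalent would be an rdd.filter(...) before rdd.reduceByKey(...), with the hot keys identified beforehand. Like the answer says, this throws data away.

```python
from collections import Counter


def drop_hot_keys(pairs, max_count):
    """Remove every (key, value) record whose key occurs more than max_count times."""
    counts = Counter(k for k, _ in pairs)
    return [(k, v) for k, v in pairs if counts[k] <= max_count]


def reduce_by_key(pairs, func):
    """Local stand-in for reduceByKey: merge values per key incrementally with func."""
    result = {}
    for k, v in pairs:
        result[k] = func(result[k], v) if k in result else v
    return result


# Hypothetical skewed input: key "hot" dominates the data set.
pairs = [("hot", 1)] * 1_000 + [("a", 1)] * 3 + [("b", 1)] * 2

filtered = drop_hot_keys(pairs, max_count=100)
print(reduce_by_key(filtered, lambda x, y: x + y))  # {'a': 3, 'b': 2}
```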

  • 2020-12-28 23:17

    As far as I understand, the most common cause of ExecutorLostFailure is an OOM (out-of-memory) error in the executor.

    In order to resolve the OOM issue, one needs to figure out what exactly is causing it. Simply increasing the default parallelism or increasing the executor memory is not a strategic solution.

    Increasing parallelism creates more partitions, and therefore more tasks, so that each task works on less data. But if your data is skewed, i.e. one value of the key you partition on carries much more data than the others, simply increasing parallelism will have no effect: all records with the same key still land in the same partition.
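The skew argument can be checked with a simplified model of hash partitioning: partition = key % num_partitions for non-negative integer keys. In this hypothetical data set, half of the 100,000 records share key 0. Because every record with the same key goes to the same partition, the largest partition never drops below 50,000 records, no matter how many partitions are requested. The helper largest_partition is an illustrative name, not a Spark API.

```python
from collections import Counter


def largest_partition(keys, num_partitions):
    """Size of the biggest partition under key % num_partitions partitioning."""
    counts = Counter(k % num_partitions for k in keys)
    return max(counts.values())


# Hypothetical skewed data: 50,000 records share key 0, the rest are distinct.
keys = [0] * 50_000 + list(range(1, 50_001))

for n in (32, 200, 1000):
    print(f"{n} partitions: largest partition = {largest_partition(keys, n)} records")
# 32 partitions: largest partition = 51562 records
# 200 partitions: largest partition = 50250 records
# 1000 partitions: largest partition = 50050 records
```

More partitions only shave off the few unrelated keys that happen to share the hot key's partition; the 50,000 records of key 0 stay together in one task.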

    Similarly, simply increasing executor memory is a very inefficient way of handling such a scenario: if only one executor is failing with ExecutorLostFailure, requesting more memory for all the executors makes your application require much more memory than it actually needs.
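A quick back-of-the-envelope calculation makes the cost concrete. By default, spark.executor.memory applies to every executor of the application, so raising it to rescue one skewed executor is paid once per executor. The numbers below (20 executors, memory raised from 4 GB to 8 GB) are hypothetical, extra_memory_gb is an illustrative helper, and memory overhead is ignored.

```python
def extra_memory_gb(num_executors, old_gb, new_gb):
    """Cluster-wide memory added when each of num_executors goes from old_gb to new_gb."""
    return num_executors * (new_gb - old_gb)


# Hypothetical job: 20 executors; only one OOMs at 4 GB and would fit in 8 GB.
uniform = extra_memory_gb(20, 4, 8)  # raising spark.executor.memory for everyone
needed = extra_memory_gb(1, 4, 8)    # what the single skewed executor actually needed
print(uniform, needed)  # 80 4
```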
