Error ExecutorLostFailure when running a task in Spark

Submitted by 狂风中的少年 on 2019-12-03 00:12:33
Rishi

This error occurs because a task failed the maximum allowed number of times (four by default, controlled by spark.task.maxFailures). Try increasing the parallelism in your cluster using the following parameter.

--conf "spark.default.parallelism=100" 

Set the parallelism value to 2 to 3 times the number of cores available on your cluster. If that doesn't work, try increasing the parallelism exponentially, i.e. if your current parallelism doesn't work, multiply it by two, and so on. I have also observed that it helps if your level of parallelism is a prime number, especially if you are using groupByKey.
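To make the "2 to 3 times the cores, then keep doubling" advice concrete, here is a minimal Python sketch. The helper name candidate_parallelism and the 16-core cluster are made up for illustration; only the spark.default.parallelism setting comes from the answer above.

```python
def candidate_parallelism(total_cores, factor=3, doublings=3):
    """Return parallelism values to try: factor * total_cores, then repeated doubling."""
    if total_cores < 1 or factor < 1 or doublings < 0:
        raise ValueError("total_cores and factor must be >= 1, doublings >= 0")
    start = total_cores * factor
    # start, 2*start, 4*start, ... (doublings + 1 values in total)
    return [start * 2 ** i for i in range(doublings + 1)]


# Hypothetical cluster with 16 cores in total.
for value in candidate_parallelism(16):
    print(f'--conf "spark.default.parallelism={value}"')
```

For 16 cores this prints the flag with 48, 96, 192 and 384 in turn. Try the values one at a time and stop at the first one that lets the job finish.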

It is hard to say what the problem is without the log of the failed executor (the executor's log, not the driver's), but most likely it is a memory problem. Try increasing the number of partitions significantly (for example, if you currently use 32, try 200).

I was having this issue, and the problem for me was very high incidence of one key in a reduceByKey task. This was (I think) causing a massive list to collect on one of the executors, which would then throw OOM errors.

The solution for me was to just filter out keys with high population before doing the reduceByKey, but I appreciate that this may or may not be possible depending on your application. I didn't need all my data anyway.
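A rough sketch of that filtering idea, in plain Python on a list of (key, value) pairs rather than on a real RDD. The function name drop_hot_keys and the max_share threshold are my own inventions for illustration. On an actual RDD I would expect the same two steps to be done with something like countByKey() followed by filter(), but treat that as an assumption about your setup rather than a recipe.

```python
from collections import Counter


def drop_hot_keys(pairs, max_share=0.2):
    """Drop all pairs whose key makes up more than max_share of the records.

    Returns (kept_pairs, hot_keys).
    """
    pairs = list(pairs)
    counts = Counter(key for key, _ in pairs)
    total = len(pairs)
    # A key is "hot" if it accounts for more than max_share of all records.
    hot = {key for key, count in counts.items() if count > max_share * total}
    kept = [(key, value) for key, value in pairs if key not in hot]
    return kept, hot


# Toy data: key "a" makes up 8 of 10 records.
records = [("a", 1)] * 8 + [("b", 1), ("c", 1)]
kept, hot = drop_hot_keys(records, max_share=0.5)
print(hot)   # {'a'}
print(kept)  # [('b', 1), ('c', 1)]
```

Whether throwing the hot keys away is acceptable depends entirely on the application. In my case it was, because I did not need that data.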

The most common cause of ExecutorLostFailure, as far as I understand, is an OOM in the executor.

In order to resolve the OOM issue, one needs to figure out what exactly is causing it. Simply increasing the default parallelism or increasing the executor memory is not a strategic solution.

What increasing parallelism does is split the data into more partitions, and therefore more tasks, so that each task works on less data. But if your data is skewed, so that one value of the key used for partitioning holds most of the records, all of those records still land in the same partition, and simply increasing parallelism will have no effect.
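A small simulation shows why more partitions do not help with a skewed key. It is a simplified stand-in, not Spark itself: keys are assigned to partitions by hash(key) % num_partitions, and the data set (one key repeated 9,000 times plus 1,000 distinct keys) is made up.

```python
from collections import Counter


def largest_partition(keys, num_partitions):
    """Record count of the fullest partition under hash(key) % num_partitions."""
    sizes = Counter(hash(key) % num_partitions for key in keys)
    return max(sizes.values())


# Skewed data: key 0 appears 9,000 times, keys 1..1000 appear once each.
keys = [0] * 9000 + list(range(1, 1001))

for n in (32, 200, 400):
    print(n, largest_partition(keys, n))
# 32 9031
# 200 9005
# 400 9002
```

The fullest partition never drops below 9,000 records, because every record for the hot key hashes to the same partition no matter how many partitions there are.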

Similarly, just increasing executor memory is a very inefficient way of handling such a scenario: if only one executor is failing with ExecutorLostFailure, requesting more memory for all the executors makes your application require much more memory than it actually needs.
