Hi, I am a beginner in Spark. I am trying to run a job on S
When I try to run it on this folder, it throws ExecutorLostFailure every time.
I was having this issue, and the cause for me was a heavily skewed key distribution in a reduceByKey task: one key accounted for a huge share of the records. This was (I think) causing a massive amount of data to accumulate on one of the executors, which would then throw OOM errors and get killed, surfacing as ExecutorLostFailure.

The solution for me was simply to filter out the over-represented keys before doing the reduceByKey, but I appreciate that this may or may not be possible depending on your application. I didn't need all of my data anyway.
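In case a concrete example helps, here is a minimal sketch of that kind of pre-filter using the RDD API in Scala. The RDD name `pairs`, the cutoff `maxCount`, and the sample data are placeholders of mine, not anything from your job, and it assumes the number of distinct keys is small enough for `countByKey` to bring back to the driver:

```scala
import org.apache.spark.sql.SparkSession

object SkewFilterExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("skew-filter-sketch")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    // Hypothetical input: (key, value) pairs with one very hot key ("hot").
    val pairs = sc.parallelize(
      Seq.fill(100000)(("hot", 1)) ++ Seq(("a", 1), ("b", 2), ("b", 3))
    )

    // Assumed cutoff for "too many records per key"; tune for your data.
    val maxCount = 10000L

    // 1) Count records per key (a local Map on the driver), and pick out
    //    the keys that exceed the cutoff.
    val keyCounts = pairs.countByKey()
    val hotKeys = keyCounts.filter { case (_, n) => n > maxCount }.keys.toSet
    val hotKeysBc = sc.broadcast(hotKeys)

    // 2) Drop the hot keys before the shuffle, then reduce as usual.
    val reduced = pairs
      .filter { case (k, _) => !hotKeysBc.value.contains(k) }
      .reduceByKey(_ + _)

    reduced.collect().foreach(println)
    spark.stop()
  }
}
```

If a full `countByKey` is itself too expensive, you could estimate the hot keys from a sample (e.g. `pairs.sample(false, 0.01).countByKey()`) instead of counting the whole dataset.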