How to fix "Connection reset by peer" message from apache-spark?

青春惊慌失措 2021-02-05 08:54

I keep getting the following exception very frequently and I wonder why this is happening. After researching I found I could do .set("spark.submit.deployMode", "nio

1 answer
  • 2021-02-05 09:18

    I was getting the same error even after trying many things. My job used to get stuck and throw this error after running for a very long time. I tried a few workarounds that helped me resolve it. Although I still get the same error, at least my job now runs fine.

    1. One reason could be that the executors kill themselves, thinking they have lost the connection to the master. I added the configurations below to the spark-defaults.conf file.

      spark.network.timeout 10000000
      spark.executor.heartbeatInterval 10000000

      Basically, I increased the network timeout and the heartbeat interval.

    2. At the particular step that used to get stuck, I simply cached the DataFrame used for processing in that step.

    Note: these are workarounds; I still see the same error in the error logs, but my job no longer gets terminated.
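    The two workarounds above can be sketched in PySpark as follows. This is a minimal sketch, not the asker's actual job: the application name, input path, and the `count()` call used to materialize the cache are illustrative assumptions, while `spark.network.timeout` and `spark.executor.heartbeatInterval` are real Spark configuration keys.

    ```python
    # Sketch of both workarounds in a PySpark job.
    # Requires a Spark installation; the app name and input path are placeholders.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("long-running-job")  # placeholder name
        # Workaround 1: raise the network timeout and heartbeat interval,
        # equivalent to the spark-defaults.conf lines in the answer.
        .config("spark.network.timeout", "10000000")
        .config("spark.executor.heartbeatInterval", "10000000")
        .getOrCreate()
    )

    df = spark.read.parquet("/data/input")  # placeholder input path

    # Workaround 2: cache the DataFrame used by the step that gets stuck,
    # so downstream actions reuse it instead of recomputing from the source.
    df = df.cache()
    df.count()  # an action to materialize the cache before the heavy step
    ```

    One caveat worth noting: the answer sets both values to the same number, but Spark's configuration documentation recommends that `spark.executor.heartbeatInterval` be significantly smaller than `spark.network.timeout`.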
