I am using pyspark to estimate parameters for a logistic regression model. I use Spark to calculate the likelihood and gradients and then use scipy's minimize function for the optimization.
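Roughly, the setup looks like this (a minimal sketch with toy data; the helper name neg_log_likelihood_and_grad and the dataset are placeholders, not my actual code):

import numpy as np
from scipy.optimize import minimize
import pyspark as ps

sc = ps.SparkContext("local[2]", "logreg-scipy-sketch")

# Toy RDD of (features, label) rows; labels are 0/1.
points = sc.parallelize([
    (np.array([1.0, 0.5]), 1),
    (np.array([-1.0, 2.0]), 0),
    (np.array([0.3, -0.7]), 1),
]).cache()

def neg_log_likelihood_and_grad(w):
    # Sum the per-row negative log-likelihood and its gradient across the cluster.
    bc_w = sc.broadcast(w)
    def contrib(row):
        x, y = row
        p = 1.0 / (1.0 + np.exp(-float(np.dot(bc_w.value, x))))
        loss = -(y * np.log(p) + (1 - y) * np.log(1 - p))
        return loss, (p - y) * x
    return points.map(contrib).reduce(lambda a, b: (a[0] + b[0], a[1] + b[1]))

# scipy drives the optimization on the driver; Spark does the distributed sums.
result = minimize(neg_log_likelihood_and_grad, np.zeros(2), jac=True, method="L-BFGS-B")
print(result.x)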
Check the executor logs for details. I have seen similar errors when executors die or are killed by the cluster manager, usually because they use more memory than the container is configured for.
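If it does turn out to be memory, the usual knobs are the executor memory and the YARN memory overhead. Roughly (a sketch; the exact overhead key name depends on your Spark version):

import pyspark as ps

conf = ps.SparkConf()
conf.set("spark.executor.memory", "4g")                 # JVM heap per executor
conf.set("spark.yarn.executor.memoryOverhead", "1024")  # extra off-heap headroom per YARN container, in MB
                                                        # (spark.executor.memoryOverhead on newer Spark versions)
sc = ps.SparkContext(conf=conf)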
I had a similar problem, and for me, this fixed it:
import pyspark as ps
conf = ps.SparkConf().setMaster("yarn-client").setAppName("sparK-mer")
conf.set("spark.executor.heartbeatInterval","3600s")
sc = ps.SparkContext('local[4]', '', conf=conf) # uses 4 cores on your local machine
More examples of setting other options here: https://gist.github.com/robenalt/5b06415f52009c5035910d91f5b919ad
I had a similar problem. My job runs an iterative computation, and sometimes an iteration took so long that it hit the heartbeat timeout. Increasing spark.executor.heartbeatInterval seemed to solve the problem. I raised it to 3600s to make sure I don't run into timeouts again, and everything has been working fine since then.
From http://spark.apache.org/docs/latest/configuration.html:
spark.executor.heartbeatInterval (default: 10s) - Interval between each executor's heartbeats to the driver. Heartbeats let the driver know that the executor is still alive and update it with metrics for in-progress tasks.
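One caveat: the same page notes that spark.executor.heartbeatInterval should be significantly less than spark.network.timeout, so if you raise the heartbeat interval this far you probably want to raise the network timeout above it as well. Roughly (a sketch, not a tested configuration; pick values that fit your cluster):

import pyspark as ps

conf = ps.SparkConf()
conf.set("spark.executor.heartbeatInterval", "3600s")  # executor -> driver heartbeat
conf.set("spark.network.timeout", "7200s")             # keep this larger than the heartbeat interval
sc = ps.SparkContext(conf=conf)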