How can I configure the number of executors from Java (or Scala) code, given a SparkConf and a SparkContext? I constantly see 2 executors.
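For concreteness, this is a minimal sketch of the kind of programmatic configuration I mean (the app name and values are placeholders); spark.executor.instances is the setting behind --num-executors:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: request a fixed number of executors from code.
val conf = new SparkConf()
  .setAppName("executor-count-example")    // placeholder app name
  .set("spark.executor.instances", "8")    // programmatic equivalent of --num-executors 8
  .set("spark.executor.cores", "2")        // cores per executor (optional)
  .set("spark.executor.memory", "2g")      // memory per executor (optional)

val sc = new SparkContext(conf)
```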
We had a similar problem in my lab running Spark on YARN with data on HDFS, but no matter which of the above solutions I tried, I could not increase the number of Spark executors beyond two.
Turns out the dataset was too small (less than the HDFS block size of 128 MB) and only existed on two of the data nodes (1 master, 7 data nodes in my cluster) due to Hadoop's default data replication heuristic.
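If you want to check whether you are in the same situation, a quick sketch (assuming a spark-shell session where sc is available; the HDFS path is hypothetical) is to look at how many input partitions Spark creates for the dataset:

```scala
// A file smaller than one HDFS block yields a single input partition,
// so only one task ever does any work, no matter how many executors you request.
val rdd = sc.textFile("hdfs:///user/lab/small-dataset.txt")
println(s"input partitions: ${rdd.getNumPartitions}")
```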
Once my lab-mates and I had more (and larger) files and the data was spread across all the nodes, we could set the number of Spark executors and finally saw an inverse relationship between --num-executors and time to completion.
Hope this helps someone else in a similar situation.