Question
Spark architecture revolves entirely around the concepts of executors and cores. I would like to see, in practice, how many executors and cores are running for my Spark application in a cluster.
I was trying to use the snippet below in my application, but with no luck.
val conf = new SparkConf().setAppName("ExecutorTestJob")
val sc = new SparkContext(conf)
conf.get("spark.executor.instances")
conf.get("spark.executor.cores")
Is there any way to get those values using the SparkContext object or the SparkConf object, etc.?
Answer 1:
Scala (programmatic way):
getExecutorStorageStatus and getExecutorMemoryStatus both return the executors, including the driver, keyed by host:port. You can filter out the driver as in the example snippet below.
/** Method that just returns the current active/registered executors
  * excluding the driver.
  * @param sc The spark context to retrieve registered executors.
  * @return a list of executors each in the form of host:port.
  */
def currentActiveExecutors(sc: SparkContext): Seq[String] = {
  val allExecutors = sc.getExecutorMemoryStatus.map(_._1)
  val driverHost: String = sc.getConf.get("spark.driver.host")
  allExecutors.filter(! _.split(":")(0).equals(driverHost)).toList
}
sc.getConf.getInt("spark.executor.instances", 1)
Similarly, you can get all properties and print them as shown below; the cores information is included as well:
sc.getConf.getAll.mkString("\n")
OR
sc.getConf.toDebugString
In most cases, spark.executor.cores holds the value for the executors and spark.driver.cores holds it for the driver.
Python:
The getExecutorStorageStatus and getExecutorMemoryStatus methods above are not implemented in the Python API.
EDIT: But they can be accessed through the Py4J bindings exposed by the SparkContext:
sc._jsc.sc().getExecutorMemoryStatus()
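For example, a minimal PySpark sketch (my addition, assuming an existing SparkContext named sc): the Py4J call returns a Scala Map, so only its size and string form are read here rather than iterating it from Python.
# Hedged sketch: the Py4J call returns a Scala Map object, so we only
# read its size and its string representation.
mem_status = sc._jsc.sc().getExecutorMemoryStatus()
print(mem_status.size())      # number of registered endpoints (executors + driver)
print(mem_status.toString())  # e.g. Map(host:port -> (maxMem, remainingMem), ...)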
Answer 2:
This is an old question, but this is my code for figuring this out on Spark 2.3.0:
executor_count = len(spark.sparkContext._jsc.sc().statusTracker().getExecutorInfos()) - 1
cores_per_executor = int(spark.sparkContext.getConf().get('spark.executor.cores', '1'))
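As a follow-up note (my addition, not part of the original answer): with the two values above, the total number of executor cores allocated to the application is simply their product, assuming a static, uniform allocation.
# Hedged follow-up: total cores across all executors, assuming static allocation.
total_cores = executor_count * cores_per_executor
print(total_cores)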
Answer 3:
This is a Python example to get the number of cores (including the master's):
def workername():
    # Return the hostname of the worker this task runs on.
    import socket
    return str(socket.gethostname())

anrdd = sc.parallelize(['', ''])
namesRDD = anrdd.flatMap(lambda e: (1, workername()))
namesRDD.count()
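A hedged alternative (my addition, not from the original answer): on most cluster managers, sc.defaultParallelism defaults to the total number of cores granted to the application, provided spark.default.parallelism has not been overridden.
# Assumption: spark.default.parallelism was not set explicitly,
# so this reflects the total cores across all executors.
print(sc.defaultParallelism)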
Source: https://stackoverflow.com/questions/39162063/spark-how-many-executors-and-cores-are-allocated-to-my-spark-job