I have an Apache Spark application running in cluster mode on a YARN cluster (Spark has 3 nodes on this cluster).
When the application is running, the Spark-UI shows:
To utilize the Spark cluster to its full capacity you need to set values for --num-executors, --executor-cores and --executor-memory as per your cluster:

The --num-executors command-line flag (or the spark.executor.instances configuration property) controls the number of executors requested.
The --executor-cores command-line flag (or the spark.executor.cores configuration property) controls the number of concurrent tasks an executor can run.
The --executor-memory command-line flag (or the spark.executor.memory configuration property) controls the executor heap size.

You only have 3 nodes in the cluster, and one will be used as the driver, so you have only 2 nodes left. How can you create 6 executors?
I think you confused --num-executors with --executor-cores.
To increase concurrency you need more cores; you want to utilize all the CPUs in your cluster.
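A minimal spark-submit sketch along those lines (the jar name and the core/memory values are placeholders; size them to your actual CPUs and RAM):

```sh
# One executor per remaining node; 4 cores each gives 8 concurrent tasks
# in total, instead of 6 small executors competing for 2 nodes.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 2 \
  --executor-cores 4 \
  --executor-memory 10G \
  your-app.jar
```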
Note that yarn.nodemanager.resource.memory-mb is the total memory that a single NodeManager can allocate across all containers on one node.
In your case, since yarn.nodemanager.resource.memory-mb = 12G, if you add up the memory allocated to all YARN containers on any single node, it cannot exceed 12G.
You have requested 11G (--executor-memory 11G) for each Spark executor container. Though 11G is less than 12G, this still won't work. Why? Because of spark.yarn.executor.memoryOverhead, which is max(executorMemory * 0.10, 384) MB by default, unless you override it. So the following math must hold true:

spark.executor.memory + spark.yarn.executor.memoryOverhead <= yarn.nodemanager.resource.memory-mb

See https://spark.apache.org/docs/latest/running-on-yarn.html for the latest documentation on spark.yarn.executor.memoryOverhead.
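Plugging in your numbers makes the failure concrete (values in MB, assuming the 10% default and no override):

```
spark.executor.memory               = 11 * 1024              = 11264 MB
spark.yarn.executor.memoryOverhead  = max(11264 * 0.10, 384) ~  1126 MB
requested container size            = 11264 + 1126           = 12390 MB
yarn.nodemanager.resource.memory-mb = 12 * 1024              = 12288 MB
```

12390 MB > 12288 MB, so the executor container cannot be allocated on any node.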
Moreover, spark.executor.instances is merely a request. The Spark ApplicationMaster for your application asks the YARN ResourceManager for that number of containers, and the ResourceManager grants each one on a NodeManager node only if the yarn.nodemanager.resource.memory-mb threshold would not be exceeded on that node, i.e. the sum of (spark.executor.memory + spark.yarn.executor.memoryOverhead) over all containers placed on the node must stay <= yarn.nodemanager.resource.memory-mb.
If the request is not granted, it will be queued and granted when the above conditions are met.
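If you want to verify this on a live cluster, the YARN CLI can show per-node capacity; a rough sketch (the exact output fields vary across Hadoop versions):

```sh
# List NodeManager nodes, then inspect one of them;
# -status prints, among other things, memory used vs. memory capacity.
yarn node -list
yarn node -status <node-id>
```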
Increase yarn.nodemanager.resource.memory-mb in yarn-site.xml.
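For example, a hypothetical yarn-site.xml entry raising the limit to 14G (pick a value your machines can actually spare after the OS and other daemons; NodeManagers must be restarted to pick it up):

```xml
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>14336</value> <!-- 14 * 1024 MB; illustrative value -->
</property>
```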
With 12g per node you can only launch the driver (3g) and 2 executors (11g):
Node1 - driver 3g (+7% overhead)
Node2 - executor1 11g (+7% overhead)
Node3 - executor2 11g (+7% overhead)
Now you are requesting executor3 of 11g, and no node has 11g of memory available.
For the 7% overhead, refer to spark.yarn.executor.memoryOverhead and spark.yarn.driver.memoryOverhead in https://spark.apache.org/docs/1.2.0/running-on-yarn.html
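A request that does fit under the 12g limit would look something like this sketch (the jar name and the 10g figure are illustrative; 10g + ~7% overhead is roughly 10.7g, which leaves room on each 12g node):

```sh
# Driver (3g) on one node, one 10g executor on each of the other two nodes.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --driver-memory 3g \
  --num-executors 2 \
  --executor-memory 10g \
  your-app.jar
```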