Question
In Spark, an executor may run many tasks concurrently, maybe 2 or 5 or 6.
How does Spark figure out (or calculate) the number of tasks to run in the same executor concurrently, i.e. how many tasks can run in an executor concurrently?
An executor may be executing one task, but one more task may be placed to run concurrently on the same executor. What are the criteria for that?
An executor has a fixed number of cores and a fixed amount of memory. Since we do not specify memory and core requirements for tasks in Spark, how do we calculate how many tasks can run concurrently in an executor?
Answer 1:
The number of tasks that run in parallel within an executor equals the number of cores configured for that executor (spark.executor.cores); you can always change this number through configuration. The total number of tasks an executor runs overall (in parallel or sequentially) depends on the total number of tasks created (via the number of splits) and on the number of executors.
All tasks running in one executor share the same configured memory. Internally, the executor simply launches as many worker threads as it has cores.
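As a minimal sketch (the config keys below are standard Spark settings, but the concrete values are only examples), per-executor concurrency works out to spark.executor.cores divided by spark.task.cpus, which defaults to 1:

```scala
import org.apache.spark.sql.SparkSession

// With these example values, each executor runs
// spark.executor.cores / spark.task.cpus = 4 / 1 = 4 tasks at once.
val spark = SparkSession.builder()
  .appName("executor-concurrency-sketch")
  .config("spark.executor.cores", "4")    // CPU cores per executor
  .config("spark.task.cpus", "1")         // cores reserved per task (default 1)
  .config("spark.executor.memory", "4g")  // heap shared by all tasks in the executor
  .getOrCreate()
```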
Answer 2:
The most probable issue could be skewed partitions in the RDD you are processing. If 2-6 partitions hold most of the data, then, in order to reduce data shuffle over the network, Spark will try to have the executors process the data residing locally on their own nodes. So you'll see those 2-6 executors working for a long time while the others finish their data in a few milliseconds.
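A quick way to check for this kind of skew is to count the records in each partition. This is a hypothetical sketch; the rdd built here is just a stand-in for whatever RDD you are actually processing:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("partition-skew-check")
  .master("local[*]")
  .getOrCreate()

// Stand-in RDD; replace with the RDD you are actually processing.
val rdd = spark.sparkContext.parallelize(1 to 100000, numSlices = 8)

// Count the records in each partition; a few partitions holding most
// of the data is the skew pattern described above.
val sizes = rdd
  .mapPartitionsWithIndex((idx, it) => Iterator((idx, it.size)))
  .collect()

sizes.sortBy { case (_, n) => -n }
  .foreach { case (idx, n) => println(s"partition $idx: $n records") }
```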
You can find more about this in this Stack Overflow question.
Source: https://stackoverflow.com/questions/39383984/spark-executor-tasks-concurrency