Question:
When starting a Spark job from the Apache Zeppelin notebook interface, a progress bar of the job execution is shown. But what does this progress actually mean? Sometimes it shrinks or expands. Is it the progress of the current stage, or of the whole job?
Answer 1:
In the web interface, the progress bar shows the value returned by the interpreter's getProgress function (not implemented for every interpreter, e.g. Python). This function returns a percentage.
When using the Spark interpreter, the value is the percentage of completed tasks across all stages of all jobs in the job group, computed by the following progress function from JobProgressUtil:
def progress(sc: SparkContext, jobGroup: String): Int = {
  // Collect every job submitted under this notebook paragraph's job group
  val jobIds = sc.statusTracker.getJobIdsForGroup(jobGroup)
  val jobs = jobIds.flatMap { id => sc.statusTracker.getJobInfo(id) }
  // Gather all stages of those jobs
  val stages = jobs.flatMap { job =>
    job.stageIds().flatMap(sc.statusTracker.getStageInfo)
  }
  val taskCount = stages.map(_.numTasks).sum
  val completedTaskCount = stages.map(_.numCompletedTasks).sum
  if (taskCount == 0) {
    0
  } else {
    (100 * completedTaskCount.toDouble / taskCount).toInt
  }
}
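This also explains why the bar can shrink: the denominator grows whenever Spark submits new stages, so the same number of completed tasks suddenly maps to a lower percentage. A minimal sketch of that arithmetic, using plain counts in place of a live SparkContext (the `percent` helper name is illustrative, not part of Zeppelin):

```scala
object ProgressSketch {
  // Same formula as JobProgressUtil.progress, minus the status-tracker calls
  def percent(completedTasks: Int, totalTasks: Int): Int =
    if (totalTasks == 0) 0
    else (100 * completedTasks.toDouble / totalTasks).toInt

  def main(args: Array[String]): Unit = {
    // One stage of 10 tasks, 5 done: the bar reads 50%
    println(percent(5, 10))
    // A second 10-task stage is submitted: still 5 done of now 20 → bar drops to 25%
    println(percent(5, 20))
  }
}
```

So the bar tracks the whole job group (all jobs triggered by the paragraph), not a single stage, and it moves backwards whenever new stages add tasks faster than existing ones complete.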
Meanwhile, I could not find this behavior specified in the Zeppelin documentation.
Source: https://stackoverflow.com/questions/56652680/how-apache-zeppelin-computes-spark-job-progress-bar