When I run a job on Apache Spark, the web UI gives a view similar to this:
While this is incredibly useful for me as a developer to see where things are, I
You can use the following SparkContext methods to set and clear the name shown for a stage:
setCallSite(String): https://spark.apache.org/docs/2.3.0/api/java/org/apache/spark/SparkContext.html#setCallSite-java.lang.String-
clearCallSite(): https://spark.apache.org/docs/2.3.0/api/java/org/apache/spark/SparkContext.html#clearCallSite--
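As a rough illustration of the two calls above, here is a minimal sketch. The object name, the application name, the local master and the label text are all made up for the example; in spark-shell you would skip the session setup and use the provided sc directly.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical standalone app; names and the local master are assumptions for illustration.
object CallSiteExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("call-site-example")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    // Jobs and stages submitted from this thread are labelled with this text
    // instead of the default "count at <file>:<line>" call site.
    sc.setCallSite("Count the numbers 0 to 9")
    val n = sc.parallelize(0 to 9).count()

    // Go back to the automatically derived call sites for later jobs.
    sc.clearCallSite()

    println(s"count = $n")
    spark.stop()
  }
}
```

The web UI is only available while the application is running, so to inspect the label you would need to look before spark.stop() is reached (or use the history server, if you have one configured).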
Spark also supports the concept of job groups within an application. The following SparkContext methods set and clear the job group:
setJobGroup(String, String, boolean): https://spark.apache.org/docs/2.3.0/api/java/org/apache/spark/SparkContext.html#setJobGroup-java.lang.String-java.lang.String-boolean-
clearJobGroup(): https://spark.apache.org/docs/2.3.0/api/java/org/apache/spark/SparkContext.html#clearJobGroup--
The description of a job within the job group can also be set:
setJobDescription(String): https://spark.apache.org/docs/2.3.0/api/java/org/apache/spark/SparkContext.html#setJobDescription-java.lang.String-
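A sketch of how the job group calls might be combined is below. The group id, the descriptions and the object name are hypothetical, and the session is created with a local master purely so the example is self-contained.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical example; group id and descriptions are invented for illustration.
object JobGroupExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("job-group-example")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    // Every job submitted from this thread now belongs to the group "nightly-report".
    // The third argument controls whether running tasks are interrupted on cancellation.
    sc.setJobGroup("nightly-report", "Nightly report jobs", interruptOnCancel = false)

    // Override the description for the next job(s) within the same group.
    sc.setJobDescription("Count the even numbers")
    val evens = sc.parallelize(0 to 9).filter(_ % 2 == 0).count()

    sc.setJobDescription("Sum all numbers")
    val total = sc.parallelize(0 to 9).sum()

    // Stop tagging subsequent jobs with the group and description.
    sc.clearJobGroup()

    println(s"evens = $evens, total = $total")
    spark.stop()
  }
}
```

Grouping has a second use besides labelling: sc.cancelJobGroup("nightly-report") cancels all active jobs that were submitted under that group id.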
That's where one of the lesser-known features of Spark Core, called local properties, fits so well.
Spark SQL uses it to group the different Spark jobs that belong to a single structured query, so you can use the SQL tab and navigate easily.
You can control local properties using SparkContext.setLocalProperty:
Set a local property that affects jobs submitted from this thread, such as the Spark fair scheduler pool. User-defined properties may also be set here. These properties are propagated through to worker tasks and can be accessed there via org.apache.spark.TaskContext#getLocalProperty.
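To make the quoted description concrete, here is a minimal sketch of setting a user-defined local property on the driver and reading it back inside tasks. The property key my.custom.label, its value and the object name are invented for the example.

```scala
import org.apache.spark.TaskContext
import org.apache.spark.sql.SparkSession

// Hypothetical example; the property key and value are assumptions, not Spark built-ins.
object LocalPropertyExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("local-property-example")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    // Applies to jobs submitted from this thread only.
    sc.setLocalProperty("my.custom.label", "nightly-report")

    // Each task looks the property up through its TaskContext.
    val labelsSeenByTasks = sc.parallelize(0 to 9, numSlices = 2)
      .map(_ => TaskContext.get().getLocalProperty("my.custom.label"))
      .distinct()
      .collect()

    labelsSeenByTasks.foreach(println)

    // Passing null removes the property for subsequent jobs from this thread.
    sc.setLocalProperty("my.custom.label", null)

    spark.stop()
  }
}
```

Because the properties are tied to the submitting thread, jobs started from other threads would not carry this label unless those threads set it themselves (or inherited it when they were created).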
The web UI uses two local properties:
callSite.short in the Jobs tab (and is exactly what you want)
callSite.long in the Job Details page.
scala> sc.setLocalProperty("callSite.short", "callSite.short")
scala> sc.setLocalProperty("callSite.long", "this is callSite.long")
scala> sc.parallelize(0 to 9).count
res2: Long = 10
And the result in the web UI:
Click a job to see the details, where you can find the longer call site, i.e. callSite.long.
Here comes the Stages tab.