What do the blue blocks in the Spark stage DAG visualisation UI mean?

深忆病人 2021-01-31 20:06

In the following screenshot of the application UI, what do the blue blocks in each stage represent?

What do "Exchange", "WholeStageCodeGen", etc. mean?

Where

1 answer
  •  既然无缘
    2021-01-31 20:37

    Each blue box is one step (an operation) of the Apache Spark job.

    You asked about WholeStageCodegen. Here is what it is:

    Whole-Stage Code Generation (aka WholeStageCodegen or WholeStageCodegenExec) fuses multiple operators (as a subtree of plans that support codegen) together into a single Java function that is aimed at improving execution performance. It collapses a query into a single optimized function that eliminates virtual function calls and leverages CPU registers for intermediate data.

    You can find the details in SPARK-12795.
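To make the idea of fusing operators concrete, here is a toy illustration in plain Python. It is not Spark's generated Java code: the `Scan`, `Filter`, and `Project` classes and both `run_*` functions are hypothetical names invented for this sketch. The first version chains one object per operator, so every row passes through several calls; the second collapses the same query into a single loop, which is roughly what whole-stage code generation aims for.

```python
from typing import Callable, Iterable, Iterator


class Scan:
    """Leaf operator: yields the input rows one by one."""

    def __init__(self, rows: Iterable[int]) -> None:
        self.rows = rows

    def __iter__(self) -> Iterator[int]:
        return iter(self.rows)


class Filter:
    """Keeps only the rows for which the predicate is true."""

    def __init__(self, child: Iterable[int], predicate: Callable[[int], bool]) -> None:
        self.child = child
        self.predicate = predicate

    def __iter__(self) -> Iterator[int]:
        for row in self.child:
            if self.predicate(row):
                yield row


class Project:
    """Applies a function to every row."""

    def __init__(self, child: Iterable[int], fn: Callable[[int], int]) -> None:
        self.child = child
        self.fn = fn

    def __iter__(self) -> Iterator[int]:
        for row in self.child:
            yield self.fn(row)


def run_operator_tree(rows: Iterable[int]) -> int:
    # One object per operator: each row crosses several iterator boundaries.
    plan = Project(Filter(Scan(rows), lambda x: x % 2 == 0), lambda x: x * 10)
    return sum(plan)


def run_fused(rows: Iterable[int]) -> int:
    # The same query collapsed into one loop: no per-operator calls,
    # and the intermediate value stays in a local variable.
    total = 0
    for x in rows:
        if x % 2 == 0:
            total += x * 10
    return total


print(run_operator_tree(range(10)), run_fused(range(10)))
```

Both functions compute the same result; only the shape of the execution differs. In Spark the fused version is generated and compiled at runtime for the operators that support codegen.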

    Exchange refers to the shuffle exchange, the point where data is shuffled between stages. In more detail:

    ShuffleExchange is a unary physical operator. It corresponds to Repartition (with shuffle enabled) and RepartitionByExpression logical operators (as translated in BasicOperators strategy).

    You can get all of this information in your code by calling explain() on your DataFrame.

    Each step shows what your DataFrame is going to do, which is a good way to check whether your logic is right. If you want more details about the Spark UI, I suggest watching this Spark Summit presentation and reading this article about execution planning.

    These resources should answer your question in much more detail.
