Is there an “Explain RDD” in Spark?

情书的邮戳 2021-02-19 02:55

In particular, if I say

rdd3 = rdd1.join(rdd2)

then when I call rdd3.collect, depending on the Partitioner used, eit

2 Answers
  •  生来不讨喜
    2021-02-19 03:36

    I would use the Spark UI (the web page served by the SparkContext) instead of toDebugString whenever I can. It is much easier to comprehend, shows a bit more information, and (in my very limited experience) has fewer glitches. The Spark UI also shows the number of tasks and their input and output sizes for each stage, which helps in figuring out what each stage does.

    Beyond that, both of them show very little information: mostly just a graph of boxes labeled MapPartitionsRDD [12] and the like, which doesn't tell you much about what each step actually does. (For WholeStageCodegen boxes, the DEBUG log under org.apache.spark.sql.execution at least contains the generated code, but no ID of any kind is logged to match it with what you see in the Spark UI.)
