Is there a better way to display an entire Spark SQL DataFrame?

轻奢々 2021-01-30 20:53

I would like to display the entire Apache Spark SQL DataFrame with the Scala API. I can use the show() method:

myDataFrame.show(Int.MaxValue)


        
7 answers
  • 2021-01-30 21:53

    It is generally not advisable to print an entire DataFrame to stdout, because that requires pulling the entire DataFrame (all of its values) to the driver, unless the DataFrame is already local, which you can check with df.isLocal.

    Unless you know ahead of time that your dataset is small enough for the driver JVM process to hold all of its values in memory, it is not safe to do this. That is why the DataFrame API's show() displays only the first 20 rows by default.
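One way to act on that caveat is to count the rows first and only ask show() for all of them when the count is below a limit you trust. The following is a minimal sketch, not an established recipe: the object name `ShowAllIfSmall`, the helper names, and the 10000-row limit are all hypothetical, and it assumes Spark 2.x or later with a local SparkSession. Note that count() runs its own Spark job, so the DataFrame is evaluated twice unless it is cached.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object ShowAllIfSmall {
  // Hypothetical limit; tune it to the memory available on the driver.
  val MaxRowsToShow: Int = 10000

  // Request every row when the DataFrame is small enough,
  // otherwise fall back to show()'s default of 20 rows.
  def rowsToShow(rowCount: Long, maxRows: Int = MaxRowsToShow): Int =
    if (rowCount <= maxRows) rowCount.toInt else 20

  def showGuarded(df: DataFrame): Unit = {
    // count() runs a job on the cluster; only one number reaches the driver.
    val n = df.count()
    // truncate = false prints full cell contents instead of cutting long strings.
    df.show(rowsToShow(n), truncate = false)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("show-guarded")
      .getOrCreate()
    import spark.implicits._

    // Tiny illustrative dataset, not taken from the question.
    val df = Seq((1, "a"), (2, "b"), (3, "c")).toDF("id", "label")
    showGuarded(df)

    spark.stop()
  }
}
```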

    You could use df.collect, which returns an Array[Row] for a DataFrame (Array[T] for a Dataset[T]), and then iterate over each row and print it:

    df.collect.foreach(println)
    

    but you lose all the formatting implemented in df.showString(numRows: Int), which show() uses internally.
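If plain println output is too rough, a little of the lost formatting can be recovered by hand. The sketch below is only an illustration under assumptions: the object name `PrintAllRows` and the helper `formatRow` are hypothetical, the sample data is made up, and the tab-separated layout is much simpler than the aligned table show() draws. It still collects every row to the driver, so the same memory caveat applies.

```scala
import org.apache.spark.sql.{Row, SparkSession}

object PrintAllRows {
  // Render one row as tab-separated text.
  def formatRow(row: Row): String = row.mkString("\t")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("print-all-rows")
      .getOrCreate()
    import spark.implicits._

    // Tiny illustrative dataset, not taken from the question.
    val df = Seq((1, "alice"), (2, "bob")).toDF("id", "name")

    // Header line built from the column names.
    println(df.columns.mkString("\t"))
    // collect() pulls every row to the driver before printing.
    df.collect().foreach(row => println(formatRow(row)))

    spark.stop()
  }
}
```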

    So no, I guess there is no better way.
