Is there a better way to display an entire Spark SQL DataFrame?

轻奢々 2021-01-30 20:53

I would like to display the entire Apache Spark SQL DataFrame with the Scala API. I can use the show() method:

myDataFrame.show(Int.MaxValue)


        
7 Answers
  • 2021-01-30 21:35

    I've tried show() and it works most of the time, though occasionally it doesn't. Note that show() prints directly to the console and returns Unit, so there is no need to wrap it in println:

    df.show()
    
  • 2021-01-30 21:39

    One way is to use the count() function to get the total number of records, then pass that number to show(), as in the sketch below.
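
    A minimal sketch of this approach, assuming a DataFrame named df. Note that count() returns a Long while show() takes an Int, so the count must be converted, and this only works when the row count fits in an Int:

    val total = df.count().toInt        // count() returns a Long; show() takes an Int
    df.show(total, truncate = false)    // print all rows without truncating column values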

  • 2021-01-30 21:41

    In Java, I have tried it in two ways; both work for me:

    1.

    // SomeNo is the number of rows to display
    data.show(SomeNo);

    2.

    import org.apache.spark.api.java.function.ForeachFunction;
    import org.apache.spark.sql.Row;

    // Print each row; on a cluster the output goes to the executor logs, not the driver console.
    data.foreach(new ForeachFunction<Row>() {
        @Override
        public void call(Row row) throws Exception {
            System.out.println(row);
        }
    });
    
  • 2021-01-30 21:48

    Try:

    df.show(35, false)

    This will display 35 rows with the full column values, rather than truncating them.

  • 2021-01-30 21:48

    As others suggested, printing out the entire DataFrame is a bad idea. However, you can use df.rdd.foreachPartition(f) to print it out partition by partition without flooding the driver JVM the way collect would; see the sketch below.
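
    A minimal sketch of that idea in Scala, assuming a DataFrame named df. Keep in mind that on a cluster, the println output appears in the executor logs, not the driver console:

    // Print rows one partition at a time, without collecting the whole
    // DataFrame to the driver.
    df.rdd.foreachPartition { partition =>
      partition.foreach(row => println(row))
    }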

  • 2021-01-30 21:48

    Nothing is more succinct than that, but if you want to avoid Int.MaxValue, you could collect the rows and process them yourself, or use foreach. For a tabular format without much manual code, though, show is the best you can do.
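
    For example, a minimal sketch of the collect-and-process route, assuming df fits in driver memory:

    // Pull every row to the driver and print one per line
    // (no tabular formatting; the entire DataFrame must fit in driver memory).
    df.collect().foreach(println)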
