I would like to display the entire Apache Spark SQL DataFrame with the Scala API. I can use the show() method:

myDataFrame.show(Int.MaxValue)

Is there a better way to display the entire DataFrame than passing Int.MaxValue?
I've tried show(), and it seems to work in some cases but not others; give it a try. Note that show() prints the table itself and returns Unit, so there is no need to wrap it in println():

df.show()
One way is to use the count() function to get the total number of records and pass that to show(). Since count() returns a Long and show() takes an Int, an explicit conversion is needed:

df.show(df.count().toInt)
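A slightly fuller sketch of the same idea, assuming the DataFrame is called df and its row count fits in an Int (show() takes an Int, so a larger count would overflow):

val total = df.count().toInt
df.show(total, truncate = false)  // also disable the default truncation of long values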
In Java, I have tried it in two ways, and both work perfectly for me:

1. Limit the number of rows to print:

data.show(someNo);

2. Iterate over the rows and print each one (requires org.apache.spark.sql.Row and org.apache.spark.api.java.function.ForeachFunction):

data.foreach(new ForeachFunction<Row>() {
    public void call(Row row) throws Exception {
        System.out.println(row);
    }
});
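For comparison, a minimal sketch of the same per-row approach in the Scala API, assuming a DataFrame df; on a cluster, the println output lands in the executor logs rather than the driver console:

df.foreach { row =>
  println(row)
}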
Try:

df.show(35, false)

It will display 35 rows with full column values, since the second argument disables the default truncation of long values.
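For reference, a short sketch of the related show() overloads in the Scala API:

df.show()            // first 20 rows, values truncated to 20 characters
df.show(false)       // first 20 rows, full values
df.show(100, false)  // first 100 rows, full values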
As others have suggested, printing out the entire DataFrame is a bad idea. However, you can use df.rdd.foreachPartition(f) to print the data partition by partition without flooding the driver JVM the way collect does.
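A minimal sketch of that approach; note that when running on a cluster the println output goes to the executors' stdout, not the driver console:

df.rdd.foreachPartition { partition =>
  partition.foreach(row => println(row))
}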
Nothing is more succinct than that, but if you want to avoid Int.MaxValue, then you could use collect and process the result, or foreach. But for a tabular format without much manual code, show is the best you can do.
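A minimal sketch of the collect-and-process alternative; collect pulls every row to the driver, so this is only safe when the result fits in driver memory:

df.collect().foreach(println)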