I have a DataFrame created by running sqlContext.read of a Parquet file.
The DataFrame consists of 300 M rows. I need to use these
You can simply use the limit and except APIs of Dataset/DataFrame, as follows:
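long count = df.count();
int limit = 50;
while (count > 0) {
    // Next batch of up to `limit` rows (DataFrame instead of Dataset<Row> on Spark 1.x).
    Dataset<Row> df1 = df.limit(limit);
    df1.show(limit); // show() alone prints only the first 20 rows
    // Remove the batch just shown; note that except() also drops duplicate rows.
    df = df.except(df1);
    count = count - limit;
}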
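With 300 M rows, the loop above re-runs limit and except on every pass, so it may be slow. As a rough alternative sketch, the rows can be streamed through an iterator and grouped into fixed-size batches on the driver. The `Batches` class below is a hypothetical helper, not part of Spark; it works on any `java.util.Iterator`, and the batch size of 50 only mirrors the example above.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical helper (not a Spark API): groups any Iterator into fixed-size batches.
final class Batches {
    private Batches() {}

    // Returns an iterator of lists, each holding up to batchSize elements;
    // only the last batch may be shorter.
    static <T> Iterator<List<T>> of(Iterator<T> source, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        return new Iterator<List<T>>() {
            @Override
            public boolean hasNext() {
                return source.hasNext();
            }

            @Override
            public List<T> next() {
                if (!source.hasNext()) {
                    throw new NoSuchElementException();
                }
                List<T> batch = new ArrayList<>(batchSize);
                // Pull elements until the batch is full or the source runs out.
                while (batch.size() < batchSize && source.hasNext()) {
                    batch.add(source.next());
                }
                return batch;
            }
        };
    }
}
```

Assuming a Spark version where `Dataset.toLocalIterator()` is available, the call might look like `Batches.of(df.toLocalIterator(), 50)`, which would yield one `List<Row>` per batch without rebuilding the DataFrame each time. Unlike the except-based loop, this keeps duplicate rows.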