I have a dataframe like this:
+---+---+
|_c0|_c1|
+---+---+
|1.0|4.0|
|1.0|4.0|
|2.1|3.0|
|2.1|3.0|
|2.1|3.0|
|2.1|3.0|
|3.0|6.0|
|4.0|5.0|
|4.0|5.0|
|4.0|5.0|
+---+---+
You need to use the orderBy method of the DataFrame, sorting on a random column generated by rand():
import org.apache.spark.sql.functions.rand
val shuffledDF = dataframe.orderBy(rand())
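For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the sample rows from the question and shuffles them. It is written for spark-shell or a notebook. The local SparkSession, the app name "shuffle-sketch", and the seed 42 are assumptions added for illustration and are not part of the original answer.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.rand

// Assumption: a local session for illustration; in spark-shell, `spark` already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("shuffle-sketch") // hypothetical app name
  .getOrCreate()
import spark.implicits._

// Rebuild the example rows shown in the question.
val dataframe = Seq(
  (1.0, 4.0), (1.0, 4.0),
  (2.1, 3.0), (2.1, 3.0), (2.1, 3.0), (2.1, 3.0),
  (3.0, 6.0),
  (4.0, 5.0), (4.0, 5.0), (4.0, 5.0)
).toDF("_c0", "_c1")

// Sort by a random value generated per row.
// The seed (42, arbitrary) is optional; rand() without arguments also works.
val shuffledDF = dataframe.orderBy(rand(42))

shuffledDF.show()
```

Keep in mind that orderBy performs a global sort, so the data is shuffled across the cluster, which can be expensive on large DataFrames. A seed only helps reproducibility as long as the input partitioning stays the same between runs.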