How to shuffle the rows in a Spark dataframe?

野性不改 2021-02-18 15:51

I have a dataframe like this:

+---+---+
|_c0|_c1|
+---+---+
|1.0|4.0|
|1.0|4.0|
|2.1|3.0|
|2.1|3.0|
|2.1|3.0|
|2.1|3.0|
|3.0|6.0|
|4.0|5.0|
|4.0|5.0|
|4.0|5.0|
+---+---+


        
1 Answer
  • 2021-02-18 16:44

    You can shuffle the rows by sorting on a random column with the DataFrame's orderBy method:

    import org.apache.spark.sql.functions.rand
    val shuffledDF = dataframe.orderBy(rand())
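
For anyone who wants to try this end to end, here is a minimal sketch, assuming a local SparkSession and a small hypothetical sample that mirrors a few rows of the table in the question (the sample data, the app name, and the seed value 42 are my own choices, not part of the original post). It passes a seed to rand so the ordering is easier to reproduce; as far as I know the result can still vary if the input partitioning changes.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.rand

// Local session for illustration only; in spark-shell, `spark` already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("shuffle-sketch") // hypothetical app name
  .getOrCreate()
import spark.implicits._

// Hypothetical sample mirroring a few rows of the question's table.
val dataframe = Seq(
  (1.0, 4.0), (1.0, 4.0), (2.1, 3.0), (3.0, 6.0), (4.0, 5.0)
).toDF("_c0", "_c1")

// rand(seed) yields a uniform value in [0, 1) per row;
// sorting on that value puts the rows in a random order.
val shuffledDF = dataframe.orderBy(rand(42L))

shuffledDF.show()
```

Keep in mind that orderBy performs a global sort, so on a large DataFrame this moves data across the whole cluster.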
    