How to split a dataset into two datasets with unique and duplicate rows?

你的背包 2021-01-19 08:17

I want to extract the duplicate records from a Spark Scala DataFrame. For example, I want to find duplicates based on three columns such as "id", "name", and "age". condition part co…

2 Answers
  •  伪装坚强ぢ  2021-01-19 08:28

    You need to give the column names as a comma-separated list.

    col1, col2, … should be column names of String type.
        import org.apache.spark.sql.expressions.Window
        import org.apache.spark.sql.functions.count

        val window = Window.partitionBy(col1, col2, ...)

        findDuplicateRecordsDF
          .withColumn("count", count("*").over(window))  // tag each row with its group size
          .where($"count" > 1)                           // keep only the duplicated rows
          .show()
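    To actually split the DataFrame into the two datasets the title asks for, the same window count can drive both filters. The following is a minimal, self-contained sketch; the column names ("id", "name", "age") and the sample data are assumptions for illustration, not from the original answer.

        import org.apache.spark.sql.SparkSession
        import org.apache.spark.sql.expressions.Window
        import org.apache.spark.sql.functions.count

        val spark = SparkSession.builder()
          .appName("SplitDuplicates")
          .master("local[*]")
          .getOrCreate()
        import spark.implicits._

        // Sample data: the first two rows share the same (id, name, age) key.
        val df = Seq(
          (1, "alice", 30),
          (1, "alice", 30),
          (2, "bob", 25)
        ).toDF("id", "name", "age")

        // Tag every row with the size of its (id, name, age) group.
        val window = Window.partitionBy("id", "name", "age")
        val withCount = df.withColumn("count", count("*").over(window))

        // Split on the group size, then drop the helper column.
        val duplicatesDF = withCount.where($"count" > 1).drop("count")
        val uniqueDF     = withCount.where($"count" === 1).drop("count")

        duplicatesDF.show()  // rows whose key occurs more than once
        uniqueDF.show()      // rows whose key occurs exactly once

    Computing the count once and filtering twice keeps the two outputs consistent: every input row lands in exactly one of the two DataFrames.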
