How to split a dataset into two datasets with unique and duplicate rows?

你的背包 2021-01-19 08:17

I want to extract the duplicate records from a Spark Scala DataFrame. For example, I want to find duplicates based on three columns such as "id", "name", and "age". condition part co…

2 Answers
  •  伪装坚强ぢ  2021-01-19 08:28

    You need to give the column names as a comma-separated list.

    col1, col2, … should be column names of String type.
        import org.apache.spark.sql.expressions.Window
        import org.apache.spark.sql.functions.count

        val window = Window.partitionBy(col1, col2, ...)

        findDuplicateRecordsDF
          .withColumn("count", count("*").over(window))  // tag each row with its group size
          .where($"count" > 1)                           // keep only the duplicated rows
          .show()
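    To actually split the DataFrame into the two datasets the title asks for, the same window count can drive both filters. The following is a minimal, self-contained sketch; the column names ("id", "name", "age") and the sample data are assumptions for illustration, not from the original answer.

        import org.apache.spark.sql.SparkSession
        import org.apache.spark.sql.expressions.Window
        import org.apache.spark.sql.functions.count

        val spark = SparkSession.builder()
          .appName("SplitDuplicates")
          .master("local[*]")
          .getOrCreate()
        import spark.implicits._

        // Sample data: the first two rows share the same (id, name, age) key.
        val df = Seq(
          (1, "alice", 30),
          (1, "alice", 30),
          (2, "bob", 25)
        ).toDF("id", "name", "age")

        // Tag every row with the size of its (id, name, age) group.
        val window = Window.partitionBy("id", "name", "age")
        val withCount = df.withColumn("count", count("*").over(window))

        // Split on the group size, then drop the helper column.
        val duplicatesDF = withCount.where($"count" > 1).drop("count")
        val uniqueDF     = withCount.where($"count" === 1).drop("count")

        duplicatesDF.show()  // rows whose key occurs more than once
        uniqueDF.show()      // rows whose key occurs exactly once

    Computing the count once and filtering twice keeps the two outputs consistent: every input row lands in exactly one of the two DataFrames.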
