How to filter duplicate records having multiple key in Spark Dataframe?

广开言路 2020-12-22 04:37

I have two dataframes. I want to delete some records in Data Frame-A based on some common column values in Data Frame-B.

For example: Data Frame-A and Data Frame-B share the key columns A, B, and C (the sample tables from the original post were not preserved in this extract).
1 Answer
  •  生来不讨喜
     2020-12-22 05:09

    You are looking for a left anti-join, which keeps only the rows of the left dataframe that have no match in the right one:

    df_a.join(df_b, Seq("A","B","C"), "leftanti").show()
    +---+---+---+---+
    |  A|  B|  C|  D|
    +---+---+---+---+
    |  3|  4|  5|  7|
    |  4|  7|  9|  6|
    +---+---+---+---+
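
    If it helps to see the semantics outside Spark, here is a plain-Python sketch of what a left anti-join does. The sample rows are made up for illustration (only the two surviving rows are taken from the output above); in real use you would call `df_a.join(df_b, ..., "leftanti")` as shown.

    ```python
    # Plain-Python sketch of "left anti-join" semantics (hypothetical
    # sample data, not Spark): keep only the rows of df_a whose key
    # tuple (A, B, C) does NOT appear in df_b.
    df_a = [
        {"A": 1, "B": 2, "C": 3, "D": 4},
        {"A": 3, "B": 4, "C": 5, "D": 7},
        {"A": 4, "B": 7, "C": 9, "D": 6},
    ]
    df_b = [
        {"A": 1, "B": 2, "C": 3},  # this key also exists in df_a
    ]

    keys = ("A", "B", "C")

    # Collect the key tuples present in df_b.
    b_keys = {tuple(row[k] for k in keys) for row in df_b}

    # Left anti-join: rows of df_a with no matching key in df_b.
    result = [row for row in df_a
              if tuple(row[k] for k in keys) not in b_keys]
    # result holds the (3, 4, 5, 7) and (4, 7, 9, 6) rows,
    # matching the Spark output above.
    ```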
    
