How to filter duplicate records having multiple key in Spark Dataframe?

广开言路 2020-12-22 04:37

I have two dataframes. I want to delete some records in Data Frame-A based on some common column values in Data Frame-B.

For example: Data Frame-A and Data Frame-B share the key columns A, B, and C (the sample tables from the original post were not preserved in this extract).
1 Answer
  •  生来不讨喜
     2020-12-22 05:09

    You are looking for a left anti-join, which keeps only the rows of the left dataframe that have no match in the right one:

    df_a.join(df_b, Seq("A","B","C"), "leftanti").show()
    +---+---+---+---+
    |  A|  B|  C|  D|
    +---+---+---+---+
    |  3|  4|  5|  7|
    |  4|  7|  9|  6|
    +---+---+---+---+
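
    If it helps to see the semantics outside Spark, here is a plain-Python sketch of what a left anti-join does. The sample rows are made up for illustration (only the two surviving rows are taken from the output above); in real use you would call `df_a.join(df_b, ..., "leftanti")` as shown.

    ```python
    # Plain-Python sketch of "left anti-join" semantics (hypothetical
    # sample data, not Spark): keep only the rows of df_a whose key
    # tuple (A, B, C) does NOT appear in df_b.
    df_a = [
        {"A": 1, "B": 2, "C": 3, "D": 4},
        {"A": 3, "B": 4, "C": 5, "D": 7},
        {"A": 4, "B": 7, "C": 9, "D": 6},
    ]
    df_b = [
        {"A": 1, "B": 2, "C": 3},  # this key also exists in df_a
    ]

    keys = ("A", "B", "C")

    # Collect the key tuples present in df_b.
    b_keys = {tuple(row[k] for k in keys) for row in df_b}

    # Left anti-join: rows of df_a with no matching key in df_b.
    result = [row for row in df_a
              if tuple(row[k] for k in keys) not in b_keys]
    # result holds the (3, 4, 5, 7) and (4, 7, 9, 6) rows,
    # matching the Spark output above.
    ```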
    
