I have two tables. One holds the core data: a pair of IDs (PC1 and P2) and some blob data (P3). The other is a blacklist for the PC1 values in the first table. I will call the first table in_df and the second blacklist_df.
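Pass the join conditions to the join() method as a list (Spark combines them with AND), and specify how='left_anti' as the join type. A left anti join keeps only the rows of in_df that have no matching row in blacklist_df: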
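# Keep only the rows of in_df whose (PC1, P2) pair has no match
# in blacklist_df; both conditions must hold for a row to match.
in_df.join(
    blacklist_df,
    [in_df.PC1 == blacklist_df.P1, in_df.P2 == blacklist_df.B1],
    how='left_anti'
).show()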
+---+---+---+
|PC1| P2| P3|
+---+---+---+
|  1|  3|  D|
|  4| 11|  D|
|  3|  1|  C|
+---+---+---+
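To make the semantics of a left anti join concrete, here is a minimal pure-Python sketch (no Spark required). The function name left_anti_join, the list-of-dicts row format, and the sample rows are all hypothetical, chosen only for illustration; the sample data is not the asker's actual data. It mimics SQL NULL handling by treating a None key as never matching, so a left row with a None key is always kept.

```python
def left_anti_join(left, right, conditions):
    """Hypothetical helper: return the rows of `left` that have no row in
    `right` satisfying every (left_col, right_col) equality in `conditions`.
    All conditions must hold for a match (logical AND)."""
    # Collect right-side key tuples; keys containing None can never match,
    # mirroring SQL, where NULL == NULL is not true.
    right_keys = set()
    for row in right:
        key = tuple(row[r_col] for _, r_col in conditions)
        if None not in key:
            right_keys.add(key)

    # Keep each left row whose key tuple is absent from the right side.
    result = []
    for row in left:
        key = tuple(row[l_col] for l_col, _ in conditions)
        if None in key or key not in right_keys:
            result.append(row)
    return result


# Hypothetical sample data using the column names from the question.
in_rows = [
    {"PC1": 1, "P2": 3, "P3": "D"},
    {"PC1": 2, "P2": 5, "P3": "A"},
    {"PC1": 4, "P2": 11, "P3": "D"},
    {"PC1": 3, "P2": 1, "P3": "C"},
]
blacklist_rows = [{"P1": 2, "B1": 5}, {"P1": 1, "B1": 4}]

result = left_anti_join(in_rows, blacklist_rows, [("PC1", "P1"), ("P2", "B1")])
for row in result:
    print(row["PC1"], row["P2"], row["P3"])
```

Only the row (2, 5, A) is dropped, because it is the only one whose (PC1, P2) pair appears in the blacklist; (1, 3, D) survives since the blacklist holds (1, 4), and both columns must match. Unlike this sketch, Spark does not guarantee the order of the output rows.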