Pyspark filter dataframe by columns of another dataframe

你的背包 2020-11-27 06:26

Not sure why I'm having a difficult time with this; it seems so simple considering it's fairly easy to do in R or pandas. I wanted to avoid using pandas though since I'm …

1 Answer
  • 2020-11-27 07:02

    Left anti join is what you're looking for:

    df1.join(df2, ["userid", "group"], "leftanti")
    

    but the same result can be achieved with a left outer join followed by a null filter:

    (df1
        .join(df2, ["userid", "group"], "leftouter")
        .where(df2["pick"].isNull())
        .drop(df2["pick"]))
    
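    A minimal runnable sketch of the anti-join approach, using made-up sample data and column values (the `value` and `pick` columns and all row contents here are illustrative, not from the original question). The `leftanti` join keeps only the rows of `df1` whose `(userid, group)` pair has no match in `df2`:

    ```python
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").appName("antijoin").getOrCreate()

    # Hypothetical sample data: df2 marks which (userid, group) pairs were "picked"
    df1 = spark.createDataFrame(
        [("u1", "a", 10), ("u2", "a", 20), ("u3", "b", 30)],
        ["userid", "group", "value"],
    )
    df2 = spark.createDataFrame(
        [("u1", "a", "x")],
        ["userid", "group", "pick"],
    )

    # Keep only df1 rows with no matching (userid, group) in df2
    result = df1.join(df2, ["userid", "group"], "leftanti")
    matched = sorted((r.userid, r.group) for r in result.collect())
    # matched == [("u2", "a"), ("u3", "b")]: the ("u1", "a") row is dropped

    spark.stop()
    ```

    Note that the anti join returns only `df1`'s columns, so no `drop` is needed, unlike the left outer join variant, which carries `df2["pick"]` along and has to filter and drop it afterwards.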