all rows in df1 that are NOT in df2

礼貌的吻别 2021-01-22 16:08

I have a df (df1) that looks like:

df1 = pd.DataFrame([
        ['YYZ', 'SFO', 1],
        ['YYZ', 'YYD', 1],
        ['YYZ', 'EWR', 1],
        ['Y


        
3 answers
  • 2021-01-22 16:10
    • Use merge with indicator=True
    • Then use query to keep only the rows marked 'left_only'

    df1.merge(
        df2, how='outer', indicator=True
    ).query('_merge == "left_only"').drop(columns='_merge')
    
      city1 city2  val
    2   YYZ   EWR    1
    3   YYZ   DFW    1
    4   YYZ   LAX    1
    5   YYZ   YYC    1
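
A self-contained sketch of the merge approach, for anyone who wants to run it. The question's data is truncated, so both frames below are assumed sample data (df2 in particular is a guess), with column names taken from the output above. The sketch uses how='left' instead of how='outer', since only rows from df1 are wanted, and drop(columns=...) because recent pandas versions no longer accept the positional axis argument.

```python
import pandas as pd

cols = ['city1', 'city2', 'val']

# Assumed sample data: the original df1 is cut off and df2 is never shown.
df1 = pd.DataFrame([
    ['YYZ', 'SFO', 1],
    ['YYZ', 'YYD', 1],
    ['YYZ', 'EWR', 1],
    ['YYZ', 'DFW', 1],
    ['YYZ', 'LAX', 1],
    ['YYZ', 'YYC', 1],
], columns=cols)
df2 = pd.DataFrame([
    ['YYZ', 'SFO', 1],
    ['YYZ', 'YYD', 1],
], columns=cols)

# Merge on all shared columns; indicator=True adds a '_merge' column
# telling where each row was found ('left_only', 'right_only', 'both').
merged = df1.merge(df2, how='left', indicator=True)

# Keep the rows found only in df1, then drop the helper column.
only_in_df1 = merged[merged['_merge'] == 'left_only'].drop(columns='_merge')
print(only_in_df1)
```

If df2 contains duplicate rows, the merge repeats the matching df1 rows; calling df2.drop_duplicates() first avoids that.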
    
  • 2021-01-22 16:11

    The ~ operator inverts the boolean mask returned by isin, effectively turning it into an "is not in" test.
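
The code this answer referred to did not survive on this page, so here is only a minimal illustration of the idea on a single column. The frames and the choice of the city2 column are assumptions made for the example.

```python
import pandas as pd

# Hypothetical single-column frames, just to show the negated mask.
df1 = pd.DataFrame({'city2': ['SFO', 'YYD', 'EWR', 'DFW']})
df2 = pd.DataFrame({'city2': ['SFO', 'YYD']})

# True where the df1 value also appears somewhere in df2's column.
in_df2 = df1['city2'].isin(df2['city2'])

# ~ flips every True/False, so this keeps the rows NOT found in df2.
not_in_df2 = df1[~in_df2]
print(not_in_df2)
```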

  • 2021-01-22 16:29

    Just ask the question straight in plain English, hmm I mean in plain pandas. "Select all rows in df1 that are not in df2" translates to:

    df1[~df1.isin(df2).all(axis=1)]
    Out[127]: 
      city1 city2  val
    2   YYZ   EWR    1
    3   YYZ   DFW    1
    4   YYZ   LAX    1
    5   YYZ   YYC    1
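
One caveat worth checking before relying on this: when DataFrame.isin is given another DataFrame, it compares cells at matching index and column labels, so it only answers "is this row in df2" if the shared rows sit at the same index in both frames. The sketch below uses assumed sample data (the question's frames are truncated) to show the difference, along with one possible label-independent alternative based on MultiIndex.from_frame.

```python
import pandas as pd

cols = ['city1', 'city2', 'val']

# Assumed sample data; df2 holds the first two rows of df1.
df1 = pd.DataFrame([
    ['YYZ', 'SFO', 1],
    ['YYZ', 'YYD', 1],
    ['YYZ', 'EWR', 1],
    ['YYZ', 'DFW', 1],
], columns=cols)
df2 = pd.DataFrame([
    ['YYZ', 'SFO', 1],
    ['YYZ', 'YYD', 1],
], columns=cols)

# Same index positions in both frames: the one-liner works.
aligned = df1[~df1.isin(df2).all(axis=1)]

# Same rows in df2 but in a different order: nothing lines up,
# so every row of df1 is reported as "not in df2".
df2_reordered = df2.iloc[::-1].reset_index(drop=True)
misaligned = df1[~df1.isin(df2_reordered).all(axis=1)]

# Label-independent alternative: compare whole rows as tuples.
rows1 = pd.MultiIndex.from_frame(df1)
rows2 = pd.MultiIndex.from_frame(df2_reordered)
by_value = df1[~rows1.isin(rows2)]

print(len(aligned), len(misaligned), len(by_value))
```

The merge-based answer above compares rows by value as well, so it is not affected by the index.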
    