Python Pandas: How to merge based on an “OR” condition?

甜味超标 2021-02-13 03:55

Let's say I have two dataframes with the following columns:

table 1 columns:
[ShipNumber, TrackNumber, ShipDate, Quantity, Weight]
table 2 columns:
[Shi         


        
2 Answers
  • 2021-02-13 04:14

    Use merge() and concat(). Then drop any duplicate cases where both A and B match (thanks @Scott Boston for that final step).

    import pandas as pd

    df1 = pd.DataFrame({'A':[3,2,1,4], 'B':[7,8,9,5]})
    df2 = pd.DataFrame({'A':[1,5,6,4], 'B':[4,1,8,5]})
    
    df1         df2
       A  B        A  B
    0  3  7     0  1  4
    1  2  8     1  5  1
    2  1  9     2  6  8
    3  4  5     3  4  5
    

    With these data frames we should see:

    • df1.loc[2] matches A on df2.loc[0]
    • df1.loc[1] matches B on df2.loc[2]
    • df1.loc[3] matches both A and B on df2.loc[3]

    We'll use suffixes to keep track of what matched where:

    suff_A = ['_on_A_match_1', '_on_A_match_2']
    suff_B = ['_on_B_match_1', '_on_B_match_2']
    
    # sort=True orders the combined columns alphabetically, as shown below
    df = pd.concat([df1.merge(df2, on='A', suffixes=suff_A),
                    df1.merge(df2, on='B', suffixes=suff_B)], sort=True)
    df
    
         A  A_on_B_match_1  A_on_B_match_2    B  B_on_A_match_1  B_on_A_match_2
    0  1.0             NaN             NaN  NaN             9.0             4.0
    1  4.0             NaN             NaN  NaN             5.0             5.0
    0  NaN             2.0             6.0  8.0             NaN             NaN
    1  NaN             4.0             4.0  5.0             NaN             NaN
    

    Note that the second and fourth rows are duplicate matches (for both data frames, A = 4 and B = 5). We need to remove one of those sets.

    dupes = (df.B_on_A_match_1 == df.B_on_A_match_2) # also could remove A_on_B_match
    df.loc[~dupes]
    
         A  A_on_B_match_1  A_on_B_match_2    B  B_on_A_match_1  B_on_A_match_2
    0  1.0             NaN             NaN  NaN             9.0             4.0
    0  NaN             2.0             6.0  8.0             NaN             NaN
    1  NaN             4.0             4.0  5.0             NaN             NaN
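
    For comparison, the same OR match can also be written as a cross join followed by a boolean filter. This is a different technique from the merge()/concat() approach above, not part of the original answer, and it is only a sketch that assumes both frames are small enough for a full cross product. The `_key` column and the `_1`/`_2` suffixes are hypothetical names chosen for illustration. Because each (df1 row, df2 row) pair appears once, no de-duplication step is needed.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [3, 2, 1, 4], 'B': [7, 8, 9, 5]})
df2 = pd.DataFrame({'A': [1, 5, 6, 4], 'B': [4, 1, 8, 5]})

# Pair every df1 row with every df2 row through a constant helper key.
# '_key' is a made-up column name; pandas >= 1.2 also offers how='cross'.
# reset_index() keeps the original row labels as 'index_1' / 'index_2'.
pairs = (
    df1.reset_index().assign(_key=1)
       .merge(df2.reset_index().assign(_key=1), on='_key', suffixes=('_1', '_2'))
       .drop(columns='_key')
)

# Keep the pairs where A matches OR B matches.
either = (pairs['A_1'] == pairs['A_2']) | (pairs['B_1'] == pairs['B_2'])
matched = pairs[either].reset_index(drop=True)
print(matched)
```

    The cross product grows as len(df1) * len(df2), so for large tables the two-merge approach is likely the better fit.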
    
  • 2021-02-13 04:14

    I would suggest an alternative: build a single key column to merge on. This seems easier to me.

    # Use ShipNumber as the merge key, falling back to TrackNumber when it is null
    table1["id_to_be_merged"] = table1.apply(
        lambda row: row["ShipNumber"] if pd.notnull(row["ShipNumber"]) else row["TrackNumber"], axis=1)
    

    You can add the same column to table2 as well if needed, and then pass it to left_on or right_on in merge(), depending on your requirement.
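
    A minimal sketch of how this might look end to end. The sample values are invented, and the assumption that table2 also has ShipNumber and TrackNumber columns is hypothetical, since the question's column list for table 2 is cut off. `fillna` is used here as a vectorized equivalent of the row-wise `apply`, and the `Status` column and the `_t1`/`_t2` suffixes are illustrative names only.

```python
import pandas as pd

# Hypothetical sample data; only table1's column names come from the question.
table1 = pd.DataFrame({
    "ShipNumber": ["S1", None, "S3"],
    "TrackNumber": ["T1", "T2", None],
    "Quantity": [10, 20, 30],
})
table2 = pd.DataFrame({
    "ShipNumber": ["S1", None, "S9"],
    "TrackNumber": [None, "T2", "T9"],
    "Status": ["shipped", "pending", "lost"],
})

# Prefer ShipNumber, fall back to TrackNumber when ShipNumber is missing.
table1["id_to_be_merged"] = table1["ShipNumber"].fillna(table1["TrackNumber"])
table2["id_to_be_merged"] = table2["ShipNumber"].fillna(table2["TrackNumber"])

# Both key columns share a name here, so on= is enough; with different
# names you would pass left_on= and right_on= instead.
merged = table1.merge(table2, on="id_to_be_merged", how="left",
                      suffixes=("_t1", "_t2"))
print(merged[["id_to_be_merged", "Quantity", "Status"]])
```

    Note that a coalesced key only pairs rows whose chosen identifier is the same in both tables. A row keyed by ShipNumber in one table will not match a row keyed by TrackNumber in the other, even if their TrackNumber values agree, so this is not a full OR match.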
