How to use pandas isin for multiple columns

谎友^ 2020-12-17 18:04

I want to find the values of col1 and col2 where the col1 and col2 of the first dataframe are

3 answers
  • 2020-12-17 18:38

    If you must stick to isin (or its negation, ~isin), you can first create a new column holding the concatenation of col1 and col2, then use isin on that column to filter your data. Here is the code:

    import pandas as pd
    df1 = pd.DataFrame({'col1': ['pizza', 'hamburger', 'hamburger', 'pizza', 'ice cream'], 'col2': ['boy', 'boy', 'girl', 'girl', 'boy']}, index=range(1,6))
    df2 = pd.DataFrame({'col1': ['pizza', 'pizza', 'chicken', 'cake', 'cake', 'chicken', 'ice cream'], 'col2': ['boy', 'girl', 'girl', 'boy', 'girl', 'boy', 'boy']}, index=range(10,17))
    
    df1['indicator'] = df1['col1'].str.cat(df1['col2'])
    df2['indicator'] = df2['col1'].str.cat(df2['col2'])
    
    df2.loc[df2['indicator'].isin(df1['indicator'])].drop(columns=['indicator'])
    

    which gives

    
             col1  col2
    10      pizza   boy
    11      pizza  girl
    16  ice cream   boy
    

    If you do this, make sure that concatenating the two columns cannot create false positives. For example, 123 and 456 in df1 concatenate to the same string as 12 and 3456 in df2, so the rows would match even though their respective columns don't. You can avoid this by passing a sep argument that does not occur in the data:

    df1['indicator'] = df1['col1'].str.cat(df1['col2'], sep='$$$')
    df2['indicator'] = df2['col1'].str.cat(df2['col2'], sep='$$$')
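
    To make the false-positive case concrete, here is a minimal sketch. The frames left and right are hypothetical data, not part of the question; they are chosen only to trigger the collision described above.

```python
import pandas as pd

# Hypothetical data: different column values, identical concatenation.
left = pd.DataFrame({'col1': ['123'], 'col2': ['456']})
right = pd.DataFrame({'col1': ['12'], 'col2': ['3456']})

# Without a separator both rows become '123456', so isin reports a match.
plain_match = right['col1'].str.cat(right['col2']).isin(
    left['col1'].str.cat(left['col2']))

# With a separator the keys are '12$$$3456' and '123$$$456', so there is no match.
sep_match = right['col1'].str.cat(right['col2'], sep='$$$').isin(
    left['col1'].str.cat(left['col2'], sep='$$$'))

print(plain_match.tolist())  # [True]
print(sep_match.tolist())    # [False]
```

    The separator only helps if it never appears in the column values themselves, so pick one with that in mind.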
    
  • 2020-12-17 18:51

    Perform an inner merge on col1 and col2:

    import pandas as pd
    df1 = pd.DataFrame({'col1': ['pizza', 'hamburger', 'hamburger', 'pizza', 'ice cream'], 'col2': ['boy', 'boy', 'girl', 'girl', 'boy']}, index=range(1,6))
    df2 = pd.DataFrame({'col1': ['pizza', 'pizza', 'chicken', 'cake', 'cake', 'chicken', 'ice cream'], 'col2': ['boy', 'girl', 'girl', 'boy', 'girl', 'boy', 'boy']}, index=range(10,17))
    
    print(pd.merge(df2.reset_index(), df1, how='inner').set_index('index'))
    

    yields

                col1  col2
    index                 
    10         pizza   boy
    11         pizza  girl
    16     ice cream   boy
    

    The purpose of the reset_index and set_index calls is to preserve df2's index, as in the desired result you posted. If the index is not important, then

    pd.merge(df2, df1, how='inner')
    #         col1  col2
    # 0      pizza   boy
    # 1      pizza  girl
    # 2  ice cream   boy
    

    would suffice.
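
    The merge approach can also cover the negated case (the rows of df2 whose pair does not occur in df1). The following is a sketch under my own assumptions rather than part of the original answer: it uses merge's indicator=True option, and the name not_in_df1 is just illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'col1': ['pizza', 'hamburger', 'hamburger', 'pizza', 'ice cream'],
                    'col2': ['boy', 'boy', 'girl', 'girl', 'boy']}, index=range(1, 6))
df2 = pd.DataFrame({'col1': ['pizza', 'pizza', 'chicken', 'cake', 'cake', 'chicken', 'ice cream'],
                    'col2': ['boy', 'girl', 'girl', 'boy', 'girl', 'boy', 'boy']}, index=range(10, 17))

# A left merge keeps every df2 row; the _merge column records whether df1 had a match.
# drop_duplicates() guards against df1 repeating a pair, which would duplicate df2 rows.
merged = pd.merge(df2.reset_index(), df1.drop_duplicates(), how='left', indicator=True)

# 'left_only' marks df2 rows whose (col1, col2) pair is absent from df1.
not_in_df1 = (merged[merged['_merge'] == 'left_only']
              .drop(columns='_merge')
              .set_index('index'))
print(not_in_df1)
```

    With the sample data this keeps the chicken and cake rows (index 12 to 15).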


    Alternatively, you could construct MultiIndexes from the col1 and col2 columns, and then call the MultiIndex.isin method:

    index1 = pd.MultiIndex.from_arrays([df1[col] for col in ['col1', 'col2']])
    index2 = pd.MultiIndex.from_arrays([df2[col] for col in ['col1', 'col2']])
    print(df2.loc[index2.isin(index1)])
    

    yields

             col1  col2
    10      pizza   boy
    11      pizza  girl
    16  ice cream   boy
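
    If this check is needed in several places, the MultiIndex idea can be wrapped in a small helper. This is only a sketch: rows_in is a hypothetical name, not a pandas function, and it builds the indexes with pd.MultiIndex.from_frame instead of from_arrays.

```python
import pandas as pd

def rows_in(df, other, cols):
    """Boolean mask: True where df's values in `cols` also appear as a row of `other`."""
    return pd.MultiIndex.from_frame(df[cols]).isin(pd.MultiIndex.from_frame(other[cols]))

df1 = pd.DataFrame({'col1': ['pizza', 'hamburger', 'hamburger', 'pizza', 'ice cream'],
                    'col2': ['boy', 'boy', 'girl', 'girl', 'boy']}, index=range(1, 6))
df2 = pd.DataFrame({'col1': ['pizza', 'pizza', 'chicken', 'cake', 'cake', 'chicken', 'ice cream'],
                    'col2': ['boy', 'girl', 'girl', 'boy', 'girl', 'boy', 'boy']}, index=range(10, 17))

mask = rows_in(df2, df1, ['col1', 'col2'])
print(df2[mask])   # rows whose pair occurs in df1
print(df2[~mask])  # the negated version: rows whose pair does not occur in df1
```

    Because the mask is a plain boolean array, df2's original index is preserved without any reset_index and set_index calls.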
    
  • 2020-12-17 18:56

    Thank you unutbu! Here is a little update.

    import pandas as pd
    df1 = pd.DataFrame({'col1': ['pizza', 'hamburger', 'hamburger', 'pizza', 'ice cream'], 'col2': ['boy', 'boy', 'girl', 'girl', 'boy']}, index=range(1,6))
    df2 = pd.DataFrame({'col1': ['pizza', 'pizza', 'chicken', 'cake', 'cake', 'chicken', 'ice cream'], 'col2': ['boy', 'girl', 'girl', 'boy', 'girl', 'boy', 'boy']}, index=range(10,17))
    df1[df1.set_index(['col1','col2']).index.isin(df2.set_index(['col1','col2']).index)]
    

    returns:

            col1  col2
    1      pizza   boy
    4      pizza  girl
    5  ice cream   boy
    