Finding rows with same column values in pandas dataframe

Submitted by 帅比萌擦擦* on 2021-02-19 06:37:07

Question


I have two dataframes with different numbers of columns, where four columns can have the same values in both dataframes. I want to make a new column in df1 that takes the value 1 if there is a row in df2 that has the same values for columns 'A', 'B', 'C', and 'D' as a row in df1. If there isn't such a row, I want the value to be 0. Columns 'E' and 'F' are not important for checking the values.

Is there a pandas function that can do this, or do I have to do this in a loop?

For example:

df1 =
A    B    C    D    E    F
1    1    20   20   3    2
1    1    12   14   1    3
2    1    13   43   4    3
2    2    12   34   1    4

df2 =
A    B    C    D    E    
1    3    12   14   2    
1    1    20   20   4   
2    2    21   31   5    
2    2    12   34   8    

expected output:

df1 =
A    B    C    D    E    F    Target
1    1    20   20   3    2    1
1    1    12   14   1    3    0
2    1    13   43   4    3    0
2    2    12   34   1    4    1

Answer 1:


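This is fairly simple when the rows of both DataFrames line up. If you compare two DataFrames with ==, pandas compares each element with the element at the same index and column label in the other DataFrame. The result therefore flags a row of df1 only when the row with the same index in df2 matches, and the comparison requires both DataFrames to have identical row labels.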

col_list = ['A', 'B', 'C', 'D']
# True where all four columns match the row with the same index label in df2
idx = (df1.loc[:, col_list] == df2.loc[:, col_list]).all(axis=1)

df1['new_row'] = idx.astype(int)
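
To see what this element-wise comparison returns on the sample data from the question, here is a minimal sketch. The DataFrame construction is an assumption reconstructed from the tables in the question, and the variable name `positional` is only illustrative.

```python
import pandas as pd

# Sample data reconstructed from the tables in the question.
df1 = pd.DataFrame({
    "A": [1, 1, 2, 2],
    "B": [1, 1, 1, 2],
    "C": [20, 12, 13, 12],
    "D": [20, 14, 43, 34],
    "E": [3, 1, 4, 1],
    "F": [2, 3, 3, 4],
})
df2 = pd.DataFrame({
    "A": [1, 1, 2, 2],
    "B": [3, 1, 2, 2],
    "C": [12, 20, 21, 12],
    "D": [14, 20, 31, 34],
    "E": [2, 4, 5, 8],
})

col_list = ["A", "B", "C", "D"]

# Row i of df1 is compared only with row i of df2 (aligned on labels).
idx = (df1.loc[:, col_list] == df2.loc[:, col_list]).all(axis=1)
positional = idx.astype(int)

print(positional.tolist())  # [0, 0, 0, 1]
```

Only the last row is flagged, because it is the only one whose match sits at the same position in df2. The first row of df1 does appear in df2, but at a different position, so this comparison does not produce the expected output from the question.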



Answer 2:


I think you need merge with a left join and the parameter indicator=True. Then compare the _merge column against 'both' with eq (the same as ==), and finally convert the boolean True and False to 1 and 0 with astype:

cols = list('ABCD')
df1['Target'] = pd.merge(df1[cols], 
                      df2[cols], how='left', indicator=True)['_merge'].eq('both').astype(int)
print (df1)

   A  B   C   D  E  F  Target
0  1  1  20  20  3  2       1
1  1  1  12  14  1  3       0
2  2  1  13  43  4  3       0
3  2  2  12  34  1  4       1

Detail:

print (pd.merge(df1[cols], df2[cols], how='left', indicator=True))
   A  B   C   D     _merge
0  1  1  20  20       both
1  1  1  12  14  left_only
2  2  1  13  43  left_only
3  2  2  12  34       both
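
One caveat with the merge approach: if df2 contains the same A, B, C, D combination more than once, the left join returns extra rows and the result no longer lines up with df1. A minimal sketch that guards against this, assuming the sample data from the question (the DataFrame construction and the names `keys` and `merged` are illustrative, not part of the original answer):

```python
import pandas as pd

# Sample data reconstructed from the tables in the question.
df1 = pd.DataFrame({
    "A": [1, 1, 2, 2],
    "B": [1, 1, 1, 2],
    "C": [20, 12, 13, 12],
    "D": [20, 14, 43, 34],
    "E": [3, 1, 4, 1],
    "F": [2, 3, 3, 4],
})
df2 = pd.DataFrame({
    "A": [1, 1, 2, 2],
    "B": [3, 1, 2, 2],
    "C": [12, 20, 21, 12],
    "D": [14, 20, 31, 34],
    "E": [2, 4, 5, 8],
})

cols = ["A", "B", "C", "D"]

# Keep each key combination once so the left join yields one row per df1 row.
keys = df2[cols].drop_duplicates()

# indicator=True adds a _merge column: 'both' when the key exists in df2.
merged = df1[cols].merge(keys, on=cols, how="left", indicator=True)

# Assign by position so a non-default index on df1 does not cause misalignment.
df1["Target"] = merged["_merge"].eq("both").astype(int).to_numpy()

print(df1["Target"].tolist())  # [1, 0, 0, 1]
```

The left join keeps the rows of df1 in their original order, which is why the positional assignment with to_numpy is safe here.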



Answer 3:


You can use logical operators for that. Have a look at "Logic operator for boolean indexing in Pandas" or "Element-wise logical OR in Pandas" for some ideas.

But your specification is not sufficient for sketching a solution, because I do not know how the rows in df1 should relate to the rows in df2. Do both DataFrames have the same number of rows, and should each row in df1 get a boolean value that says whether A, B, C, and D are the same in the corresponding row of df2?



Source: https://stackoverflow.com/questions/46642053/finding-rows-with-same-column-values-in-pandas-dataframe
