Question
I have two dataframes with a different number of columns, where four columns can have the same values in both dataframes. I want to make a new column in df1 that takes the value 1 if there is a row in df2 that has the same values for columns 'A', 'B', 'C', and 'D' as a row in df1. If there isn't such a row, I want the value to be 0. Columns 'E' and 'F' are not important for checking the values.
Is there a pandas function that can do this, or do I have to do this in a loop?
For example:
df1 =
A B C D E F
1 1 20 20 3 2
1 1 12 14 1 3
2 1 13 43 4 3
2 2 12 34 1 4
df2 =
A B C D E
1 3 12 14 2
1 1 20 20 4
2 2 21 31 5
2 2 12 34 8
expected output:
df1 =
A B C D E F Target
1 1 20 20 3 2 1
1 1 12 14 1 3 0
2 1 13 43 4 3 0
2 2 12 34 1 4 1
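Answer 1:
This is fairly simple. If you check whether two DataFrames are equal with ==, each element is compared with the element at the same position in the other DataFrame. Note that this is a row-by-row comparison: both DataFrames need the same index, and a row of df1 is only flagged when the identical row sits at the same position in df2.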
col_list = ['A', 'B', 'C', 'D']
idx = (df1.loc[:, col_list] == df2.loc[:, col_list]).all(axis=1)
df1['new_row'] = idx.astype(int)
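To see what the element-wise comparison actually returns, here is a minimal sketch that rebuilds the question's two frames (the construction code and the name `positional` are my own additions, not part of the answer) and runs the same expression. Because rows are compared position by position, only the last row is flagged, which differs from the expected Target column of 1, 0, 0, 1.

```python
import pandas as pd

# Sample data rebuilt from the question
df1 = pd.DataFrame({
    'A': [1, 1, 2, 2],
    'B': [1, 1, 1, 2],
    'C': [20, 12, 13, 12],
    'D': [20, 14, 43, 34],
    'E': [3, 1, 4, 1],
    'F': [2, 3, 3, 4],
})
df2 = pd.DataFrame({
    'A': [1, 1, 2, 2],
    'B': [3, 1, 2, 2],
    'C': [12, 20, 21, 12],
    'D': [14, 20, 31, 34],
    'E': [2, 4, 5, 8],
})

col_list = ['A', 'B', 'C', 'D']

# Row i of df1 is compared only with row i of df2 (same index label)
positional = (df1.loc[:, col_list] == df2.loc[:, col_list]).all(axis=1).astype(int)
print(positional.tolist())  # [0, 0, 0, 1]
```

So this approach fits only when matching rows are known to sit at the same position in both frames.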
Answer 2:
I think you need merge with a left join and the parameter indicator=True, then compare the column _merge with eq (same as ==) and finally convert the booleans True and False to 1 and 0 with astype:
cols = list('ABCD')
df1['Target'] = pd.merge(df1[cols],
df2[cols], how='left', indicator=True)['_merge'].eq('both').astype(int)
print (df1)
   A  B   C   D  E  F  Target
0  1  1  20  20  3  2       1
1  1  1  12  14  1  3       0
2  2  1  13  43  4  3       0
3  2  2  12  34  1  4       1
Detail:
print (pd.merge(df1[cols], df2[cols], how='left', indicator=True))
   A  B   C   D     _merge
0  1  1  20  20       both
1  1  1  12  14  left_only
2  2  1  13  43  left_only
3  2  2  12  34       both
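For completeness, here is a self-contained sketch of the merge approach using the question's data. Two details are my own assumptions rather than part of the answer: `drop_duplicates` on df2's key columns, so that a key appearing several times in df2 cannot add extra rows to the left join, and `to_numpy()`, so that the result is assigned by position even if df1 does not have the default index. The name `keys` is likewise only illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question
df1 = pd.DataFrame({
    'A': [1, 1, 2, 2],
    'B': [1, 1, 1, 2],
    'C': [20, 12, 13, 12],
    'D': [20, 14, 43, 34],
    'E': [3, 1, 4, 1],
    'F': [2, 3, 3, 4],
})
df2 = pd.DataFrame({
    'A': [1, 1, 2, 2],
    'B': [3, 1, 2, 2],
    'C': [12, 20, 21, 12],
    'D': [14, 20, 31, 34],
    'E': [2, 4, 5, 8],
})

cols = list('ABCD')

# Assumption: de-duplicate the keys so the left join keeps one row per df1 row
keys = df2[cols].drop_duplicates()

# indicator=True adds a '_merge' column: 'both' when the key exists in df2
merged = pd.merge(df1[cols], keys, how='left', indicator=True)

# Assumption: assign by position so a non-default index on df1 does not matter
df1['Target'] = merged['_merge'].eq('both').astype(int).to_numpy()
print(df1)
```

With the sample data the two extra steps change nothing, but they keep the length of the merged frame equal to that of df1 in the general case.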
Answer 3:
You can use logical operators for that. Have a look at "Logic operator for boolean indexing in Pandas" or "Element-wise logical OR in Pandas" for some ideas.
However, your specification is not detailed enough for a solution sketch, because I do not know how the rows in df1 should relate to the rows in df2. Do both DataFrames have the same number of rows, and should each row in df1 get a boolean value that says whether A, B, C, and D match the row at the same position in df2?
Source: https://stackoverflow.com/questions/46642053/finding-rows-with-same-column-values-in-pandas-dataframe