问题
I faced with an issue to merge two DF into one and save all duplicate rows by id
value from the second DF. Example:
df1 = pd.DataFrame({
'id': ['id1', 'id2', 'id3', 'id4'],
'com': [134.6, 223, 0, 123],
'malicious': [False, False, True, False]
})
df2 = pd.DataFrame({
'id': ['id7', 'id2', 'id5', 'id6'],
'com': [134.6, 27.6, 0, 123],
'malicious': [False, False, False, False]
})
df1
id com malicious
0 id1 134.6 False
1 id2 223.0 False
2 id3 0.0 True
3 id4 123.0 False
df2
id com malicious date
0 id7 134.6 False 2021-01-01
1 id2 27.6 False 2021-01-01
2 id5 0.0 False 2021-01-01
3 id6 123.0 False 2021-01-01
I'm. expecting the output to be:
id com malicious date
1 id1 134.6 False null
2 id3 0.0 True null
3 id4 123.0 False null
4 id7 134.6 False 2021-01-01
5 id2 27.6 False 2021-01-01
6 id5 0.0 False 2021-01-01
7 id6 123.0 False 2021-01-01
As you can see we added a new column and all rows of df1 a null there now and row with id2
is replaced with all values from df2
(amount of updated columns can be different, so it's not about updating specific columns values, but about replacing the whole row by id
)
I dont care about indexes and sort
Looking for efficient solution as I have huge amount of files, which I should merge this way into main DF
回答1:
If need unique id
with remove values from df1
if exist also in df2
use:
df = pd.concat([df1, df2]).drop_duplicates('id', keep='last')
来源:https://stackoverflow.com/questions/65806053/pandas-merge-two-df-with-rows-replacement