Pandas merge two DF with rows replacement

半腔热情 提交于 2021-02-17 05:18:25

问题


I faced with an issue to merge two DF into one and save all duplicate rows by id value from the second DF. Example:

df1 = pd.DataFrame({
    'id': ['id1', 'id2', 'id3', 'id4'],
    'com': [134.6, 223, 0, 123],
    'malicious': [False, False, True, False]
})

df2 = pd.DataFrame({
    'id': ['id7', 'id2', 'id5', 'id6'],
    'com': [134.6, 27.6, 0, 123],
    'malicious': [False, False, False, False]
})

df1
    id    com  malicious
0  id1  134.6      False
1  id2  223.0      False
2  id3    0.0       True
3  id4  123.0      False

df2
    id    com  malicious        date
0  id7  134.6      False  2021-01-01
1  id2   27.6      False  2021-01-01
2  id5    0.0      False  2021-01-01
3  id6  123.0      False  2021-01-01

I'm. expecting the output to be:

    id    com  malicious        date
1  id1  134.6      False  null
2  id3    0.0       True  null
3  id4  123.0      False  null
4  id7  134.6      False  2021-01-01
5  id2   27.6      False  2021-01-01
6  id5    0.0      False  2021-01-01
7  id6  123.0      False  2021-01-01

As you can see we added a new column and all rows of df1 a null there now and row with id2 is replaced with all values from df2(amount of updated columns can be different, so it's not about updating specific columns values, but about replacing the whole row by id) I dont care about indexes and sort

Looking for efficient solution as I have huge amount of files, which I should merge this way into main DF


回答1:


If need unique id with remove values from df1 if exist also in df2 use:

df = pd.concat([df1, df2]).drop_duplicates('id', keep='last')


来源:https://stackoverflow.com/questions/65806053/pandas-merge-two-df-with-rows-replacement

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!