Python Pandas Remove Duplicate Cells - Keep the rows

Submitted by 时间秒杀一切 on 2021-01-27 13:40:26

Question


I am trying to remove duplicate values from specific columns, based on duplicates in a single column, while keeping the rest of each row.

import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 5, 6, 7], 'C': ['a', 'a', 'b', 'c'], 'D': ['c', 'd', 'e', 'f']})

I want to delete the values in columns A and B wherever column C repeats an earlier value, while keeping every value in column D.

Expected output:

A B C D
1 5 a c
      d
3 6 b e
4 7 c f

Answer 1:


Use a simple loc assignment with a boolean mask:

df.loc[df.C.duplicated(), ['A', 'B']] = ''

    A   B   C   D
0   1   5   a   c
1           a   d
2   3   6   b   e
3   4   7   c   f
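
To make the one-liner above easier to follow, here is a minimal self-contained sketch that builds the sample frame, inspects the mask returned by duplicated(), and then blanks A and B. The variable name dup and the explicit cast to object are additions for illustration, not part of the original answer; the cast is there because newer pandas releases warn about (and may reject) writing a string into an integer column.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [5, 5, 6, 7],
    'C': ['a', 'a', 'b', 'c'],
    'D': ['c', 'd', 'e', 'f'],
})

# True for each row whose C value already appeared in an earlier row
# (keep='first' is the default, so the first occurrence is not flagged).
dup = df['C'].duplicated()
print(dup.tolist())  # [False, True, False, False]

# Assumption: cast A and B to object first so that storing '' in them
# is not a dtype mismatch on recent pandas versions.
df = df.astype({'A': object, 'B': object})

# Blank out A and B only on the flagged rows; C and D are left alone.
df.loc[dup, ['A', 'B']] = ''
print(df)
```

Passing keep='last' to duplicated() would flag the earlier occurrences instead, which blanks the first row of each group rather than the later ones.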

You can also assign np.nan instead of an empty string, so that A and B stay numeric (float) rather than turning into object columns.
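
The np.nan variant could look roughly like the sketch below. The explicit cast to float64 is an assumption added here so the integer columns can hold NaN without relying on implicit upcasting; the original answer does not show this step.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [5, 5, 6, 7],
    'C': ['a', 'a', 'b', 'c'],
    'D': ['c', 'd', 'e', 'f'],
})

# Assumption: convert A and B to float64 up front, since NaN is a float value.
df = df.astype({'A': 'float64', 'B': 'float64'})

# Replace A and B with NaN on rows where C repeats an earlier value.
df.loc[df['C'].duplicated(), ['A', 'B']] = np.nan

print(df.dtypes)
print(df)
```

With this approach the columns remain usable for arithmetic and aggregation, and the missing cells can be rendered as blanks later, for example with df.fillna('') just before display.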



Source: https://stackoverflow.com/questions/52338468/python-pandas-remove-duplicate-cells-keep-the-rows
