Question
I am trying to remove duplicate values in specific columns based on a single column, while keeping the rest of the row.
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 5, 6, 7], 'C': ['a', 'a', 'b', 'c'], 'D': ['c', 'd', 'e', 'f']})
I want to delete the values in columns A and B wherever column C holds a duplicate, while keeping all of column D.
Expected output:
A  B  C  D
1  5  a  c
         d
3  6  b  e
4  7  c  f
Answer 1:
Use a simple .loc assignment:
df.loc[df.C.duplicated(), ['A', 'B']] = ''
   A  B  C  D
0  1  5  a  c
1        a  d
2  3  6  b  e
3  4  7  c  f
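You can also use np.nan instead of an empty string so as not to mess with the dtypes: empty strings turn A and B into object columns, whereas NaN keeps them numeric (as float).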
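To compare the two options side by side, here is a minimal sketch that rebuilds the DataFrame from the question and applies both. The names dup, nan_df and str_df are only illustrative, and the explicit astype casts are an addition not in the original answer: they are an assumption that avoids relying on implicit upcasting when values of a different type are assigned into integer columns.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [5, 5, 6, 7],
    'C': ['a', 'a', 'b', 'c'],
    'D': ['c', 'd', 'e', 'f'],
})

# True for every row whose C value already appeared in an earlier row
dup = df['C'].duplicated()

# Option 1: blank with NaN. A and B are cast to float first (assumption:
# we prefer an explicit cast over implicit upcasting on assignment).
nan_df = df.astype({'A': float, 'B': float})
nan_df.loc[dup, ['A', 'B']] = np.nan

# Option 2: blank with empty strings. A and B are cast to object first,
# since a column holding both numbers and '' cannot stay numeric.
str_df = df.astype({'A': object, 'B': object})
str_df.loc[dup, ['A', 'B']] = ''

print(nan_df.dtypes)  # A and B are float64
print(str_df.dtypes)  # A and B are object
```

With NaN the columns can still be summed or averaged; with empty strings they only display nicely. If whole numbers matter, pandas' nullable 'Int64' dtype with pd.NA may be another option.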
Source: https://stackoverflow.com/questions/52338468/python-pandas-remove-duplicate-cells-keep-the-rows