Remove duplicates based on the content of two columns not the order

丶灬走出姿态 提交于 2021-02-08 10:32:22

问题


I have a correlation matrix that i melted into a dataframe so now i have the following for example:

First      Second       Value
A          B            0.5
B          A            0.5
A          C            0.2 

i want to delete only one of the first two rows. What would be the way to do it?


回答1:


Use:

#if want select columns by columns names
m = ~pd.DataFrame(np.sort(df[['First','Second']], axis=1)).duplicated()
#if want select columns by positons
#m = ~pd.DataFrame(np.sort(df.iloc[:,:2], axis=1)).duplicated()
print (m)

0     True
1    False
2     True
dtype: bool

df = df[m]
print (df)
  First Second  Value
0     A      B    0.5
2     A      C    0.2



回答2:


You could call drop_duplicates on the np.sorted columns:

df = df.loc[~pd.DataFrame(np.sort(df.iloc[:, :2])).duplicated()]
df

  First Second  Value
0     A      B    0.5
2     A      C    0.2

Details

np.sort(df.iloc[:, :2])

array([['A', 'B'],
       ['A', 'B'],
       ['A', 'C']], dtype=object)

~pd.DataFrame(np.sort(df.iloc[:, :2], axis=1)).duplicated()

0     True
1    False
2     True
dtype: bool

Sort the columns and figure out which ones are duplicates. The mask will then be used to filter out the dataframe via boolean indexing.

To reset the index, use reset_index:

df.reset_index(drop=1)

  First Second  Value
0     A      B    0.5
1     A      C    0.2



回答3:


One can also use following approach:

# create a new column after merging and sorting 'First' and 'Second':
df['newcol']=df.apply(lambda x: "".join(sorted(x[0]+x[1])), axis=1)
print(df)

  First Second  Value newcol
0     A      B    0.5     AB
1     B      A    0.5     AB
2     A      C    0.2     AC

# get its non-duplicated indexes and remove the new column: 
df = df[~df.newcol.duplicated()].iloc[:,:3]
print(df)

  First Second  Value
0     A      B    0.5
2     A      C    0.2


来源:https://stackoverflow.com/questions/47051854/remove-duplicates-based-on-the-content-of-two-columns-not-the-order

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!