I am new to pandas and have tried searching for this on Google, but with no luck. How can I get the rows with distinct values in column2?
For example, I have this dataframe:
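Use drop_duplicates, specifying COL2 as the column to check for duplicates. By default the first row for each value is kept. The result is assigned to a new name so that df itself stays unchanged:
df_first = df.drop_duplicates('COL2')
# same as
# df_first = df.drop_duplicates('COL2', keep='first')
print(df_first)
    COL1  COL2
0  a.com    22
1  b.com    45
2  c.com    34
4  f.com    56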
You can also keep only the last row for each value:
df_last = df.drop_duplicates('COL2', keep='last')
print(df_last)
    COL1  COL2
2  c.com    34
4  f.com    56
5  g.com    22
6  h.com    45
Or drop every row whose COL2 value occurs more than once:
df_unique = df.drop_duplicates('COL2', keep=False)
print(df_unique)
    COL1  COL2
2  c.com    34
4  f.com    56
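To try all three variants side by side, here is a minimal self-contained sketch. The input dataframe is an assumption: it is reconstructed so that it reproduces the outputs shown above, and the row at index 3 ("e.com", 45) in particular is hypothetical. Each result goes into its own variable so that df is never overwritten between calls.

```python
import pandas as pd

# Hypothetical input, reconstructed to be consistent with the printed results.
# The row at index 3 ("e.com", 45) is an assumption, not taken from the question.
df = pd.DataFrame({
    "COL1": ["a.com", "b.com", "c.com", "e.com", "f.com", "g.com", "h.com"],
    "COL2": [22, 45, 34, 45, 56, 22, 45],
})

# Keep the first row seen for each COL2 value (keep='first' is the default).
first = df.drop_duplicates("COL2")

# Keep the last row seen for each COL2 value.
last = df.drop_duplicates("COL2", keep="last")

# Drop every row whose COL2 value appears more than once.
unique = df.drop_duplicates("COL2", keep=False)

print(first)
print(last)
print(unique)
```

Note that the original index labels are preserved in each result, which is why the outputs above skip some numbers.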