Python 3.4 Remove duplicates and their corresponding values

Submitted by 六眼飞鱼酱① on 2019-12-12 06:13:41

Question


I'd like to remove duplicates as well as their corresponding, original values from a dataframe.

Sframe is the name of my dataframe. The fields on which I want to check for duplicates are 'TermName', 'SchoolName', and 'StudentID'.

Here's an example of what I'm starting with:

TermName  SchoolName  StudentID
14-15     a           1
14-15     a           1
14-15     a           1
14-15     b           2
14-15     b           2
14-15     b           3
14-15     c           4
14-15     c           5
14-15     d           6
14-15     e           7
14-15     f           8

Here's what I'm looking for:

TermName  SchoolName  StudentID
14-15     a           1
14-15     a           1
14-15     a           1
14-15     b           2
14-15     b           2

@Jubbles showed me how to identify the duplicated rows and keep only the rows that are not duplicated (i.e. the last 6 rows in my first table example above) with this:

# True for every occurrence of a key combination except the first
Sframe['dup_check_1'] = Sframe.duplicated(cols = ['TermName', 'SchoolName', 'StudentID'], take_last = False)
# True for every occurrence of a key combination except the last
Sframe['dup_check_2'] = Sframe.duplicated(cols = ['TermName', 'SchoolName', 'StudentID'], take_last = True)
# keep only the rows flagged by neither check
Sframe = Sframe[(Sframe['dup_check_1'] == False) & (Sframe['dup_check_2'] == False)]
# delete the helper columns
del Sframe['dup_check_1'], Sframe['dup_check_2']
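
For illustration, here is a minimal sketch of the same two-flag approach run on the sample data from the first table. It assumes a recent pandas release, where the `cols` and `take_last` arguments have been replaced by `subset` and `keep`. The names `keys`, `not_first`, `not_last`, and `unique_rows` are illustrative only and are not part of the original code.

```python
import pandas as pd

# Sample data copied from the first table in the question.
Sframe = pd.DataFrame({
    'TermName': ['14-15'] * 11,
    'SchoolName': list('aaabbbccdef'),
    'StudentID': [1, 1, 1, 2, 2, 3, 4, 5, 6, 7, 8],
})
keys = ['TermName', 'SchoolName', 'StudentID']

# True for every occurrence of a key combination except the first one.
not_first = Sframe.duplicated(subset=keys, keep='first')
# True for every occurrence of a key combination except the last one.
not_last = Sframe.duplicated(subset=keys, keep='last')

# Rows flagged by neither check occur exactly once in the frame.
unique_rows = Sframe[~not_first & ~not_last]
print(unique_rows)
```

Using boolean Series directly, rather than temporary columns, avoids having to delete the helper columns afterwards.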

To get the opposite set, i.e. the rows that the code above excludes rather than the rows it keeps, I tried changing False to True here:

Sframe = Sframe[(Sframe['dup_check_1'] == True) & (Sframe['dup_check_2'] == True)]

...but it didn't work. It seems like a simple change in code from False to True, but it does not return the correct number of rows (only 6 instead of 354).
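
One possible reading of this result, sketched on the sample data: requiring both flags to be True selects only the rows that are neither the first nor the last occurrence of a key combination, while the logical complement of "both False" is "either True". The sketch assumes a recent pandas release (`subset` and `keep` instead of `cols` and `take_last`), and the names `keys`, `both`, `either`, and `all_dupes` are hypothetical. Whether this accounts for the 6 versus 354 discrepancy in the real data is not confirmed here.

```python
import pandas as pd

# Sample data copied from the first table in the question.
Sframe = pd.DataFrame({
    'TermName': ['14-15'] * 11,
    'SchoolName': list('aaabbbccdef'),
    'StudentID': [1, 1, 1, 2, 2, 3, 4, 5, 6, 7, 8],
})
keys = ['TermName', 'SchoolName', 'StudentID']

not_first = Sframe.duplicated(subset=keys, keep='first')
not_last = Sframe.duplicated(subset=keys, keep='last')

# AND: only rows that are neither first nor last in their group,
# here just the middle row of the a/1 triple.
both = Sframe[not_first & not_last]

# OR: every row that belongs to a duplicated group.
either = Sframe[not_first | not_last]

# keep=False flags all members of a duplicated group in a single call.
all_dupes = Sframe[Sframe.duplicated(subset=keys, keep=False)]

print(either)
```

On the sample data, `either` and `all_dupes` contain the same five rows, which match the desired output shown earlier.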

Any ideas?

Source: https://stackoverflow.com/questions/27735618/python-3-4-remove-duplicates-and-their-corresponding-values
