The pandas
drop_duplicates
function is great for \"uniquifying\" a dataframe. However, one of the keyword arguments to pass is take_last=True
Just want to add to Ben's answer on drop_duplicates:
keep
: {‘first’, ‘last’, False}, default ‘first’
first : Drop duplicates except for the first occurrence.
last : Drop duplicates except for the last occurrence.
False : Drop all duplicates.
So setting keep
to False will give you desired answer.
DataFrame.drop_duplicates(*args, **kwargs) Return DataFrame with duplicate rows removed, optionally only considering certain columns
Parameters: subset : column label or sequence of labels, optional Only consider certain columns for identifying duplicates, by default use all of the columns keep : {‘first’, ‘last’, False}, default ‘first’ first : Drop duplicates except for the first occurrence. last : Drop duplicates except for the last occurrence. False : Drop all duplicates. take_last : deprecated inplace : boolean, default False Whether to drop duplicates in place or to return a copy cols : kwargs only argument of subset [deprecated] Returns: deduplicated : DataFrame