Drop all duplicate rows across multiple columns in Python Pandas

北海茫月 2020-11-21 21:00

The pandas drop_duplicates function is great for "uniquifying" a dataframe. However, one of the keyword arguments to pass is take_last=True

6 answers
  •  一整个雨季
    2020-11-21 21:33

    Just want to add to Ben's answer on drop_duplicates:

    keep : {‘first’, ‘last’, False}, default ‘first’

    • first : Drop duplicates except for the first occurrence.

    • last : Drop duplicates except for the last occurrence.

    • False : Drop all duplicates.

    So setting keep to False will give you the desired result: every row that has a duplicate is dropped, including the first and last occurrences.
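    A minimal sketch of that, assuming a small made-up frame (the column names A, B, C and the values are only for illustration) where duplicates are judged on columns A and B:

```python
import pandas as pd

# Hypothetical example data: rows 1 and 2 share the same (A, B) pair.
df = pd.DataFrame({
    "A": ["foo", "foo", "foo", "bar"],
    "B": [0, 1, 1, 1],
    "C": ["A", "A", "B", "A"],
})

# keep=False drops every row whose (A, B) combination occurs more than once,
# instead of keeping the first or last occurrence.
result = df.drop_duplicates(subset=["A", "B"], keep=False)
print(result)
```

    Only the rows whose (A, B) pair is unique in the frame survive; the original index labels are kept.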

    DataFrame.drop_duplicates(*args, **kwargs)

    Return DataFrame with duplicate rows removed, optionally only considering certain columns.

    Parameters:

    • subset : column label or sequence of labels, optional. Only consider certain columns for identifying duplicates; by default use all of the columns.

    • keep : {‘first’, ‘last’, False}, default ‘first’. first : Drop duplicates except for the first occurrence. last : Drop duplicates except for the last occurrence. False : Drop all duplicates.

    • take_last : deprecated

    • inplace : boolean, default False. Whether to drop duplicates in place or to return a copy.

    • cols : kwargs only argument of subset [deprecated]

    Returns:

    • deduplicated : DataFrame
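    To see how these parameters interact, here is a rough sketch on a made-up frame (the column names key and val are assumptions for illustration). It uses only subset, keep and inplace, since the docstring marks take_last and cols as deprecated:

```python
import pandas as pd

# Hypothetical data: "x" appears twice in the key column.
df = pd.DataFrame({"key": ["x", "x", "y"], "val": [1, 2, 3]})

# Default keep='first': the first row of each duplicate group survives.
first = df.drop_duplicates(subset="key")

# keep='last': the last row of each duplicate group survives.
last = df.drop_duplicates(subset="key", keep="last")

# inplace=True modifies the frame itself and returns None,
# so work on a copy to leave df untouched.
work = df.copy()
ret = work.drop_duplicates(subset="key", keep=False, inplace=True)

print(first, last, work, sep="\n\n")
```

    With inplace left at its default of False, the call returns a new DataFrame and the original stays unchanged.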
