Python Pandas Drop Duplicates keep second to last

夕颜 2021-02-07 23:35

What's the most efficient way to select the second-to-last row of each set of duplicates in a pandas DataFrame?

For instance, I basically want to do this operation:

2 answers
  •  南方客 (OP)
    2021-02-08 00:27

    With groupby.apply:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'A': [1, 1, 1, 1, 2, 2, 2, 3, 3, 4],
                       'B': np.arange(10), 'C': np.arange(10)})
    
    df
    Out: 
       A  B  C
    0  1  0  0
    1  1  1  1
    2  1  2  2
    3  1  3  3
    4  2  4  4
    5  2  5  5
    6  2  6  6
    7  3  7  7
    8  3  8  8
    9  4  9  9
    
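    # Single-row groups are kept whole; otherwise take the second-to-last row as a one-row frame.
    # reset_index then drops the outer group level that apply adds, keeping the original row labels.
    (df.groupby('A', as_index=False).apply(lambda x: x if len(x)==1 else x.iloc[[-2]])
       .reset_index(level=0, drop=True))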
    Out: 
       A  B  C
    2  1  2  2
    5  2  5  5
    7  3  7  7
    9  4  9  9
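
If the lambda inside apply turns out to be slow on a large frame, one alternative worth trying is to chain the built-in tail and head group methods. This is only a sketch on the same example data and has not been benchmarked here: tail(2) keeps at most the last two rows of each group, and head(1) then keeps the first of those, which is the second-to-last row, or the only row when the group has a single member.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 1, 1, 2, 2, 2, 3, 3, 4],
                   'B': np.arange(10), 'C': np.arange(10)})

# Last two rows of every group (or the single row, if the group has only one).
last_two = df.groupby('A').tail(2)

# First row of what is left: the second-to-last row, or the lone row.
result = last_two.groupby('A').head(1)

print(result)
```

Both tail and head keep the original row labels, so no reset_index step is needed afterwards.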
    

    With a different DataFrame, treating rows as duplicates only when both A and B match:

    df = pd.DataFrame({'A': [1, 1, 1, 1, 2, 2, 2, 3, 3, 4], 
                       'B': [1, 1, 2, 1, 2, 2, 2, 3, 3, 4], 'C': np.arange(10)})
    
    df
    Out: 
       A  B  C
    0  1  1  0
    1  1  1  1
    2  1  2  2
    3  1  1  3
    4  2  2  4
    5  2  2  5
    6  2  2  6
    7  3  3  7
    8  3  3  8
    9  4  4  9
    
    (df.groupby(['A', 'B'], as_index=False).apply(lambda x: x if len(x)==1 else x.iloc[[-2]])
       .reset_index(level=0, drop=True))
    Out: 
       A  B  C
    1  1  1  1
    2  1  2  2
    5  2  2  5
    7  3  3  7
    9  4  4  9
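
The same selection can also be written as a boolean mask built from cumcount, which avoids calling a Python function per group. The helper name second_to_last and its subset parameter are made up for this illustration (they are not part of pandas), and the sketch assumes the key columns contain no missing values.

```python
import numpy as np
import pandas as pd


def second_to_last(df, subset):
    """Hypothetical helper: keep the second-to-last row of each duplicate
    group defined by `subset`; rows without duplicates are kept as they are."""
    grouped = df.groupby(subset, sort=False)
    # Position within the group counted from the start (0 = first row).
    from_start = grouped.cumcount()
    # Position within the group counted from the end (0 = last row).
    from_end = grouped.cumcount(ascending=False)
    # A row that is both first and last is alone in its group.
    is_single = (from_start == 0) & (from_end == 0)
    return df[(from_end == 1) | is_single]


df = pd.DataFrame({'A': [1, 1, 1, 1, 2, 2, 2, 3, 3, 4],
                   'B': [1, 1, 2, 1, 2, 2, 2, 3, 3, 4], 'C': np.arange(10)})

result = second_to_last(df, ['A', 'B'])
print(result)
```

The mask keeps the rows in their original order with their original labels, so the result lines up with the apply version above.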
    
