Append numbers to duplicate values in a pandas column

渐次进展 2020-12-19 16:18

I have a data frame like this:

df:
col1     col2
 1        pqr
 3        abc
 2        pqr
 4        xyz
 1        pqr

I found that there i

1 answer
  •  有刺的猬
    2020-12-19 16:46

    Use Series.duplicated with keep=False to mark every row whose value occurs more than once, then append a per-group counter created by GroupBy.cumcount:

    mask = df['col2'].duplicated(keep=False)
    df.loc[mask, 'col2'] += df.groupby('col2').cumcount().add(1).astype(str)
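
A self-contained sketch of the same idea, assuming the frame from the question is rebuilt by hand. The intermediate name `counter` is introduced here for illustration only and is not part of the original answer:

```python
import pandas as pd

# Sample frame from the question.
df = pd.DataFrame({
    "col1": [1, 3, 2, 4, 1],
    "col2": ["pqr", "abc", "pqr", "xyz", "pqr"],
})

# True for every row whose col2 value occurs more than once.
mask = df["col2"].duplicated(keep=False)

# 1-based position of each row within its col2 group, as a string
# (hypothetical helper variable, named only for this sketch).
counter = df.groupby("col2").cumcount().add(1).astype(str)

# Append the counter only on the duplicated rows; unique values stay as they are.
df.loc[mask, "col2"] = df.loc[mask, "col2"] + counter[mask]
print(df)
```

Because the counter is computed for every row and then filtered by the mask, values that occur only once (abc, xyz) keep their original text.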
    

    Or, with numpy.where:

    import numpy as np

    # Append the per-group counter only where col2 is duplicated.
    df['col2'] = np.where(df['col2'].duplicated(keep=False),
                          df['col2'] + df.groupby('col2').cumcount().add(1).astype(str),
                          df['col2'])
    print(df)
       col1  col2
    0     1  pqr1
    1     3   abc
    2     2  pqr2
    3     4   xyz
    4     1  pqr3
    

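    If you need this only for the pqr values:

    import numpy as np
    import pandas as pd

    # Number the pqr rows 1..n in order of appearance; other rows are untouched.
    mask = df['col2'] == 'pqr'
    df.loc[mask, 'col2'] += pd.Series(np.arange(1, mask.sum() + 1),
                                      index=df.index[mask]).astype(str)
    print(df)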
       col1  col2
    0     1  pqr1
    1     3   abc
    2     2  pqr2
    3     4   xyz
    4     1  pqr3
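
If several specific values need numbering rather than just pqr, the same pattern can probably be wrapped in a small helper. This is only a sketch: the function name `number_selected` and its parameters are hypothetical, and the extra `abc` row is added here purely to show that unselected duplicates are left alone:

```python
import pandas as pd

def number_selected(df, column, values):
    """Return a copy of df with a 1-based per-value counter appended
    to the entries of `column` that appear in `values`."""
    out = df.copy()
    # Rows whose value is one of the selected values.
    mask = out[column].isin(values)
    # Position of each row within its group of identical values.
    counter = out.groupby(column).cumcount().add(1).astype(str)
    out.loc[mask, column] = out.loc[mask, column] + counter[mask]
    return out

# Frame from the question plus one extra 'abc' row (an assumption for this demo).
df = pd.DataFrame({
    "col1": [1, 3, 2, 4, 1, 5],
    "col2": ["pqr", "abc", "pqr", "xyz", "pqr", "abc"],
})
result = number_selected(df, "col2", ["pqr"])
print(result)
```

Working on a copy keeps the caller's frame unchanged, which makes the helper safe to call repeatedly on the same input.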
    
