Add numbers to duplicate values in a pandas column


渐次进展 2020-12-19 16:18

I have a data frame like this:

df:
col1     col2
 1        pqr
 3        abc
 2        pqr
 4        xyz
 1        pqr

I found that there i

1 Answer

有刺的猬 (original poster)

2020-12-19 16:46

Use Series.duplicated with keep=False to flag every row whose col2 value occurs more than once, then append a 1-based counter created by GroupBy.cumcount:

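# True for every row whose col2 value appears more than once
mask = df['col2'].duplicated(keep=False)
# cumcount numbers rows within each col2 group from 0, so add 1 before appending
df.loc[mask, 'col2'] += df.groupby('col2').cumcount().add(1).astype(str)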

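Or with numpy.where, which takes the suffixed value for duplicated rows and keeps the original value otherwise:

import numpy as np

df['col2'] = np.where(df['col2'].duplicated(keep=False),
                      df['col2'] + df.groupby('col2').cumcount().add(1).astype(str),
                      df['col2'])
print(df)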
   col1  col2
0     1  pqr1
1     3   abc
2     2  pqr2
3     4   xyz
4     1  pqr3
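
To see why this works, it can help to look at the intermediate pieces. The following is a minimal sketch that rebuilds the sample frame from the question and computes the mask and the counter separately; the names counter and suffixed are only illustrative, and Series.where is used here as a swapped-in alternative to the .loc assignment, not as the answer's own method.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({"col1": [1, 3, 2, 4, 1],
                   "col2": ["pqr", "abc", "pqr", "xyz", "pqr"]})

# True for every row whose col2 value occurs more than once
mask = df["col2"].duplicated(keep=False)

# 1-based position of each row inside its col2 group
counter = df.groupby("col2").cumcount().add(1)

# Keep the original value where the row is NOT a duplicate,
# otherwise use the value with the counter appended
suffixed = df["col2"].where(~mask, df["col2"] + counter.astype(str))

print(mask.tolist())      # [True, False, True, False, True]
print(counter.tolist())   # [1, 1, 2, 1, 3]
print(suffixed.tolist())  # ['pqr1', 'abc', 'pqr2', 'xyz', 'pqr3']
```

Note that the counter is 1 for abc and xyz as well; the mask is what keeps those unique values from getting a suffix.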

If you need the same numbering only for the pqr values:

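# Select only the rows equal to 'pqr'
mask = df['col2'] == 'pqr'
# Build the counter 1..n on the index of the selected rows so it aligns on assignment
df.loc[mask, 'col2'] += pd.Series(np.arange(1, mask.sum() + 1),
                                  index=df.index[mask]).astype(str)
print(df)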
   col1  col2
0     1  pqr1
1     3   abc
2     2  pqr2
3     4   xyz
4     1  pqr3
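
If several specific values need numbering, the two ideas above can be combined. This is only a sketch under assumptions: the helper name add_counter_suffix and its parameters are hypothetical, and the extra abc row is added purely to show that unlisted values are left alone even when they repeat.

```python
import pandas as pd

def add_counter_suffix(df, column, values):
    """Return a copy of df where rows whose `column` is in `values`
    get a 1-based per-value counter appended (hypothetical helper)."""
    out = df.copy()
    # Rows to modify: only the explicitly listed values
    mask = out[column].isin(values)
    # Per-group running count, converted to text for concatenation
    counter = out.groupby(column).cumcount().add(1).astype(str)
    out.loc[mask, column] = out.loc[mask, column] + counter[mask]
    return out

df = pd.DataFrame({"col1": [1, 3, 2, 4, 1, 5],
                   "col2": ["pqr", "abc", "pqr", "xyz", "pqr", "abc"]})

result = add_counter_suffix(df, "col2", ["pqr"])
print(result["col2"].tolist())
# ['pqr1', 'abc', 'pqr2', 'xyz', 'pqr3', 'abc']
```

The helper works on a copy, so the input frame is not modified.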
