Anonymizing data / replacing names

一整个雨季 2021-01-24 06:13

Normally I anonymize my data with hashlib, using the .apply(hash) function.
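For reference, a minimal sketch of that hashing approach. The question does not show the helper, so `hash_name`, the choice of SHA-256, and the sample names are all assumptions made for illustration:

```python
import hashlib

import pandas as pd

# Hypothetical sample data; the real frame is not shown in the question.
data = pd.DataFrame(
    {"contributor": ["Alice", "Bob", "Alice"], "amount payed": [10, 28, 49]}
)


def hash_name(name: str) -> str:
    """Return the SHA-256 hex digest of a name (assumed helper)."""
    return hashlib.sha256(name.encode("utf-8")).hexdigest()


# Equal names map to equal digests, so grouping by contributor still works.
hashed = data["contributor"].apply(hash_name)
```

Unsalted digests of short names can be reversed by hashing a list of candidate names, which is one reason to prefer neutral labels instead.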

Now I'm trying a new approach. Imagine I have the following df called 'data':

3 Answers
  •  长情又很酷
    2021-01-24 06:47

    I think a faster solution is to use factorize to number the unique values, add 1, convert the result to a Series of strings, and prepend the string 'Person':

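    # factorize returns (codes, uniques); codes start at 0, so add 1.
    # Passing index=df.index keeps the labels aligned if df has a non-default index.
    df['contributor'] = 'Person' + pd.Series(pd.factorize(df['contributor'])[0] + 1, index=df.index).astype(str)
    print(df)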
      contributor  amount payed
    0     Person1            10
    1     Person2            28
    2     Person3            49
    3     Person2            77
    4     Person4            31
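A self-contained sketch of the factorize idea, for anyone who wants to run it end to end. The contributor names and the non-default index are invented for illustration (the original data is not shown), and building an explicit `mapping` dict is an extra step not in the answer, added so the replacement can be audited or reused on other frames:

```python
import pandas as pd

# Hypothetical names; only the amounts match the output shown above.
df = pd.DataFrame(
    {
        "contributor": ["Alice", "Bob", "Carol", "Bob", "Dave"],
        "amount payed": [10, 28, 49, 77, 31],
    },
    index=[10, 11, 12, 13, 14],  # deliberately not 0..4
)

# codes: one integer per row, numbered in order of first appearance.
# uniques: the distinct names in that same order.
codes, uniques = pd.factorize(df["contributor"])

# Explicit lookup from real name to label.
mapping = {name: f"Person{i + 1}" for i, name in enumerate(uniques)}

# map() works row by row, so the frame's index does not matter here.
anonymized = df.assign(contributor=df["contributor"].map(mapping))
print(anonymized)
```

Keep `mapping` somewhere private if the labels ever need to be traced back to names, or discard it if the anonymization should be one-way.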
    
