Normally I anonymize my data by using hashlib and the .apply(hash) function.
Now I'm trying a new approach. Imagine I have the following df called 'data':
I think a faster solution is to use factorize to encode the unique values as integers, add 1, convert the result to a Series, cast it to strings, and prepend the string Person:
df['contributor'] = 'Person' + pd.Series(pd.factorize(df['contributor'])[0] + 1).astype(str)
print(df)
   contributor  amount payed
0      Person1            10
1      Person2            28
2      Person3            49
3      Person2            77
4      Person4            31
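
For reference, here is a minimal self-contained sketch of the same idea. The original input frame is not shown above, so the contributor names below are invented purely for illustration; only the amounts and the repeat pattern (rows 1 and 3 share a contributor) are taken from the printed output. Passing index=df.index is an extra precaution I am assuming is useful: it keeps the new Series aligned with df when df does not have the default 0..n-1 index.

```python
import pandas as pd

# Hypothetical input: the names are made up for this sketch; the amounts
# match the output shown above.
df = pd.DataFrame({
    'contributor': ['Alice', 'Bob', 'Carol', 'Bob', 'Dave'],
    'amount payed': [10, 28, 49, 77, 31],
})

# pd.factorize returns (codes, uniques): codes are 0-based integers assigned
# in order of first appearance, so equal names get equal codes.
codes, uniques = pd.factorize(df['contributor'])

# Shift codes to start at 1, cast to strings, and prepend the label.
# index=df.index avoids misalignment if df has a non-default index.
df['contributor'] = 'Person' + pd.Series(codes + 1, index=df.index).astype(str)

print(df)
```

The uniques array still holds the original names in code order, so keep it out of any shared output if the goal is anonymization, or store it privately if you need to map the labels back later.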