问题
I have a dict:
dealer = {
'ESSELUNGA': 'Spesa',
'DECATHLON 00000120': 'Sport',
'LEROY MERLIN': 'Casa',
'CONAD 8429': 'Spesa',
'IKEA': 'Casa',
'F.LLI MADAFFARI': 'Spesa',
'SUPERMERCATO IL GIGANT': 'Spesa',
'NATURASI SPA': 'Spesa',
'ESSELUNGA SETTIMO MILANE': 'Spesa'
}
and I want to map it to a pandas df:
entries.Categoria = entries.Commerciante.map(dealer)
Is there a way to use regex to match map on "Commerciante" column? In this way I can rewrite dealer as this:
dealer = {
'ESSELUNGA': 'Spesa',
'DECATHLON': 'Sport',
'LEROY MERLIN': 'Casa',
'CONAD': 'Spesa',
'IKEA': 'Casa',
'F.LLI MADAFFARI': 'Spesa',
'SUPERMERCATO IL GIGANT': 'Spesa',
'NATURASI SPA': 'Spesa',
'ESSELUNGA SETTIMO MILANE': 'Spesa'
}
and match both "DECATHLON" and "DECATHLON 00000120"
回答1:
One can use a dict comprehension with a regular expression to rewrite key.
The re python module is used to perform this task, with the command sub
. The substitution key looks like:
import re
dealer = {re.sub(r'(\W)[0-9]+',r'\1',k).strip():dealer[k] for k in dealer}
The whole example gives:
import re
dealer = {
'ESSELUNGA': 'Spesa',
'DECATHLON 00000120': 'Sport',
'LEROY MERLIN': 'Casa',
'CONAD 8429': 'Spesa',
'IKEA': 'Casa',
'F.LLI MADAFFARI': 'Spesa',
'SUPERMERCATO IL GIGANT': 'Spesa',
'NATURASI SPA': 'Spesa',
'ESSELUNGA SETTIMO MILANE': 'Spesa'
}
dealer = {re.sub(r'(\W)[0-9]+',r'\1',k).strip():dealer[k] for k in dealer}
回答2:
Thank you to all of you. I used your suggestions to resolve my problem. I defined a new function:
def dealer_replace(dealer_dict, text):
regex = re.compile("(%s)" % "|".join(map(re.escape, dealer_dict.keys())))
if regex.search(text):
ret = regex.search(text)
return dealer_dict[ret.group()]
else:
return None
And use it with apply
entries['Categoria'] = entries['Commerciante'].apply(lambda v: dealer_replace(dealer, str(v)))
回答3:
Why don't you use apply
and on a modified dictionary lookup:
In [14]: [dname for dname in dealer if 'DECATHLON' in dname]
Out[14]: ['DECATHLON 00000120']
And, apply it like this -
df['Commerciante'] = df['Commerciante'].apply(lambda v: [dname for dname in dealer if dname.startswith('DECATHLON')][0])
回答4:
I think your problem is that you are trying to do two things in one step.
First clean your data, then map it.
pandas Series holds lots of nice string functions which can come in handy for cleaning your data. Here is a good reference to the string methods.
Once you have used the string methods for cleaning your data, mapping it will be easy as pie.
来源:https://stackoverflow.com/questions/30183326/python-pandas-use-map-with-regular-expressions