python pandas use map with regular expressions

走远了吗. Submitted on 2021-02-19 06:36:05

Question


I have a dict:

dealer = {
    'ESSELUNGA': 'Spesa',
    'DECATHLON 00000120': 'Sport',
    'LEROY MERLIN': 'Casa',
    'CONAD 8429': 'Spesa',
    'IKEA': 'Casa',
    'F.LLI MADAFFARI': 'Spesa',
    'SUPERMERCATO IL GIGANT': 'Spesa',
    'NATURASI SPA': 'Spesa',
    'ESSELUNGA SETTIMO MILANE': 'Spesa'
}

and I want to map it to a pandas df:

entries.Categoria = entries.Commerciante.map(dealer)

Is there a way to use a regex so that map matches on the "Commerciante" column? That way I could rewrite dealer like this:

dealer = {
    'ESSELUNGA': 'Spesa',
    'DECATHLON': 'Sport',
    'LEROY MERLIN': 'Casa',
    'CONAD': 'Spesa',
    'IKEA': 'Casa',
    'F.LLI MADAFFARI': 'Spesa',
    'SUPERMERCATO IL GIGANT': 'Spesa',
    'NATURASI SPA': 'Spesa',
    'ESSELUNGA SETTIMO MILANE': 'Spesa'
}

and have it match both "DECATHLON" and "DECATHLON 00000120".


Answer 1:


You can use a dict comprehension with a regular expression to rewrite the keys. Python's re module handles this with its sub function. The key substitution looks like this:

import re
dealer = {re.sub(r'(\W)[0-9]+',r'\1',k).strip():dealer[k] for k in dealer}

The complete example:

import re
dealer = {
    'ESSELUNGA': 'Spesa',
    'DECATHLON 00000120': 'Sport',
    'LEROY MERLIN': 'Casa',
    'CONAD 8429': 'Spesa',
    'IKEA': 'Casa',
    'F.LLI MADAFFARI': 'Spesa',
    'SUPERMERCATO IL GIGANT': 'Spesa',
    'NATURASI SPA': 'Spesa',
    'ESSELUNGA SETTIMO MILANE': 'Spesa'
}
dealer = {re.sub(r'(\W)[0-9]+',r'\1',k).strip():dealer[k] for k in dealer}
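Note that rewriting the keys only helps if the values being looked up are normalized the same way. A minimal sketch of that idea, assuming a hypothetical helper named strip_code and a few made-up raw values (neither is from the original answer):

```python
import re

def strip_code(name):
    # Same substitution as above: drop a run of digits that follows a non-word character.
    return re.sub(r'(\W)[0-9]+', r'\1', name).strip()

dealer = {
    'ESSELUNGA': 'Spesa',
    'DECATHLON 00000120': 'Sport',
    'CONAD 8429': 'Spesa',
}
# Normalize the dictionary keys once.
dealer_clean = {strip_code(k): v for k, v in dealer.items()}

# Hypothetical raw values as they might appear in the "Commerciante" column.
raw_values = ['DECATHLON 00000120', 'DECATHLON', 'CONAD 8429', 'IKEA']

# Normalize each value the same way before the lookup; unknown dealers give None.
categories = [dealer_clean.get(strip_code(v)) for v in raw_values]
```

With a DataFrame, the same helper could presumably be passed to the column first, e.g. entries['Commerciante'].map(strip_code).map(dealer_clean).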



Answer 2:


Thank you to all of you. I used your suggestions to resolve my problem. I defined a new function:

import re

def dealer_replace(dealer_dict, text):
    # Build one alternation from the escaped dictionary keys
    regex = re.compile("(%s)" % "|".join(map(re.escape, dealer_dict.keys())))

    # Search once and reuse the match object
    ret = regex.search(text)
    if ret:
        return dealer_dict[ret.group()]
    return None

And I use it with apply:

entries['Categoria'] = entries['Commerciante'].apply(lambda v: dealer_replace(dealer, str(v)))
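A possible vectorized variant of the same idea uses Series.str.extract instead of apply. This is only a sketch: the sample rows are made up, and sorting the keys longest-first is an added assumption so that a longer key such as 'ESSELUNGA SETTIMO MILANE' is tried before its prefix 'ESSELUNGA'.

```python
import re
import pandas as pd

dealer = {
    'ESSELUNGA': 'Spesa',
    'DECATHLON': 'Sport',
    'CONAD': 'Spesa',
    'ESSELUNGA SETTIMO MILANE': 'Spesa',
}

# Longest keys first, so the alternation prefers the most specific dealer name.
keys = sorted(dealer, key=len, reverse=True)
pattern = '(' + '|'.join(re.escape(k) for k in keys) + ')'

# Hypothetical sample data; the last two rows have no matching dealer.
entries = pd.DataFrame({
    'Commerciante': [
        'DECATHLON 00000120',
        'CONAD 8429',
        'ESSELUNGA SETTIMO MILANE',
        'BAR SPORT',
        None,
    ]
})

# Extract the first dealer key found in each value (NaN when nothing matches),
# then map the extracted key to its category.
matched = entries['Commerciante'].str.extract(pattern, expand=False)
entries['Categoria'] = matched.map(dealer)
```

Rows without a known dealer end up as NaN in 'Categoria' rather than None.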



Answer 3:


Why not use apply with a lookup against the dictionary keys? For example:

In [14]: [dname for dname in dealer if 'DECATHLON' in dname]
Out[14]: ['DECATHLON 00000120']

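Then apply it like this, using each value as the prefix (instead of a hard-coded 'DECATHLON') and keeping the original value when no key matches:

df['Commerciante'] = df['Commerciante'].apply(lambda v: next((dname for dname in dealer if dname.startswith(str(v))), v))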



Answer 4:


I think your problem is that you are trying to do two things in one step.

First clean your data, then map it.

A pandas Series offers many string methods through its .str accessor, and these come in handy for cleaning your data. The pandas documentation on working with text data is a good reference for them.

Once you have used the string methods for cleaning your data, mapping it will be easy as pie.
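As a rough illustration of "clean first, then map", here is a minimal sketch. It assumes the noise is mixed case, stray whitespace, and a trailing numeric code; the sample rows and the shortened dealer dict are made up for the example.

```python
import pandas as pd

dealer = {'ESSELUNGA': 'Spesa', 'DECATHLON': 'Sport', 'CONAD': 'Spesa', 'IKEA': 'Casa'}

# Hypothetical sample data with the kinds of noise assumed above.
entries = pd.DataFrame({
    'Commerciante': ['DECATHLON 00000120', ' conad 8429', 'IKEA', 'ESSELUNGA']
})

# Step 1: clean. Upper-case, drop a trailing block of digits, trim whitespace.
cleaned = (
    entries['Commerciante']
    .str.upper()
    .str.replace(r'\s+\d+$', '', regex=True)
    .str.strip()
)

# Step 2: map. A plain dictionary lookup now suffices.
entries['Categoria'] = cleaned.map(dealer)
```

Keeping the cleaned values in a separate Series leaves the original 'Commerciante' column untouched, which makes it easier to check what the cleaning step changed.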



Source: https://stackoverflow.com/questions/30183326/python-pandas-use-map-with-regular-expressions
