Identify strings while removing substrings in Python

北恋 2021-01-21 21:29

I have a dictionary of words with their frequencies as follows.

mydictionary = {'yummy tim tam':3, 'milk':2, 'chocolates':5, 'biscuit pudding':3, 'sugar':2}


3 Answers
  • 2021-01-21 21:39

    You can update your code to wrap each key in a regex word boundary (\b):

    import re

    mydictionary = {'yummy tim tam':3, 'milk':2, 'chocolates':5, 'biscuit pudding':3, 'sugar':2}
    recipes_book = "For today's lesson we will show you how to make biscuit pudding using yummy tim tam milk and rawsugar"
    # Join the keys into one alternation, escaping each key and wrapping it in \b
    # so a key only matches as a whole word ("sugar" is not found in "rawsugar").
    searcher = re.compile("|".join(r'\b{}\b'.format(re.escape(key)) for key in mydictionary), flags=re.I)

    for match in searcher.findall(recipes_book):
        print(match)
    

    Output:

    biscuit pudding
    yummy tim tam
    milk
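To see why the word boundary matters, here is a small sketch comparing a plain substring test (the in operator) with a \b-anchored search on the same data. It assumes Python 3.7+, where dicts keep insertion order, so the lists come out in the dictionary's key order; the names substring_hits and boundary_hits are only illustrative.

```python
import re

mydictionary = {'yummy tim tam': 3, 'milk': 2, 'chocolates': 5, 'biscuit pudding': 3, 'sugar': 2}
recipes_book = "For today's lesson we will show you how to make biscuit pudding using yummy tim tam milk and rawsugar"

# Plain substring test: 'sugar' is reported because it occurs inside 'rawsugar'.
substring_hits = [k for k in mydictionary if k in recipes_book]

# Word-boundary test: \b needs a word/non-word transition on each side,
# so 'sugar' embedded in 'rawsugar' is rejected.
boundary_hits = [k for k in mydictionary
                 if re.search(r'\b' + re.escape(k) + r'\b', recipes_book)]

print(substring_hits)  # ['yummy tim tam', 'milk', 'biscuit pudding', 'sugar']
print(boundary_hits)   # ['yummy tim tam', 'milk', 'biscuit pudding']
```

Only the keys themselves are tested here, so the frequencies in the dictionary are untouched and can still be looked up with mydictionary[k] for each hit.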
    
  • 2021-01-21 21:51

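    Another way is to loop over the keys and use re.escape, which escapes any regex metacharacters in a key before the word boundaries are added (see the Python re module documentation for details on re.escape):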

    import re

    mydictionary = {'yummy tim tam':3, 'milk':2, 'chocolates':5, 'biscuit pudding':3, 'sugar':2}
    recipes_book = "For today's lesson we will show you how to make biscuit pudding using yummy tim tam milk and rawsugar"

    val_list = []

    for key in mydictionary:
        # Escape regex metacharacters in the key, then require a word boundary
        # on each side so that "sugar" is not matched inside "rawsugar".
        regex_tmp = r'\b' + re.escape(key) + r'\b'
        val_list.extend(re.findall(regex_tmp, recipes_book))

    print(val_list)
    

    Output (run under Python 2.7):

    ['yummy tim tam', 'biscuit pudding', 'milk']

    The order follows dictionary iteration order; on Python 3.7+, where dicts keep insertion order, the same code prints ['yummy tim tam', 'milk', 'biscuit pudding'].
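The re.escape step matters once a key contains regex metacharacters. A minimal sketch with hypothetical keys "1.5 cups" and "c++" (neither is in the question's dictionary) shows the benefit and one caveat: \b only marks a boundary next to a word character, so keys that start or end with punctuation need lookarounds such as (?<!\w) and (?!\w) instead.

```python
import re

# Hypothetical text and key: '.' is a regex wildcard unless escaped.
text = "add 1x5 cups of flour"
key = "1.5 cups"

# Unescaped: '.' matches any character, so "1x5 cups" is wrongly matched.
unescaped = re.findall(r'\b' + key + r'\b', text)

# Escaped: '.' becomes a literal dot, so nothing matches here.
escaped = re.findall(r'\b' + re.escape(key) + r'\b', text)

# Caveat: \b fails after '+', because '+' and the following space are both
# non-word characters. Lookarounds ("no word character on either side") work.
key2 = "c++"
text2 = "learn c++ today"
with_b = re.findall(r'\b' + re.escape(key2) + r'\b', text2)
with_lookaround = re.findall(r'(?<!\w)' + re.escape(key2) + r'(?!\w)', text2)

print(unescaped)        # ['1x5 cups']
print(escaped)          # []
print(with_b)           # []
print(with_lookaround)  # ['c++']
```

For purely alphanumeric keys like the ones in the question, \b and the lookaround form behave the same.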
    
  • 2021-01-21 22:00

    Use the word boundary '\b'. Put simply, write all the keys into one alternation:

    >>> import re
    >>> recipes_book = "For today's lesson we will show you how to make biscuit pudding using yummy tim tam milk and rawsugar"
    >>> re.findall(r'(?is)(\bchocolates\b|\bbiscuit pudding\b|\bsugar\b|\byummy tim tam\b|\bmilk\b)', recipes_book)
    ['biscuit pudding', 'yummy tim tam', 'milk']
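When one key is a prefix of another (say a hypothetical 'biscuit' alongside 'biscuit pudding'), the order of the alternatives in a single combined pattern decides which one is reported, because Python's re tries the branches of | from left to right and keeps the first one that matches at a given position. This sketch uses a hypothetical helper, find_keys, and sorts the keys longest first so the longer phrase wins.

```python
import re

# Hypothetical keys: 'biscuit' is a prefix of 'biscuit pudding'.
keys = ['biscuit', 'biscuit pudding', 'milk']
recipes_book = "For today's lesson we will show you how to make biscuit pudding using yummy tim tam milk and rawsugar"


def find_keys(keys, text):
    """Return non-overlapping whole-word matches of any key, case-insensitively."""
    pattern = '|'.join(r'\b' + re.escape(k) + r'\b' for k in keys)
    return re.findall(pattern, text, flags=re.I)


# Branches are tried in list order, so the shorter 'biscuit' wins at that position.
naive = find_keys(keys, recipes_book)

# Longest keys first: 'biscuit pudding' is tried before 'biscuit'.
longest_first = find_keys(sorted(keys, key=len, reverse=True), recipes_book)

print(naive)          # ['biscuit', 'milk']
print(longest_first)  # ['biscuit pudding', 'milk']
```

Because findall returns non-overlapping matches, a shorter key inside a longer match is not reported a second time. Looping over the keys one at a time, as in the re.escape answer, would instead report both 'biscuit' and 'biscuit pudding'.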
    