How to match any string from a list of strings in regular expressions in Python?

深忆病人 2020-12-01 10:50

Let's say I have a list of strings:

string_lst = ['fun', 'dum', 'sun', 'gum']

I want to make a regular expression, where at a point

5 Answers
  • 2020-12-01 11:14

    In line with @vks's reply, I feel this actually does the complete task:

    finds = re.findall(r"(?=(\b" + '\\b|\\b'.join(string_lst) + r"\b))", x)
    

    Adding word boundaries completes the task: only whole words from the list are matched.
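
A minimal sketch of the word-boundary idea above, assuming every word in the list starts and ends with a word character (otherwise `\b` will not behave as expected). The sample text and the name `whole_word` are made up for illustration; `re.escape` is added as a precaution and is not part of the original one-liner.

```python
import re

string_lst = ['fun', 'dum', 'sun', 'gum']

# Escape each word, group the alternation once, and anchor both ends with \b.
whole_word = re.compile(r"\b(?:" + "|".join(map(re.escape, string_lst)) + r")\b")

text = "It is funny to have fun in the sun."  # hypothetical sample text
matches = whole_word.findall(text)
print(matches)  # "funny" is skipped because "fun" is not a whole word there
```

Grouping the alternation with `(?:...)` lets a single pair of `\b` anchors cover all words, which should be equivalent to joining with `\b|\b`.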

  • 2020-12-01 11:16
    import re

    string_lst = ['fun', 'dum', 'sun', 'gum']
    x = "I love to have fun."

    # The lookahead keeps each match zero-width, so overlapping words are all found
    print(re.findall(r"(?=(" + '|'.join(string_lst) + r"))", x))
    

    You cannot use match, because it only matches at the start of the string. Use findall instead.

    Output: ['fun']

    search returns only the first match, so use findall to get all of them.

    The lookahead is needed only if you have overlapping matches that do not start at the same position.
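
To make the difference between match, search, and findall concrete, here is a rough sketch. The sample sentence and the overlapping words "ab" and "bc" are hypothetical and chosen only to show the effect of the lookahead; `re.escape` is an added precaution.

```python
import re

string_lst = ['fun', 'dum', 'sun', 'gum']
x = "I love to have fun in the sun."  # hypothetical sample text
pattern = re.compile("|".join(map(re.escape, string_lst)))

at_start = pattern.match(x)        # None: x does not begin with a listed word
first = pattern.search(x).group()  # first match anywhere in x
every = pattern.findall(x)         # all non-overlapping matches

# Overlap demo with the hypothetical words "ab" and "bc" inside "abc".
plain = re.findall(r"ab|bc", "abc")            # "ab" consumes the "b", so "bc" is missed
lookahead = re.findall(r"(?=(ab|bc))", "abc")  # zero-width matches see both
print(at_start, first, every, plain, lookahead)
```

If the words in your list can never overlap in the text, the plain alternation without the lookahead should be enough.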

  • 2020-12-01 11:27

    Besides a regular expression, you can use a list comprehension. I hope it's not off topic.

    import re
    def match(input_string, string_list):
        words = re.findall(r'\w+', input_string)
        return [word for word in words if word in string_list]
    
    >>> string_lst = ['fun', 'dum', 'sun', 'gum']
    >>> match("I love to have fun.", string_lst)
    ['fun']
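
A possible variation on the list-comprehension approach, assuming the word list may be long: converting it to a set makes each membership test roughly constant time. The function name `match_words` and the sample sentence are made up for this sketch.

```python
import re

def match_words(input_string, string_list):
    """Return the words of input_string that appear in string_list, in order."""
    wanted = set(string_list)  # set lookup instead of scanning the list each time
    return [word for word in re.findall(r"\w+", input_string) if word in wanted]

result = match_words("I love to have fun in the sun, not funny gum.",
                     ['fun', 'dum', 'sun', 'gum'])
print(result)
```

Because the text is split into whole words first, "funny" is not reported even though it contains "fun".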
    
  • 2020-12-01 11:32

    You should make sure to escape the strings correctly before combining them into a regex:

    >>> import re
    >>> string_lst = ['fun', 'dum', 'sun', 'gum']
    >>> x = "I love to have fun."
    >>> regex = re.compile("(?=(" + "|".join(map(re.escape, string_lst)) + "))")
    >>> re.findall(regex, x)
    ['fun']
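
To illustrate why escaping matters, here is a small sketch with a hypothetical word list that contains regex metacharacters. Without `re.escape`, the "." and "+" are interpreted as pattern syntax rather than literal characters.

```python
import re

words = ['1.5', 'a+b']            # hypothetical words containing metacharacters
text = "125 aab 1.5 a+b"          # hypothetical sample text

# Unescaped: "." matches any character and "a+" means "one or more a".
unescaped = re.findall("|".join(words), text)

# Escaped: every word is matched literally.
escaped = re.findall("|".join(map(re.escape, words)), text)

print(unescaped)
print(escaped)
```

The unescaped pattern picks up "125" and "aab" and misses the literal "a+b", while the escaped one finds only the two literal words.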
    
  • 2020-12-01 11:37

    The regex module has named lists (sets, actually):

    #!/usr/bin/env python
    import regex as re # $ pip install regex
    
    p = re.compile(r"\L<words>", words=['fun', 'dum', 'sun', 'gum'])
    if p.search("I love to have fun."):
        print('matched')
    

    Here words is just a name; you can use anything you like instead.
    The .search() method is used instead of adding .* before and after the named list.

    To emulate named lists using stdlib's re module:

    #!/usr/bin/env python
    import re
    
    words = ['fun', 'dum', 'sun', 'gum']
    longest_first = sorted(words, key=len, reverse=True)
    p = re.compile(r'(?:{})'.format('|'.join(map(re.escape, longest_first))))
    if p.search("I love to have fun."):
        print('matched')
    

    re.escape() is used to escape regex metacharacters such as .*? inside individual words (to match the words literally).
    sorted() emulates regex behavior by putting the longest words first among the alternatives. Compare:

    >>> import re
    >>> re.findall("(funny|fun)", "it is funny")
    ['funny']
    >>> re.findall("(fun|funny)", "it is funny")
    ['fun']
    >>> import regex
    >>> regex.findall(r"\L<words>", "it is funny", words=['fun', 'funny'])
    ['funny']
    >>> regex.findall(r"\L<words>", "it is funny", words=['funny', 'fun'])
    ['funny']
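
The stdlib emulation above could be wrapped in a small helper, sketched here under the assumption that the word list is non-empty (an empty list would yield a pattern that matches the empty string everywhere). The name `compile_word_list` is hypothetical and not part of `re` or `regex`.

```python
import re

def compile_word_list(words):
    """Compile an alternation that matches any of `words` literally."""
    # Drop duplicates, then try longer words first so "funny" beats "fun".
    longest_first = sorted(set(words), key=len, reverse=True)
    return re.compile("|".join(map(re.escape, longest_first)))

p = compile_word_list(['fun', 'funny', 'sun'])
found = p.findall("it is funny in the sun")
print(found)
```

The result does not depend on the order in which the words are passed, which mirrors the named-list behavior shown in the comparison above.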
    