How to match any string from a list of strings in regular expressions in Python?


Let's say I have a list of strings:

string_lst = ['fun', 'dum', 'sun', 'gum']

I want to make a regular expression, where at a point

5 answers

无人及你

2020-12-01 11:14
In line with @vks's reply, I feel this actually does the complete task:
```
finds = re.findall(r"(?=(\b" + '\\b|\\b'.join(string_lst) + r"\b))", x)
```
Adding word boundaries (\b) completes the task!
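
To illustrate what the word boundaries change, here is a minimal sketch. The sample text is made up for the example, and the non-capturing group is just one way to apply `\b` once around the whole alternation rather than around each word.

```python
import re

string_lst = ['fun', 'dum', 'sun', 'gum']
text = "A sunny day is fun."  # hypothetical sample text

# Without boundaries, 'sun' is found inside 'sunny'.
loose = re.findall("|".join(string_lst), text)

# With \b on both sides, only whole words match.
strict = re.findall(r"\b(?:" + "|".join(string_lst) + r")\b", text)

print(loose)   # ['sun', 'fun']
print(strict)  # ['fun']
```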
北荒

2020-12-01 11:16
```
import re

string_lst = ['fun', 'dum', 'sun', 'gum']
x = "I love to have fun."

# The lookahead lets findall report overlapping matches as well.
print(re.findall(r"(?=(" + '|'.join(string_lst) + r"))", x))
```
You cannot use re.match, because it only matches at the start of the string. Use re.findall instead.

Output: ['fun']

With re.search you will get only the first match, so use re.findall to get all of them.

Also, use a lookahead if you have overlapping matches that do not start at the same point.
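
As a rough illustration of the overlapping case, the sketch below compares a plain alternation with the lookahead form. The word list and sample text are invented for the example, since the original list has no overlapping words.

```python
import re

words = ['aba', 'bab']   # hypothetical overlapping words
text = "ababa"           # hypothetical sample text

# A plain alternation consumes each match, so overlapping hits are skipped.
plain = re.findall("|".join(words), text)

# A lookahead matches at every position without consuming characters;
# the capture group inside it records what was found there.
overlapping = re.findall("(?=(" + "|".join(words) + "))", text)

print(plain)        # ['aba']
print(overlapping)  # ['aba', 'bab', 'aba']
```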

有刺的猬

2020-12-01 11:27

Besides a regular expression, you can use a list comprehension. I hope this is not off topic.

import re
def match(input_string, string_list):
    words = re.findall(r'\w+', input_string)
    return [word for word in words if word in string_list]

>>> string_lst = ['fun', 'dum', 'sun', 'gum']
>>> match("I love to have fun.", string_lst)
['fun']
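
A possible variant of the function above, assuming the word list may be large: converting it to a set first makes each membership test constant-time on average. The name `match_words` is hypothetical. Note that this approach only finds whole words, so 'funny' would not count as a match for 'fun'.

```python
import re

def match_words(input_string, string_list):
    # Build the set once so each lookup below is O(1) on average.
    wanted = set(string_list)
    words = re.findall(r"\w+", input_string)
    return [word for word in words if word in wanted]

string_lst = ['fun', 'dum', 'sun', 'gum']
result = match_words("I love to have fun in the sun.", string_lst)
print(result)  # ['fun', 'sun']
```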


忘掉有多难

2020-12-01 11:32

You should make sure to escape the strings correctly before combining them into a regex:

>>> import re
>>> string_lst = ['fun', 'dum', 'sun', 'gum']
>>> x = "I love to have fun."
>>> regex = re.compile("(?=(" + "|".join(map(re.escape, string_lst)) + "))")
>>> re.findall(regex, x)
['fun']
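
To show why the escaping matters, here is a small sketch with a word that contains a regex metacharacter. The word list and sample text are made up for the example.

```python
import re

words = ['3.5', 'v2']      # hypothetical list; '.' is a regex metacharacter
text = "315 vs 3.5"        # hypothetical sample text

# Unescaped, '.' matches any character, so '315' is a false positive.
unescaped = re.findall("|".join(words), text)

# re.escape turns '3.5' into a pattern that matches a literal dot only.
escaped = re.findall("|".join(map(re.escape, words)), text)

print(unescaped)  # ['315', '3.5']
print(escaped)    # ['3.5']
```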


攒了一身酷

2020-12-01 11:37

The regex module has named lists (actually sets):

#!/usr/bin/env python
import regex as re # $ pip install regex

p = re.compile(r"\L<words>", words=['fun', 'dum', 'sun', 'gum'])
if p.search("I love to have fun."):
    print('matched')

Here words is just a name; you can use any name you like instead.
The .search() method is used so that no .* is needed before or after the named list.

To emulate named lists using stdlib's re module:

#!/usr/bin/env python
import re

words = ['fun', 'dum', 'sun', 'gum']
longest_first = sorted(words, key=len, reverse=True)
p = re.compile(r'(?:{})'.format('|'.join(map(re.escape, longest_first))))
if p.search("I love to have fun."):
    print('matched')

re.escape() escapes regex metacharacters such as .*? inside the individual words, so that the words are matched literally.
sorted() emulates the regex module's behavior by putting the longest words first among the alternatives. Compare:

>>> import re
>>> re.findall("(funny|fun)", "it is funny")
['funny']
>>> re.findall("(fun|funny)", "it is funny")
['fun']
>>> import regex
>>> regex.findall(r"\L<words>", "it is funny", words=['fun', 'funny'])
['funny']
>>> regex.findall(r"\L<words>", "it is funny", words=['funny', 'fun'])
['funny']
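
The escaping and longest-first sorting can be wrapped in a small helper so that, as with the named list, the order of the input words no longer matters. This is only a sketch using the stdlib re module, and `compile_any` is a hypothetical name.

```python
import re

def compile_any(words):
    # Longest words first, so 'funny' is tried before its prefix 'fun'.
    longest_first = sorted(words, key=len, reverse=True)
    # Escape each word so it is matched literally.
    return re.compile("|".join(map(re.escape, longest_first)))

first = compile_any(['fun', 'funny']).findall("it is funny")
second = compile_any(['funny', 'fun']).findall("it is funny")
print(first)   # ['funny']
print(second)  # ['funny']
```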
