I'm still learning the ropes with Python and regular expressions and I need some help, please! I need a regular expression that can search a sentence for specific words.
Use the union operator | to search for all the words you need to find:
In [20]: re_pattern = r'\b(?:total|staff)\b'
In [21]: re.findall(re_pattern, question)
Out[21]: ['total', 'staff']
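If the list of words is not fixed, the same pattern can be built from a list. This is only a minimal sketch: the helper name build_word_pattern is made up for illustration, and it assumes the words start and end with ordinary word characters so that \b behaves as expected. re.escape is used so that any special characters inside a word are treated literally.

```python
import re

def build_word_pattern(words):
    # Hypothetical helper: join the escaped words with | inside a
    # non-capturing group and wrap the group in word boundaries.
    alternatives = "|".join(re.escape(word) for word in words)
    return re.compile(r"\b(?:" + alternatives + r")\b")

question = "the total number of staff in 30?"
pattern = build_word_pattern(["total", "staff"])
found = pattern.findall(question)
print(found)  # ['total', 'staff']
```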
This matches your example above most closely. Note that \b is a zero-width word boundary: it matches between a word character (a letter, digit or underscore) and a non-word character, or at the start or end of the string. Punctuation appended to a word, such as a comma, a full stop, an exclamation mark or a question mark at the end of a clause, therefore does not get in the way.
For example, in the question How many people are in your staff? the pattern above still finds the word staff, because there is a word boundary between the final f and the question mark. The boundaries do matter, though: if you leave out the second \b at the end of the regular expression above, the expression would wrongly detect words inside longer words, such as total in totally or totalities.
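The behaviour of \b around punctuation and inside longer words is easy to check directly. In Python's re module, \b matches between a \w character and a \W character (or the edge of the string), so a trailing question mark still counts as a boundary. A small sketch, with sentences chosen only for illustration:

```python
import re

pattern = re.compile(r"\b(?:total|staff)\b")

# Trailing punctuation: the boundary sits between "f" and "?".
with_punctuation = pattern.findall("How many people are in your staff?")

# Longer words: there is no boundary between "total" and "ly".
inside_longer_word = pattern.findall("My staff is totally overworked.")

# Without the closing \b, "total" is also matched inside "totally".
no_closing_boundary = re.findall(r"\b(?:total|staff)", "My staff is totally overworked.")

print(with_punctuation)     # ['staff']
print(inside_longer_word)   # ['staff']
print(no_closing_boundary)  # ['staff', 'total']
```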
Another way to accomplish what you want is to extract all the words (runs of alphanumeric characters) from your sentence first and then check that list for the words you need to find:
In [51]: def find_all_words(words, sentence):
   ....:     all_words = re.findall(r'\w+', sentence)
   ....:     words_found = []
   ....:     for word in words:
   ....:         if word in all_words:
   ....:             words_found.append(word)
   ....:     return words_found
In [52]: print(find_all_words(['total', 'staff'], 'The total number of staff in 30?'))
['total', 'staff']
In [53]: print(find_all_words(['total', 'staff'], 'My staff is totally overworked.'))
['staff']
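If the sentence may contain capitalised words, a variation along these lines might help. It is a sketch under the assumption that the comparison should ignore case, which the question does not state; the function name find_all_words_ignore_case is made up here. It also stores the sentence's words in a set, so each lookup does not have to scan a list.

```python
import re

def find_all_words_ignore_case(words, sentence):
    # Assumption: matching should ignore case. Lower-case every word
    # of the sentence and keep the results in a set for fast lookups.
    all_words = {w.lower() for w in re.findall(r"\w+", sentence)}
    # Keep the order of the requested words in the result.
    return [word for word in words if word.lower() in all_words]

result = find_all_words_ignore_case(["total", "staff"], "Total number of STAFF in 30?")
print(result)  # ['total', 'staff']
```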
import re
question = "the total number of staff in 30?"
find = ["total", "staff"]
words = re.findall(r"\w+", question)
result = [x for x in find if x in words]
result
['total', 'staff']
Have you thought about using something other than regex?
Consider this and, if it works, expand from this solution:
>>> 'total' in question.split()
True
Similarly, for several words (a list keeps the order of the output predictable):
>>> words = ['total', 'staff']
>>> [e for e in words if e in question.split()]
['total', 'staff']
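One thing to keep in mind with split(): it only splits on whitespace, so punctuation stays attached to the neighbouring word, and 'staff?' is not equal to 'staff'. A minimal sketch of one way around this, assuming that stripping leading and trailing ASCII punctuation is enough for your sentences:

```python
import string

question = "How many people are in your staff?"

# split() keeps the question mark attached to the last word.
raw_tokens = question.split()
found_raw = "staff" in raw_tokens

# Strip punctuation from both ends of each token before comparing.
clean_tokens = [token.strip(string.punctuation) for token in raw_tokens]
found_clean = "staff" in clean_tokens

print(found_raw, found_clean)  # False True
```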