Question
I am trying to find the position of a string (word) in a sentence, using the function below. It works perfectly for most words, but for the string GLC-SX-MM= in the sentence "I have a lot of GLC-SX-MM= in my inventory list" I cannot get a match. I tried escaping - and =, but that did not work. Any ideas? I cannot split the sentence on spaces because some keys are compound words separated by a space.
import re

def get_start_end(self, sentence, key):
    r = re.compile(r'\b(%s)\b' % key, re.I)
    m = r.search(sentence)
    start = m.start()
    end = m.end()
    return start, end
Answer 1:
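You need to escape the key with re.escape when searching for a literal string, and use the unambiguous boundaries (?<!\w) and (?!\w) instead of \b. A \b only matches between a word character and a non-word character, so it can never match between the trailing = and the space that follows it: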
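import re

def get_start_end(self, sentence, key):
    # Escape the key so it is matched literally, and require that no
    # word character sits directly before or after it.
    r = re.compile(r'(?<!\w){}(?!\w)'.format(re.escape(key)), re.I)
    m = r.search(sentence)
    start = m.start()
    end = m.end()
    return start, end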
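The expression r'(?<!\w){}(?!\w)'.format(re.escape(key)) builds a regex like (?<!\w)abc\.def\=(?!\w) from the keyword abc.def= (on Python 3.7 and later, re.escape no longer escapes =, so the result is (?<!\w)abc\.def=(?!\w), which matches the same text). The (?<!\w) lookbehind fails the match if there is a word character immediately to the left of the keyword, and the (?!\w) lookahead fails the match if there is a word character immediately to the right of it.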
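As a quick check, the approach can be run against the sentence from the question. This is only a minimal sketch: it uses a standalone function instead of a method, and returning None when nothing matches is an assumption added here, since the original code would raise an AttributeError in that case.

```python
import re

def get_start_end(sentence, key):
    """Return (start, end) of the first whole-word match of key, or None.

    Returning None on a miss is an assumption of this sketch; the
    original function does not handle the no-match case.
    """
    pattern = re.compile(r'(?<!\w){}(?!\w)'.format(re.escape(key)), re.I)
    m = pattern.search(sentence)
    return (m.start(), m.end()) if m else None

sentence = "I have a lot of GLC-SX-MM= in my inventory list"

span = get_start_end(sentence, "GLC-SX-MM=")
compound = get_start_end(sentence, "inventory list")  # key containing a space
partial = get_start_end(sentence, "lo")               # only occurs inside "lot"

print(span, sentence[span[0]:span[1]])
print(compound, partial)
```

Because the lookarounds only test the characters next to the key, keys that contain spaces are matched as a whole, and a key that appears only inside a longer word is rejected.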
Answer 2:
This is not an actual answer, but it may help you debug the problem.
You can print the dynamically built pattern to inspect it:
import re

def get_start_end(sentence, key):
    r = re.compile(r'\b(%s)\b' % key, re.I)
    print(r.pattern)

sentence = "foo-bar is not foo=bar"
get_start_end(sentence, 'o-')
get_start_end(sentence, 'o=')

Output:

\b(o-)\b
\b(o=)\b
You can then test the pattern manually, for example at https://regex101.com/, to see whether it matches.
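The same check can also be done locally. The sketch below is an illustration, not part of the original answer: it runs the \b-based pattern from the question against the question's sentence with two keys, one ending in a word character and one ending in =. The results dictionary is a hypothetical helper introduced only to collect the outcomes.

```python
import re

sentence = "I have a lot of GLC-SX-MM= in my inventory list"

results = {}  # hypothetical helper: key -> match span, or None
for key in ("GLC-SX-MM", "GLC-SX-MM="):
    # Same \b-based pattern as in the question, with the key escaped.
    pattern = re.compile(r'\b(%s)\b' % re.escape(key), re.I)
    match = pattern.search(sentence)
    results[key] = match.span() if match else None
    print(pattern.pattern, '->', results[key])
```

The key without the trailing = is found, while the key ending in = is not, which points at the closing \b rather than the escaping as the cause.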
Source: https://stackoverflow.com/questions/49362719/searching-for-a-whole-word-that-contains-leading-or-trailing-special-characters