How to get all matches using str.contains in Python regex?

闹比i 2021-01-27 03:37

I have a data frame in which I need to find all possible matches, that is, every row that matches any of the terms. My code is:

texts = ['foo abc', 'foobar xyz',

2 Answers
  •  走了就别回头了
    2021-01-27 03:50

    Longer alternatives must come before shorter ones, because the regex engine accepts the first alternative that matches. Sort the keywords by length in descending order:

    pat = r'\b(?:{})\b'.format('|'.join(sorted(terms, key=len, reverse=True)))
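A minimal sketch of the same idea as a reusable function. It adds re.escape, which the answer does not use, so that terms containing regex metacharacters such as "." are matched literally. The function name build_pattern and the sample terms are hypothetical.

```python
import re


def build_pattern(terms):
    """Build a whole-word alternation that tries longer terms first.

    re.escape is an addition to the original answer: it makes terms
    containing regex metacharacters (e.g. '.') match literally.
    """
    # Longest first, so 'foo baz' is attempted before 'foo'.
    alternatives = sorted(terms, key=len, reverse=True)
    return r'\b(?:{})\b'.format('|'.join(map(re.escape, alternatives)))


terms = ['foo', 'foo baz', 'baz']  # hypothetical terms
pat = build_pattern(terms)

# \b stops 'foo' from matching inside 'foobar'.
print(re.findall(pat, 'foo baz, foobar and baz'))  # ['foo baz', 'baz']
```

The \b anchors only behave as intended when every term starts and ends with a word character; a term such as "c++" would need different boundaries.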
    

    The resulting pattern is \b(?:foo baz|foo|baz)\b. At each position, the engine tries foo baz first, then foo, then baz. If foo baz matches, that match is returned and the search for the next match resumes at its end, so the foo and baz inside it are not matched again.
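To see why the order matters, the sketch below runs the same hypothetical terms through an unsorted and a sorted alternation with re.findall, which returns non-overlapping matches and resumes searching after the end of each one.

```python
import re

terms = ['foo', 'foo baz', 'baz']  # hypothetical terms, shortest listed first
text = 'foo baz qux foo'

# Unsorted: 'foo' is tried first and succeeds, so 'foo baz' can never win.
unsorted_pat = r'\b(?:{})\b'.format('|'.join(terms))
unsorted_matches = re.findall(unsorted_pat, text)  # ['foo', 'baz', 'foo']

# Sorted by length, descending: 'foo baz' is tried before 'foo'.
sorted_pat = r'\b(?:{})\b'.format('|'.join(sorted(terms, key=len, reverse=True)))
sorted_matches = re.findall(sorted_pat, text)  # ['foo baz', 'foo']

print(unsorted_pat, unsorted_matches)
print(sorted_pat, sorted_matches)
```

Python's sort is stable even with reverse=True, so equal-length terms such as foo and baz keep their original relative order.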

    See more on this in "Remember That The Regex Engine Is Eager".
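Since the question is about a data frame, here is a hedged sketch of applying the sorted pattern to a column, assuming pandas and a hypothetical column named text with made-up rows (the question's own code is cut off). Series.str.contains only says whether each row matches; Series.str.findall lists every match per row.

```python
import re

import pandas as pd

# Hypothetical data and terms; the question's own code is truncated.
df = pd.DataFrame({'text': ['foo abc', 'foobar xyz', 'foo baz qux', 'nothing here']})
terms = ['foo', 'foo baz', 'baz']

# Longest terms first, escaped, wrapped in word boundaries.
pat = r'\b(?:{})\b'.format('|'.join(map(re.escape, sorted(terms, key=len, reverse=True))))

# One boolean per row: does the row contain any of the terms?
mask = df['text'].str.contains(pat, regex=True)
# One list per row: all non-overlapping matches found in that row.
df['matches'] = df['text'].str.findall(pat)

print(df[mask])
```

The group is non-capturing (?:...), so str.contains does not warn about match groups and str.findall returns the full matched text rather than group contents.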
