How to match alternatives with python regex

没有蜡笔的小新 2021-01-21 13:41

Given string 1:

'''TOM likes to go swimming MARY loves to go to the playground JANE likes going shopping'''

I want to capture only the text between two of the names.

1 answer
  •  悲哀的现实
    2021-01-21 14:10

    You need to fix your alternation: it must be enclosed in a non-capturing group, (?:JANE|MARY). You also need a lazy quantifier on [\w\W] (which I would replace with a dot plus the re.DOTALL modifier, so that the dot also matches line breaks):

    (?s)TOM\s*(.+?)\s*(?:JANE|MARY)
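
As a minimal sketch of how this pattern could be used from Python, the snippet below swaps the inline (?s) for the equivalent re.DOTALL flag. The names `text`, `pattern`, `match` and `captured` are illustrative and not from the question, and a line break has been added to the sample string only to show why the flag matters:

```python
import re

# Sample string from the question, with one line break added for illustration.
text = "TOM likes to go\nswimming MARY loves to go to the playground JANE likes going shopping"

# Same pattern as above, with re.DOTALL passed as a flag instead of inline (?s).
pattern = re.compile(r"TOM\s*(.+?)\s*(?:JANE|MARY)", re.DOTALL)

match = pattern.search(text)
captured = match.group(1) if match else None
print(repr(captured))

# Without DOTALL the dot cannot cross the line break, so nothing matches here.
no_dotall = re.search(r"TOM\s*(.+?)\s*(?:JANE|MARY)", text)
print(no_dotall)
```

With the flag set, Group 1 should span the line break and stop before MARY; without it, the search on this particular string finds no match at all.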
    


    Without the (?:...|...) group, the alternation splits the whole pattern in two: your regex matched either TOM, then any 1+ chars, as many as possible, and then JANE, or just the MARY substring. Because the quantifier was greedy, the regex grabbed the whole string and then backtracked to the last occurrence of the subsequent subpattern, JANE. Now, the fixed regex matches:

    • (?s) - DOTALL inline modifier
    • TOM - a literal char sequence
    • \s* - 0+ whitespaces
    • (.+?) - Group 1 (capturing): any 1+ chars, as few as possible, up to the first occurrence of the subsequent subpatterns
    • \s* - 0+ whitespaces
    • (?:JANE|MARY) - either JANE or MARY substring.
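
To make the difference concrete, here is a small comparison sketch. The question's own regex is not shown, so the `ungrouped` pattern below is only a hypothetical reconstruction based on the description above (greedy [\w\W]+ and an alternation with no group around it); the variable names are illustrative as well:

```python
import re

text = "TOM likes to go swimming MARY loves to go to the playground JANE likes going shopping"

# Hypothetical reconstruction of the ungrouped, greedy pattern.
# The top-level | makes it mean "TOM ... JANE" OR just "MARY".
ungrouped = re.search(r"TOM\s*([\w\W]+)\s*JANE|MARY", text)

# Fixed pattern from the answer: lazy quantifier, alternation inside (?:...).
grouped = re.search(r"(?s)TOM\s*(.+?)\s*(?:JANE|MARY)", text)

print(repr(ungrouped.group(1)))  # runs past MARY, up to the last JANE
print(repr(grouped.group(1)))    # stops before the first of JANE or MARY
```

Under these assumptions, the ungrouped version captures everything from after TOM up to JANE (including the MARY sentence and a trailing space), while the fixed version captures only "likes to go swimming".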
