Python regex word boundary with unexpected results

感动是毒 2021-01-25 07:35
import re
sstring = "ON Any ON Any"
regex1 = re.compile(r''' \bON\bANY\b''', re.VERBOSE)
regex2 = re.compile(r'''\b(ON)?\b(Any)?''', re.VERBOSE)
regex         


        
1 Answer
  • 2021-01-25 08:23

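    To match ON ANY, add an escaped space between ON and ANY. The space must be escaped because the re.VERBOSE flag makes the regex engine ignore unescaped whitespace in the pattern. A word boundary, \b, is a zero-width assertion: it consumes no text and only asserts a position between a word character and a non-word character (or the start or end of the string). That is why your first attempt, re.compile(r''' \bON\bANY\b''', re.VERBOSE), fails: \b cannot match the space, and there is no word boundary between N and A.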
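
As a rough illustration of the zero-width behaviour described above, the sketch below lists every position where \b matches in the question's string and then runs the first pattern from the question. The variable names (text, boundaries, positions, first_attempt) are made up for this example.

```python
import re

text = "ON Any ON Any"

# Every \b match is an empty string: the assertion marks a position, not a character.
boundaries = [(m.start(), m.group()) for m in re.finditer(r"\b", text)]
positions = [pos for pos, _ in boundaries]
print(positions)  # offsets where a word character meets a non-word character or a string edge

# The first pattern from the question: \b cannot consume the space after ON,
# so "ANY" would have to start directly after "ON", which never happens in this string.
first_attempt = re.compile(r''' \bON\bANY\b''', re.VERBOSE)
print(first_attempt.search(text))  # None
```

Every tuple in boundaries carries an empty string as its matched text, which is what "does not consume any text" means in practice.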

    Use the following (re.IGNORECASE is added so that ANY in the pattern also matches Any in your string):

    rx = re.compile(r''' \bON\ ANY\b ''', re.VERBOSE|re.IGNORECASE)
    

    See the Python demo
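
A minimal sketch of how the suggested pattern behaves on the string from the question. The names text, rx_case_sensitive, and rx_unescaped are introduced here only for comparison and are not part of the original answer.

```python
import re

text = "ON Any ON Any"

# Suggested pattern: the escaped space survives re.VERBOSE,
# and re.IGNORECASE lets ANY match "Any".
rx = re.compile(r''' \bON\ ANY\b ''', re.VERBOSE | re.IGNORECASE)
fixed_matches = rx.findall(text)
print(fixed_matches)  # ['ON Any', 'ON Any']

# Same pattern without re.IGNORECASE: "ANY" does not match "Any".
rx_case_sensitive = re.compile(r''' \bON\ ANY\b ''', re.VERBOSE)
print(rx_case_sensitive.findall(text))  # []

# Unescaped space: re.VERBOSE drops it, so the pattern becomes \bONANY\b.
rx_unescaped = re.compile(r''' \bON ANY\b ''', re.VERBOSE | re.IGNORECASE)
print(rx_unescaped.findall(text))  # []
```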

    With re.compile(r'''\b(ON)?\b(Any)?''', re.VERBOSE), a method such as findall returns tuples because the pattern defines (...) capturing groups: each tuple holds one value per group.
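
The tuple behaviour can be checked with a short sketch, assuming the question called findall on the compiled pattern (the truncated question does not show the call). The names text and group_matches are illustrative.

```python
import re

text = "ON Any ON Any"

# Two capturing groups: findall returns one tuple per match,
# with one slot per group. A group that did not take part yields ''.
regex2 = re.compile(r'''\b(ON)?\b(Any)?''', re.VERBOSE)
group_matches = regex2.findall(text)

for item in group_matches:
    print(item)  # 2-tuples such as ('ON', ''), ('', 'Any') or ('', '')
```

The ('', '') entries come from matches where neither optional group consumed anything.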

    The re.compile(r'''\b(?:ON)?\b(?:Any)?''', re.VERBOSE) pattern uses non-capturing groups and matches optional sequences, either ON or Any, so you get those words as plain string values. You also get empty strings because every other subpattern is optional, so this regex can match just a word boundary without consuming any text.
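
A small sketch of the non-capturing variant, again assuming findall is the method in use. Filtering out the empty strings and the alternation pattern rx_words are suggestions added here, not something the original answer proposes.

```python
import re

text = "ON Any ON Any"

# Non-capturing groups: findall returns whole matches as strings,
# including empty ones where only a word boundary matched.
regex3 = re.compile(r'''\b(?:ON)?\b(?:Any)?''', re.VERBOSE)
all_matches = regex3.findall(text)

# One possible cleanup (an assumption, not from the answer): drop the empty matches.
words = [m for m in all_matches if m]
print(words)  # ['ON', 'Any', 'ON', 'Any']

# Hypothetical alternative: require one of the two words, so no empty match is possible.
rx_words = re.compile(r'''\b(?:ON|Any)\b''', re.VERBOSE)
print(rx_words.findall(text))  # ['ON', 'Any', 'ON', 'Any']
```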

    More details about word boundaries:

    • Word boundaries at Regular-Expressions.info
    • Java Regex Word Boundaries (this is still a word boundary in a regex, also applicable here)